is_vmalloc_addr currently assumes that all vmalloc addresses
exist between VMALLOC_START and VMALLOC_END. This may not be
the case when interleaving vmalloc and lowmem. Update
is_vmalloc_addr() to check the actual vmalloc regions instead.
Correspondingly we need to ensure that VMALLOC_TOTAL accounts
for all the vmalloc regions when CONFIG_ENABLE_VMALLOC_SAVING
is enabled.
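A minimal sketch of the idea (not the exact patch), assuming the check
moves into mm/vmalloc.c where vmap_area_list and vmap_area_lock live:

int is_vmalloc_addr(const void *x)
{
    unsigned long addr = (unsigned long)x;
    struct vmap_area *va;
    int ret = 0;

    /* A bounds check against VMALLOC_START/VMALLOC_END is no longer
     * sufficient; walk the tracked vmalloc regions instead.
     */
    spin_lock(&vmap_area_lock);
    list_for_each_entry(va, &vmap_area_list, list) {
        if (addr >= va->va_start && addr < va->va_end) {
            ret = 1;
            break;
        }
    }
    spin_unlock(&vmap_area_lock);
    return ret;
}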
Change-Id: I5def3d6ae1a4de59ea36f095b8c73649a37b1f36
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
Currently on 32 bit systems, virtual space above
PAGE_OFFSET is reserved for direct mapped lowmem
and part of virtual address space is reserved for
vmalloc. We want to optimize so as to have as
much direct mapped memory as possible, since there is a
penalty for mapping/unmapping highmem. Now, we may
have an image that is expected to have a lifetime of
the entire system and is reserved in a physical region
that would be part of direct mapped lowmem. The
physical memory which is thus reserved is never used
by Linux. This means that even though the system is
not actually accessing the virtual memory
corresponding to the reserved physical memory, we
are still losing that portion of direct mapped lowmem
space.
So by allowing lowmem to be non-contiguous, we can
give this unused virtual address space of the reserved
region back for use in vmalloc.
Change-Id: I980b3dfafac71884dcdcb8cd2e4a6363cde5746a
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
Even though lowmem is accounted for in vmalloc space,
allocation comes only from the region bounded by
VMALLOC_START and VMALLOC_END. The kernel virtual area
can now allocate from any unmapped region starting
from PAGE_OFFSET.
Change-Id: I291b9eb443d3f7445fd979bd7b09e9241ff22ba3
Signed-off-by: Neeti Desai <neetid@codeaurora.org>
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
There are places in the kernel, like the lowmemorykiller, which
invoke show_mem_call_notifiers from an atomic context.
So move from a blocking notifier to an atomic one. At present
the notifier callbacks do not call sleeping functions,
but care must be taken that this remains true in the future.
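A sketch of the conversion, assuming the show_mem_notifier head from
the showmem notifier framework patch below:

static ATOMIC_NOTIFIER_HEAD(show_mem_notifier); /* was BLOCKING_NOTIFIER_HEAD */

void show_mem_call_notifiers(void)
{
    /* Safe from atomic context; callbacks must not sleep. */
    atomic_notifier_call_chain(&show_mem_notifier, 0, NULL);
}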
Change-Id: I9668e67463ab8a6a60be55dbc86b88f45be8b041
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
It was found that a number of tasks were blocked in the reclaim path
(throttle_vm_writeout) for seconds, because of vmstat_diff not being
synced in time. Fix that by adding a new function
global_page_state_snapshot.
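A hedged sketch of the new helper, modeled on the existing
zone_page_state_snapshot(): fold in the per-cpu deltas that have not
yet been synced into the global counter.

static unsigned long global_page_state_snapshot(enum zone_stat_item item)
{
    long x = atomic_long_read(&vm_stat[item]);
#ifdef CONFIG_SMP
    struct zone *zone;
    int cpu;

    /* Add the not-yet-folded per-cpu diffs of every populated zone. */
    for_each_populated_zone(zone)
        for_each_online_cpu(cpu)
            x += per_cpu_ptr(zone->pageset, cpu)->vm_stat_diff[item];

    if (x < 0)
        x = 0;
#endif
    return x;
}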
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Change-Id: Iec167635ad724a55c27bdbd49eb8686e7857216c
Commit "mm: vmscan: fix the page state calculation in too_many_isolated"
fixed an issue where a number of tasks were blocked in reclaim path
for seconds, because of vmstat_diff not being synced in time.
A similar problem can happen in isolate_migratepages_block, where
a similar calculation is performed. This patch fixes that.
Change-Id: Ie74f108ef770da688017b515fe37faea6f384589
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
At present any vmpressure value is scaled up if the pages are
reclaimed through direct reclaim. This can result in false
vmpressure values. Consider a case where a device is booted up
and most of the memory is occupied by file pages. kswapd will
make sure that the high watermark is maintained. Now when a sudden,
huge allocation request comes in, the system will definitely
have to get into direct reclaim. The vmpressure values can be very low,
but because of the allocstall accounting logic even these low values
will be scaled to values nearing 100. This can result in
unnecessary LMK kills for example. So define a tunable threshold
for vmpressure above which the allocstalls will be accounted.
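A sketch under assumed names (the tunable is called
allocstall_threshold here purely for illustration):

static unsigned int allocstall_threshold __read_mostly = 70;

static unsigned long vmpressure_account_stall(unsigned long pressure,
                                              unsigned long stall,
                                              unsigned long scanned)
{
    /* Below the threshold, direct-reclaim stalls do not scale the
     * reported pressure at all.
     */
    if (pressure < allocstall_threshold)
        return scanned;

    return scanned + stall;
}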
CRs-fixed: 893699
Change-Id: Idd7c6724264ac89f1f68f2e9d70a32390ffca3e5
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
There are a couple of issues with swapcache usage when ZRAM is used
as swap device.
1) Kernel does a swap readahead which can be around 6 to 8 pages
depending on total ram, which is not required for zram since
accesses are fast.
2) Kernel delays the freeing up of swapcache expecting a later hit,
which again is useless in the case of zram.
3) This is not related to swapcache, but zram usage itself.
As mentioned in (2) kernel delays freeing of swapcache, but along with
that it delays zram compressed page free also. i.e. there can be 2 copies,
though one is compressed.
This patch addresses these issues using two new flags
QUEUE_FLAG_FAST and SWP_FAST, to indicate that accesses to the device
will be fast and cheap, and instructs the swap layer to free up
swap space aggressively, and not to do readahead.
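A sketch of how the flags fit together; blk_queue_fast() is modeled
on the existing QUEUE_FLAG_* helpers:

#define blk_queue_fast(q)   test_bit(QUEUE_FLAG_FAST, &(q)->queue_flags)

/* At swapon time, tag the swap device when its queue is marked fast;
 * the readahead and swapcache paths then test SWP_FAST to skip
 * readahead and free swapcache eagerly.
 */
if (p->bdev && blk_queue_fast(bdev_get_queue(p->bdev)))
    p->flags |= SWP_FAST;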
Change-Id: I5d2d5176a5f9420300bb2f843f6ecbdb25ea80e4
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
The existing calculation of vmpressure takes into account only
the ratio of reclaimed to scanned pages, but not the time spent
or the difficulty in reclaiming those pages. For example, when there
are quite a number of file pages in the system, an allocation
request can be satisfied by reclaiming the file pages alone. If
such a reclaim is successful, the vmpressure value will remain low
irrespective of the time spent by the reclaim code to free up the
file pages. With a feature like lowmemorykiller, killing a task
can be faster than reclaiming the file pages alone. So if the
vmpressure values reflect the reclaim difficulty level, clients
can make a decision based on that, for example, to kill a task early.
This patch monitors the number of pages scanned in the direct
reclaim path and scales the vmpressure level according to that.
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Change-Id: I6e643d29a9a1aa0814309253a8b690ad86ec0b13
Currently, vmpressure is tied to memcg and its events are
available only to userspace clients. This patch removes
the dependency on CONFIG_MEMCG and adds a mechanism for
in-kernel clients to subscribe for vmpressure events (in
fact raw vmpressure values are delivered instead of vmpressure
levels, to provide clients more flexibility to take actions
on custom pressure levels which are not currently defined
by vmpressure module).
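A sketch of the in-kernel interface; the function names are
assumptions:

static BLOCKING_NOTIFIER_HEAD(vmpressure_notifier);

int vmpressure_notifier_register(struct notifier_block *nb)
{
    return blocking_notifier_chain_register(&vmpressure_notifier, nb);
}

static void vmpressure_notify(unsigned long pressure)
{
    /* Raw 0-100 pressure values, not the LOW/MEDIUM/CRITICAL levels. */
    blocking_notifier_call_chain(&vmpressure_notifier, pressure, NULL);
}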
Change-Id: I38010f166546e8d7f12f5f355b5dbfd6ba04d587
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Add a new config item, CONFIG_FORCE_ALLOC_FROM_DMA_ZONE, which
can be used to optionally force certain allocators to always
return memory from ZONE_DMA.
This option helps ensure that clients who require ZONE_DMA
memory are always using ZONE_DMA memory.
Change-Id: Id2d36214307789f27aa775c2bef2dab5047c4ff0
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Currently we have kmemleak_stack_scan enabled by default.
This can hog the cpu with pre-emption disabled for a long
time starving other tasks.
Make this optional at compile time, since if required
we can always write to the sysfs entry and enable this option.
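A minimal sketch, with a hypothetical Kconfig symbol name:

/* Stack scanning defaults off when the new option is set; it can
 * still be re-enabled at runtime with
 * echo stack=on > /sys/kernel/debug/kmemleak
 */
static int kmemleak_stack_scan =
    !IS_ENABLED(CONFIG_DEBUG_KMEMLEAK_NO_STACK_SCAN);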
Change-Id: Ie30447861c942337c7ff25ac269b6025a527e8eb
Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
When CONFIG_PAGE_POISONING is enabled, the pages are poisoned
after setting free page in KASan Shadow memory and KASan reports
the read after free warning. The same thing happens in the allocation
path. So change the order of the KASan alloc/free API calls so that
page poisoning happens while the pages are in alloc status in the
KASan shadow memory.
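The reordering, sketched on the free path (the allocation path is
changed symmetrically):

/* Before: the shadow was marked freed first, so the poisoning
 * memset in kernel_map_pages() tripped KASan:
 *
 *    kasan_free_pages(page, order);
 *    kernel_map_pages(page, 1 << order, 0);
 *
 * After: poison while the shadow still says "allocated", then mark
 * the range freed.
 */
kernel_map_pages(page, 1 << order, 0);
kasan_free_pages(page, order);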
The following KASan report is for reference.
==================================================================
BUG: KASan: use after free in memset+0x24/0x44 at addr ffffffc000000000
Write of size 4096 by task swapper/0
page:ffffffbac5000000 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-g5a4a5d5-07242-g6938a8b-dirty #1
Hardware name: Qualcomm Technologies, Inc. MSM 8996 v2 + PMI8994 MTP (DT)
Call trace:
[<ffffffc000089ea4>] dump_backtrace+0x0/0x1c4
[<ffffffc00008a078>] show_stack+0x10/0x1c
[<ffffffc0010ecfd8>] dump_stack+0x74/0xc8
[<ffffffc00020faec>] kasan_report_error+0x2b0/0x408
[<ffffffc00020fd20>] kasan_report+0x34/0x40
[<ffffffc00020f138>] __asan_storeN+0x15c/0x168
[<ffffffc00020f374>] memset+0x20/0x44
[<ffffffc0002086e0>] kernel_map_pages+0x238/0x2a8
[<ffffffc0001ba738>] free_pages_prepare+0x21c/0x25c
[<ffffffc0001bc7e4>] __free_pages_ok+0x20/0xf0
[<ffffffc0001bd3bc>] __free_pages+0x34/0x44
[<ffffffc0001bd5d8>] __free_pages_bootmem+0xf4/0x110
[<ffffffc001ca9050>] free_all_bootmem+0x160/0x1f4
[<ffffffc001c97b30>] mem_init+0x70/0x1ec
[<ffffffc001c909f8>] start_kernel+0x2b8/0x4e4
[<ffffffc001c987dc>] kasan_early_init+0x154/0x160
Change-Id: Idbd3dc629be57ed55a383b069a735ae3ee7b9f05
Signed-off-by: Se Wang (Patrick) Oh <sewango@codeaurora.org>
Change the logic which determines the initial readahead window size
such that for small requests (one page) the initial window size
will be x4 the size of the original request, regardless of the
VM_MAX_READAHEAD value. This prevents a rapid ramp-up
that could be caused due to increasing VM_MAX_READAHEAD.
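A sketch of the changed policy, modeled on get_init_ra_size() in
mm/readahead.c:

static unsigned long get_init_ra_size(unsigned long size, unsigned long max)
{
    unsigned long newsize = roundup_pow_of_two(size);

    if (newsize == 1)
        newsize = newsize * 4;  /* x4 only for single-page requests */
    else if (newsize <= max / 4)
        newsize = newsize * 2;
    else
        newsize = max;

    return newsize;
}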
Change-Id: I93d59c515d7e6c6d62348790980ff7bd4f434997
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
Memory watermarks were sometimes preventing CMA allocations
in low memory.
Change-Id: I550ec987cbd6bc6dadd72b4a764df20cd0758479
Signed-off-by: Liam Mark <lmark@codeaurora.org>
CMA allocations rely on being able to migrate pages out
quickly to fulfill the allocations. Most use cases for
movable allocations meet this requirement. File system
allocations may take an unacceptably long time to
migrate, which creates delays from CMA. Prevent CMA
pages from ending up on the per-cpu lists to avoid
code paths grabbing CMA pages on the fast path. CMA
pages can still be allocated as a fallback under tight
memory pressure.
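A sketch of the free-path change, against a 3.x-era
free_hot_cold_page() (exact signatures vary by kernel version):

void free_hot_cold_page(struct page *page, bool cold)
{
    struct zone *zone = page_zone(page);
    int migratetype = get_pageblock_migratetype(page);

    /* Divert CMA pages past the per-cpu lists, straight to buddy. */
    if (is_migrate_cma(migratetype)) {
        free_one_page(zone, page, page_to_pfn(page), 0, migratetype);
        return;
    }
    /* ... normal per-cpu list path ... */
}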
CRs-Fixed: 452508
Change-Id: I79a28f697275a2a1870caabae53c8ea345b4b47d
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Bring back the is_cma_pageblock definition for determining if a
page is CMA or not.
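The restored helper, roughly (placement and config guards elided):

static inline bool is_cma_pageblock(struct page *page)
{
    return get_pageblock_migratetype(page) == MIGRATE_CMA;
}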
Change-Id: I39fd546e22e240b752244832c79514f109c8e84b
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Allow the kswapd cpu affinity to be configured.
There can be power benefits on certain targets when limiting kswapd
to run only on certain cores.
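A sketch, assuming the affinity arrives as a Kconfig string (the
symbol name here is illustrative):

static void kswapd_set_affinity(struct task_struct *tsk)
{
    cpumask_var_t mask;

    if (!alloc_cpumask_var(&mask, GFP_KERNEL))
        return;

    /* e.g. CONFIG_KSWAPD_CPU_AFFINITY="0-3" */
    if (!cpulist_parse(CONFIG_KSWAPD_CPU_AFFINITY, mask) &&
        cpumask_intersects(mask, cpu_online_mask))
        set_cpus_allowed_ptr(tsk, mask);

    free_cpumask_var(mask);
}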
CRs-fixed: 752344
Change-Id: I8a83337ff313a7e0324361140398226a09f8be0f
Signed-off-by: Liam Mark <lmark@codeaurora.org>
[imaund@codeaurora.org: Resolved trivial context conflicts.]
Signed-off-by: Ian Maund <imaund@codeaurora.org>
A workaround was added earlier to move a page to active
list if swapping to devices like zram fails. But this
can result in try_to_free_swap being called from
shrink_page_list, without a properly locked page.
Lock the page when we indicate to activate a page
in pageout().
Add a check to ensure that the error is on swap, and
clear the error flag before moving the page to the
active list.
CRs-fixed: 760049
Change-Id: I77a8bbd6ed13efdec943298fe9448412feeac176
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Ensure that shrinkers are given the option to completely drop
their caches even when their caches are smaller than the batch size.
This change helps improve memory headroom by ensuring that under
significant memory pressure shrinkers can drop all of their caches.
This change only attempts to more aggressively call the shrinkers
during background memory reclaim, in order to avoid hurting the
performance of direct memory reclaim.
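A sketch of the loop change in do_shrink_slab(), under the assumption
that only kswapd gets the final partial batch:

while (total_scan >= batch_size ||
       (current_is_kswapd() && total_scan > 0)) {
    unsigned long ret;
    unsigned long nr_to_scan = min(batch_size, total_scan);

    shrinkctl->nr_to_scan = nr_to_scan;
    ret = shrinker->scan_objects(shrinker, shrinkctl);
    if (ret == SHRINK_STOP)
        break;
    freed += ret;

    total_scan -= nr_to_scan;
    cond_resched();
}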
Change-Id: I8dbc29c054add639e4810e36fd2c8a063e5c52f3
Signed-off-by: Liam Mark <lmark@codeaurora.org>
If the SLUB_DEBUG_PANIC_ON Kconfig option is
selected, also panic for object and slab
errors to allow capturing relevant debug
data.
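A sketch of one way to wire this in, via the existing
object_err()/slab_err() reporting paths:

static void slab_panic(const char *cause)
{
    if (IS_ENABLED(CONFIG_SLUB_DEBUG_PANIC_ON))
        panic("%s\n", cause);
}

/* e.g. called at the end of object_err() / slab_err() */
slab_panic("object/slab error");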
Change-Id: Idc582ef48d3c0d866fa89cf8660ff0a5402f7e15
Signed-off-by: David Keitel <dkeitel@codeaurora.org>
Add the DEBUG_SLUB_PANIC_ON option to Kconfig, preventing
the existing defconfig option from being overwritten
by make config.
This will induce a panic if slab debug catches corruptions
within the padding of a given object.
The intention here is to induce collection of data
immediately after the corruption is detected, with
the goal of catching the possible source of the corruption.
Change-Id: Ide0102d0761022c643a761989360ae5c853870a8
Signed-off-by: David Keitel <dkeitel@codeaurora.org>
[imaund@codeaurora.org: Resolved trivial merge conflicts.]
Signed-off-by: Ian Maund <imaund@codeaurora.org>
[lmark@codeaurora.org: ensure change does not create
arch/arm64/configs/msm8994_defconfig file]
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Allow other functions to dump the list of tasks.
Useful for when debugging memory leaks.
Change-Id: I76c33a118a9765b4c2276e8c76de36399c78dbf6
Signed-off-by: Liam Mark <lmark@codeaurora.org>
KSM is yet another framework which may obfuscate some memory
problems. Use the showmem notifier to show how KSM is being
used to give some insight into potential issues or non-issues.
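A sketch of the notifier, assuming it lives in mm/ksm.c next to the
counters:

static int ksm_show_mem_notifier(struct notifier_block *nb,
                                 unsigned long action, void *data)
{
    pr_info("ksm: pages_shared=%lu pages_sharing=%lu unshared=%lu\n",
            ksm_pages_shared, ksm_pages_sharing, ksm_pages_unshared);
    return 0;
}

static struct notifier_block ksm_show_mem_nb = {
    .notifier_call = ksm_show_mem_notifier,
};

/* registered once with show_mem_notifier_register(&ksm_show_mem_nb) */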
Change-Id: If82405dc33f212d085e6847f7c511fd4d0a32a10
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Resiliency was added to slub for production systems, in an
attempt to repair corruptions and allow production environments
to continue to run.
In debug setups, this may not be desirable. Thus, rather than
attempting to restore corrupted bytes in poisoned zones, panic
to attempt to catch more context of what was going on in the
system at the time.
Add the CONFIG_SLUB_DEBUG_PANIC_ON defconfig option to allow
debug builds to turn on this panic option.
Change-Id: I01763e8eea40a4544e9b7e48c4e4d40840b6c82d
Signed-off-by: David Keitel <dkeitel@codeaurora.org>
The KSM thread that scans pages is scheduled on a definite timeout.
That wakes up the CPU from idle state and hence may affect power
consumption. Provide optional support for using a deferred timer,
which suits low-power use cases.
To enable deferred timers,
$ echo 1 > /sys/kernel/mm/ksm/deferred_timer
Change-Id: I07fe199f97fe1f72f9a9e1b0b757a3ac533719e8
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
When performing memory reclaim, support treating anonymous and
file-backed pages equally.
Swapping anonymous pages out to memory can be efficient enough
to justify treating anonymous and file-backed pages equally.
CRs-Fixed: 648984
Change-Id: I6315b8557020d1e27a34225bb9cefbef1fb43266
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Move pages that fail swapout to the LRU active list to reduce
pressure on swap device when swapping out is already failing.
This helps when using a pseudo swap device such as zram which
starts failing when memory is low.
Change-Id: Ib136cd0a744378aa93d837a24b9143ee818c80b3
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
free_bootmem_late is currently set up to only be used in init
functions. Some clients need to use this function past initcalls.
The functions themselves have no restriction on being used later,
apart from the __init annotations, so remove the annotations.
Change-Id: I7c7e15cf2780a8843ebb4610da5b633c9abb0b3d
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
[abhimany@codeaurora.org: resolve minor conflict
and remove __init from nobootmem.c]
Signed-off-by: Abhimanyu Kapur <abhimany@codeaurora.org>
Drivers have a tendency to scribble on everything, including free
pages. Make life easier by marking free pages as read only when
on the buddy list and re-marking as read/write when allocating.
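A sketch of the mechanism, using the debug_pagealloc hook as the
attachment point (the arch must provide set_memory_ro/rw):

void kernel_map_pages(struct page *page, int numpages, int enable)
{
    unsigned long addr = (unsigned long)page_address(page);

    if (enable)
        set_memory_rw(addr, numpages);  /* page leaves the buddy list */
    else
        set_memory_ro(addr, numpages);  /* page joins the buddy list */
}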
Change-Id: I978ed2921394919917307b9c99217fdc22f82c59
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
(cherry picked from commit 752f5aecb0511c4d661dce2538c723675c1e6449)
There are many drivers in the kernel which can hold on
to lots of memory. It can be useful to dump out all those
drivers at key points in the kernel. Introduce a notifier
framework for dumping this information. When the notifiers
are called, drivers can dump out the state of any memory
they may be using.
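A sketch of the framework (this is the head that the atomic-notifier
patch above later converts):

static BLOCKING_NOTIFIER_HEAD(show_mem_notifier);

int show_mem_notifier_register(struct notifier_block *nb)
{
    return blocking_notifier_chain_register(&show_mem_notifier, nb);
}

int show_mem_notifier_unregister(struct notifier_block *nb)
{
    return blocking_notifier_chain_unregister(&show_mem_notifier, nb);
}

void show_mem_call_notifiers(void)
{
    blocking_notifier_call_chain(&show_mem_notifier, 0, NULL);
}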
Change-Id: Ifb2946964bf5d072552dd56d8d6dfdd794af6d84
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Add a new function, memblock_overlaps_memory(), to check if a
region overlaps with a memory bank. This will be used by
peripheral loader code to detect when kernel memory would be
overwritten.
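A sketch of the helper, assuming it sits in mm/memblock.c next to the
region bookkeeping:

int __init_memblock memblock_overlaps_memory(phys_addr_t base,
                                             phys_addr_t size)
{
    unsigned long i;

    for (i = 0; i < memblock.memory.cnt; i++) {
        struct memblock_region *reg = &memblock.memory.regions[i];

        /* Classic interval-overlap test. */
        if (base < reg->base + reg->size && reg->base < base + size)
            return 1;
    }
    return 0;
}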
Change-Id: I851f8f416a0f36e85c0e19536b5209f7d4bd431c
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
(cherry picked from commit cc2753448d9f2adf48295f935a7eee36023ba8d3)
Signed-off-by: Josh Cartwright <joshc@codeaurora.org>
(cherry picked from commit https://lkml.org/lkml/2015/12/21/337)
ASLR only uses as few as 8 bits to generate the random offset for the
mmap base address on 32 bit architectures. This value was chosen to
prevent a poorly chosen value from dividing the address space in such
a way as to prevent large allocations. This may not be an issue on all
platforms. Allow the specification of a minimum number of bits so that
platforms desiring greater ASLR protection may determine where to place
the trade-off.
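A sketch of the 32-bit arch side, assuming the Kconfig knobs from the
referenced series (CONFIG_ARCH_MMAP_RND_BITS, clamped between the new
per-arch minimum and maximum):

int mmap_rnd_bits __read_mostly = CONFIG_ARCH_MMAP_RND_BITS;

unsigned long arch_mmap_rnd(void)
{
    unsigned long rnd;

    /* 8 bits -> 1MB of entropy with 4K pages; more if configured. */
    rnd = (unsigned long)get_random_int() & ((1UL << mmap_rnd_bits) - 1);
    return rnd << PAGE_SHIFT;
}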
Bug: 24047224
Signed-off-by: Daniel Cashman <dcashman@android.com>
Signed-off-by: Daniel Cashman <dcashman@google.com>
Change-Id: Ibf9ed3d4390e9686f5cc34f605d509a20d40e6c2
Userspace processes often have multiple allocators that each do
anonymous mmaps to get memory. When examining memory usage of
individual processes or systems as a whole, it is useful to be
able to break down the various heaps that were allocated by
each layer and examine their size, RSS, and physical memory
usage.
This patch adds a user pointer to the shared union in
vm_area_struct that points to a null terminated string inside
the user process containing a name for the vma. vmas that
point to the same address will be merged, but vmas that
point to equivalent strings at different addresses will
not be merged.
Userspace can set the name for a region of memory by calling
prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name);
Setting the name to NULL clears it.
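For illustration, a minimal userspace sketch (the PR_SET_VMA
constants follow this patch's uapi additions):

#include <sys/mman.h>
#include <sys/prctl.h>

#ifndef PR_SET_VMA
#define PR_SET_VMA              0x53564d41
#define PR_SET_VMA_ANON_NAME    0
#endif

int main(void)
{
    size_t len = 1 << 20;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    /* The region then appears as [anon:my heap] in /proc/self/maps. */
    prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME,
          (unsigned long)p, len, (unsigned long)"my heap");
    return 0;
}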
The names of named anonymous vmas are shown in /proc/pid/maps
as [anon:<name>] and in /proc/pid/smaps in a new "Name" field
that is only present for named vmas. If the userspace pointer
is no longer valid all or part of the name will be replaced
with "<fault>".
The idea to store a userspace pointer to reduce the complexity
within mm (at the expense of the complexity of reading
/proc/pid/mem) came from Dave Hansen. This results in no
runtime overhead in the mm subsystem other than comparing
the anon_name pointers when considering vma merging. The pointer
is stored in a union with fields that are only used on file-backed
mappings, so it does not increase memory usage.
Includes a fix from Jed Davis <jld@mozilla.com> for a typo in
prctl_set_vma_anon_name which could attempt to set the name
across two vmas at the same time and might thereby
corrupt the vma list. Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.
Change-Id: I9aa7b6b5ef536cd780599ba4e2fba8ceebe8b59f
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
Add a userspace visible knob to tell the VM to keep an extra amount
of memory free, by increasing the gap between each zone's min and
low watermarks.
This is useful for realtime applications that call system
calls and have a bound on the number of allocations that happen
in any short time period. In this application, extra_free_kbytes
would be left at an amount equal to or larger than the
maximum number of allocations that happen in any burst.
It may also be useful to reduce the memory use of virtual
machines (temporarily?), in a way that does not cause memory
fragmentation like ballooning does.
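A sketch of the watermark math in __setup_per_zone_wmarks(), with the
extra pages distributed across zones by size ("min" is the zone's
computed min watermark, as before):

u64 low = (u64)(extra_free_kbytes >> (PAGE_SHIFT - 10)) *
          zone->managed_pages;
do_div(low, vm_total_pages);

zone->watermark[WMARK_LOW]  = min_wmark_pages(zone) + low + (min >> 2);
zone->watermark[WMARK_HIGH] = min_wmark_pages(zone) + low + (min >> 1);

The knob itself would be read from /proc/sys/vm/extra_free_kbytes.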
[ccross]
Revived for use on old kernels where no other solution exists.
The tunable will be removed on kernels that do better at avoiding
direct reclaim.
Change-Id: I765a42be8e964bfd3e2886d1ca85a29d60c3bb3e
Signed-off-by: Rik van Riel<riel@redhat.com>
Signed-off-by: Colin Cross <ccross@android.com>
This patch adds a debugfs file called "shrinker" which, when read,
calls all the shrinkers in the system with nr_to_scan set to zero and
prints the result. These results are the number of objects the
shrinkers have available and can thus be used as an indication of the
total memory that would be available to the system if a shrink occurred.
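A sketch of the read handler, against the old ->shrink() API
(shrinker_list and shrinker_rwsem live in mm/vmscan.c):

static int debug_shrinker_show(struct seq_file *s, void *unused)
{
    struct shrinker *shrinker;
    struct shrink_control sc = {
        .gfp_mask = GFP_KERNEL,
        .nr_to_scan = 0,    /* query only, don't reclaim */
    };

    down_read(&shrinker_rwsem);
    list_for_each_entry(shrinker, &shrinker_list, list) {
        int num_objs = shrinker->shrink(shrinker, &sc);

        seq_printf(s, "%pf %d\n", shrinker->shrink, num_objs);
    }
    up_read(&shrinker_rwsem);
    return 0;
}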
Change-Id: Ied0ee7caff3d2fc1cb4bb839aaafee81b5b0b143
Signed-off-by: Rebecca Schultz Zavin <rebecca@android.com>
A spare array holding mem cgroup threshold events is kept around
to make sure we can always safely deregister an event and have an
array to store the new set of events in.
In the scenario where we're going from 1 to 0 registered events, the
pointer to the primary array containing 1 event is copied to the spare
slot, and then the spare slot is freed because no events are left.
However, it is freed before calling synchronize_rcu(), which means
readers may still be accessing threshold->primary after it is freed.
Fixed by only freeing after synchronize_rcu().
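A sketch of the reordered unregister path:

    /* Swap primary and spare array. */
    thresholds->spare = thresholds->primary;
    rcu_assign_pointer(thresholds->primary, new);

    /* Make sure nobody still uses the old primary... */
    synchronize_rcu();

    /* ...and only then free the spare if no events are left. */
    if (!new) {
        kfree(thresholds->spare);
        thresholds->spare = NULL;
    }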
Signed-off-by: Martijn Coenen <maco@google.com>
Refactor *allow_attach() handler to align it with the changes
from mainline commit 1f7dd3e5a6 "cgroup: fix handling of
multi-destination migration from subtree_control enabling".
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Pass correct argument to subsys_cgroup_allow_attach(), which
expects 'struct cgroup_subsys_state *' argument but we pass
'struct cgroup *' instead which doesn't seem right.
This fixes following 'incompatible pointer type' compiler warning:
----------
CC mm/memcontrol.o
mm/memcontrol.c: In function ‘mem_cgroup_allow_attach’:
mm/memcontrol.c:5052:2: warning: passing argument 1 of ‘subsys_cgroup_allow_attach’ from incompatible pointer type [enabled by default]
In file included from include/linux/memcontrol.h:22:0,
from mm/memcontrol.c:29:
include/linux/cgroup.h:953:5: note: expected ‘struct cgroup_subsys_state *’ but argument is of type ‘struct cgroup *’
----------
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
Use the 'allow_attach' handler for the 'mem' cgroup to allow
non-root processes to add arbitrary processes to a 'mem' cgroup
if it has the CAP_SYS_NICE capability set.
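A sketch of the policy helper; the exact taskset iteration differs
across kernel versions:

static int subsys_cgroup_allow_attach(struct cgroup_subsys_state *css,
                                      struct cgroup_taskset *tset)
{
    const struct cred *cred = current_cred(), *tcred;
    struct task_struct *task;

    /* CAP_SYS_NICE holders may attach any task. */
    if (capable(CAP_SYS_NICE))
        return 0;

    cgroup_taskset_for_each(task, tset) {
        tcred = __task_cred(task);
        if (current != task &&
            !uid_eq(cred->euid, tcred->uid) &&
            !uid_eq(cred->euid, tcred->suid))
            return -EACCES;
    }
    return 0;
}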
Bug: 18260435
Change-Id: If7d37bf90c1544024c4db53351adba6a64966250
Signed-off-by: Rom Lemarchand <romlem@android.com>
NOT FOR STAGING
This patch re-adds the original shmem_set_file to mm/shmem.c
and converts ashmem.c back to using it.
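The re-added helper, roughly as it existed before its removal:

void shmem_set_file(struct vm_area_struct *vma, struct file *file)
{
    if (vma->vm_file)
        fput(vma->vm_file);
    vma->vm_file = file;
    vma->vm_ops = &shmem_vm_ops;
}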
CC: Brian Swetland <swetland@google.com>
CC: Colin Cross <ccross@android.com>
CC: Arve Hjønnevåg <arve@android.com>
CC: Dima Zavin <dima@android.com>
CC: Robert Love <rlove@google.com>
CC: Greg KH <greg@kroah.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
kernel test robot has reported the following crash:
BUG: unable to handle kernel NULL pointer dereference at 00000100
IP: [<c1074df6>] __queue_work+0x26/0x390
*pdpt = 0000000000000000 *pde = f000ff53f000ff53 *pde = f000ff53f000ff53
Oops: 0000 [#1] PREEMPT PREEMPT SMP SMP
CPU: 0 PID: 24 Comm: kworker/0:1 Not tainted 4.4.0-rc4-00139-g373ccbe #1
Workqueue: events vmstat_shepherd
task: cb684600 ti: cb7ba000 task.ti: cb7ba000
EIP: 0060:[<c1074df6>] EFLAGS: 00010046 CPU: 0
EIP is at __queue_work+0x26/0x390
EAX: 00000046 EBX: cbb37800 ECX: cbb37800 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: cb7bbe68 ESP: cb7bbe38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000100 CR3: 01fd5000 CR4: 000006b0
Stack:
Call Trace:
__queue_delayed_work+0xa1/0x160
queue_delayed_work_on+0x36/0x60
vmstat_shepherd+0xad/0xf0
process_one_work+0x1aa/0x4c0
worker_thread+0x41/0x440
kthread+0xb0/0xd0
ret_from_kernel_thread+0x21/0x40
The reason is that start_shepherd_timer schedules the shepherd work item
which uses vmstat_wq (vmstat_shepherd) before setup_vmstat allocates
that workqueue so if the further initialization takes more than HZ we
might end up scheduling on a NULL vmstat_wq. This is really unlikely
but not impossible.
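A sketch of the fix: create the workqueue before the shepherd can
possibly be queued on it.

static void __init start_shepherd_timer(void)
{
    int cpu;

    for_each_possible_cpu(cpu)
        INIT_DEFERRABLE_WORK(per_cpu_ptr(&vmstat_work, cpu),
                             vmstat_update);

    /* Allocate vmstat_wq first... */
    vmstat_wq = alloc_workqueue("vmstat", WQ_FREEZABLE | WQ_MEM_RECLAIM, 0);
    /* ...then arm the shepherd. */
    schedule_delayed_work(&shepherd,
                          round_jiffies_relative(sysctl_stat_interval));
}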
Fixes: 373ccbe592 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
Reported-by: kernel test robot <ying.huang@linux.intel.com>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
mod_zone_page_state() takes a "delta" integer argument. delta contains
the number of pages that should be added or subtracted from a struct
zone's vm_stat field.
If a zone is larger than 8TB this will cause overflows. E.g. for a
zone with a size slightly larger than 8TB the line
mod_zone_page_state(zone, NR_ALLOC_BATCH, zone->managed_pages);
in mm/page_alloc.c:free_area_init_core() will result in a negative
result for the NR_ALLOC_BATCH entry within the zone's vm_stat, since 8TB
contain 0x8xxxxxxx pages which will be sign extended to a negative
value.
Fix this by changing the delta argument to long type.
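The widened signatures (sketch):

void mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
                         long delta);   /* was: int delta */
void __mod_zone_page_state(struct zone *zone, enum zone_stat_item item,
                           long delta); /* was: int delta */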
This could fix an early boot problem seen on s390, where we have a 9TB
system with only one node. ZONE_DMA contains 2GB and ZONE_NORMAL the
rest. The system is trying to allocate a GFP_DMA page but ZONE_DMA is
completely empty, so it tries to reclaim pages in an endless loop.
This was seen on a heavily patched 3.10 kernel. One possible
explanation seems to be the overflows caused by mod_zone_page_state().
Unfortunately I did not have the chance to verify that this patch
actually fixes the problem, since I don't have access to the system
right now. However the overflow problem does exist anyway.
Given the description that a system with slightly less than 8TB does
work, this seems to be a candidate for the observed problem.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
test_pages_in_a_zone() does not account for the possibility of missing
sections in the given pfn range. pfn_valid_within always returns 1 when
CONFIG_HOLES_IN_ZONE is not set, allowing invalid pfns from missing
sections to pass the test, leading to a kernel oops.
Wrap an additional pfn loop with PAGES_PER_SECTION granularity to check
for missing sections before proceeding into the zone-check code.
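A sketch of the added outer loop (section-boundary alignment elided):

for (pfn = start_pfn; pfn < end_pfn; pfn += PAGES_PER_SECTION) {
    if (!present_section_nr(pfn_to_section_nr(pfn)))
        continue;   /* skip holes left by missing sections */

    /* ... existing per-page zone check over this section ... */
}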
This also prevents a crash from offlining memory devices with missing
sections. Despite this, it may be a good idea to keep the related patch
'[PATCH 3/3] drivers: memory: prohibit offlining of memory blocks with
missing sections' because missing sections in a memory block may lead to
other problems not covered by the scope of this fix.
Signed-off-by: Andrew Banman <abanman@sgi.com>
Acked-by: Alex Thorlton <athorlton@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Greg KH <greg@kroah.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory cgroup reclaim can be interrupted with mem_cgroup_iter_break()
once enough pages have been reclaimed, in which case, in contrast to a
full round-trip over a cgroup sub-tree, the current position stored in
mem_cgroup_reclaim_iter of the target cgroup does not get invalidated
and so is left holding the reference to the last scanned cgroup. If the
target cgroup does not get scanned again (we might have just reclaimed
the last page or all processes might exit and free their memory
voluntary), we will leak it, because there is nobody to put the
reference held by the iterator.
The problem is easy to reproduce by running the following command
sequence in a loop:
mkdir /sys/fs/cgroup/memory/test
echo 100M > /sys/fs/cgroup/memory/test/memory.limit_in_bytes
echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs
memhog 150M
echo $$ > /sys/fs/cgroup/memory/cgroup.procs
rmdir test
The cgroups generated by it will never get freed.
This patch fixes this issue by making mem_cgroup_iter avoid taking
reference to the current position. In order not to hit use-after-free
bug while running reclaim in parallel with cgroup deletion, we make use
of ->css_released cgroup callback to clear references to the dying
cgroup in all reclaim iterators that might refer to it. This callback
is called right before scheduling rcu work which will free css, so if we
access iter->position from an rcu read section, we can be sure it won't
go away under us.
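A sketch of the hook (the helper name is an assumption):

static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
{
    struct mem_cgroup *memcg = mem_cgroup_from_css(css);

    /* Drop this memcg from any reclaim iterator that points at it. */
    invalidate_reclaim_iterators(memcg);
}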
[hannes@cmpxchg.org: clean up css ref handling]
Fixes: 5ac8fb31ad ("mm: memcontrol: convert reclaim iterator to simple css refcounting")
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [3.19+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change the use of strncmp in zswap_pool_find_get() to strcmp.
The use of strncmp is no longer correct, now that zswap_zpool_type is
not an array; sizeof() will return the size of a pointer, which isn't
the right length to compare. We don't need to use strncmp anyway,
because the existing params and the passed in params are all guaranteed
to be null terminated, so strcmp should be used.
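The corrected comparison in zswap_pool_find_get(), sketched:

list_for_each_entry_rcu(pool, &zswap_pools, list) {
    /* strcmp, not strncmp: zswap_zpool_type is a pointer now, and
     * both strings are NUL-terminated anyway.
     */
    if (strcmp(zpool_get_type(pool->zpool), type))
        continue;
    /* ... */
}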
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reported-by: Weijie Yang <weijie.yang@samsung.com>
Cc: Seth Jennings <sjennings@variantweb.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's possible that an oom killed victim shares an ->mm with the init
process and thus oom_kill_process() would end up trying to kill init as
well.
This has been shown in practice:
Out of memory: Kill process 9134 (init) score 3 or sacrifice child
Killed process 9134 (init) total-vm:1868kB, anon-rss:84kB, file-rss:572kB
Kill process 1 (init) sharing same memory
...
Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
And this will result in a kernel panic.
If a process is forked by init and selected for oom kill while still
sharing init_mm, then it's likely this system is in a recoverable state.
However, it's better not to try to kill init and allow the machine to
panic due to unkillable processes.
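A sketch of the guard in the loop that kills tasks sharing the
victim's ->mm:

for_each_process(p) {
    if (p->mm != mm || same_thread_group(p, victim))
        continue;
    if (unlikely(p->flags & PF_KTHREAD) || is_global_init(p))
        continue;   /* never SIGKILL a kthread or init */
    if (p->signal->oom_score_adj == OOM_SCORE_ADJ_MIN)
        continue;

    do_send_sig_info(SIGKILL, SEND_SIG_FORCED, p, true);
}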
[rientjes@google.com: rewrote changelog]
[akpm@linux-foundation.org: fix inverted test, per Ben]
Signed-off-by: Chen Jie <chenjie6@huawei.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Li Zefan <lizefan@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Vyukov provides a little program, autogenerated by syzkaller,
which races a fault on a mapping of a sparse memfd object, against
truncation of that object below the fault address: run repeatedly for a
few minutes, it reliably generates shmem_evict_inode()'s
WARN_ON(inode->i_blocks).
(But there's nothing specific to memfd here, nor to the fstat which it
happened to use to generate the fault: though that looked suspicious,
since a shmem_recalc_inode() had been added there recently. The same
problem can be reproduced with open+unlink in place of memfd_create, and
with fstatfs in place of fstat.)
v3.7 commit 0f3c42f522 ("tmpfs: change final i_blocks BUG to WARNING")
explains one cause of such a warning (a race with shmem_writepage to
swap), and possible solutions; but we never took it further, and this
syzkaller incident turns out to have a different cause.
shmem_getpage_gfp()'s error recovery, when a freshly allocated page is
then found to be beyond eof, looks plausible - decrementing the alloced
count that was just before incremented - but in fact can go wrong, if a
racing thread (the truncator, for example) gets its shmem_recalc_inode()
in just after our delete_from_page_cache(). delete_from_page_cache()
decrements nrpages, that shmem_recalc_inode() will balance the books by
decrementing alloced itself, then our decrement of alloced takes it one
too low: leading to the WARNING when the object is finally evicted.
Once the new page has been exposed in the page cache,
shmem_getpage_gfp() must leave it to shmem_recalc_inode() itself to get
the accounting right in all cases (and not fall through from "trunc:" to
"decused:"). Adjust that error recovery block; and the reinitialization
of info and sbinfo can be removed too.
While we're here, fix shmem_writepage() to avoid the original issue: it
will be safe against a racing shmem_recalc_inode(), if it merely
increments swapped before the shmem_delete_from_page_cache() which
decrements nrpages (but it must then do its own shmem_recalc_inode()
before that, while still in balance, instead of after). (Aside: why do
we shmem_recalc_inode() here in the swap path? Because its raison d'etre
is to cope with clean sparse shmem pages being reclaimed behind our
back: so here when swapping is a good place to look for that case.) But
I've not now managed to reproduce this bug, even without the patch.
I don't see why I didn't do that earlier: perhaps inhibited by the
preference to eliminate shmem_recalc_inode() altogether. Driven by this
incident, I do now have a patch to do so at last; but still want to sit
on it for a bit, there's a couple of questions yet to be resolved.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Vyukov reported the following memory leak
unreferenced object 0xffff88002eaafd88 (size 32):
comm "a.out", pid 5063, jiffies 4295774645 (age 15.810s)
hex dump (first 32 bytes):
28 e9 4e 63 00 88 ff ff 28 e9 4e 63 00 88 ff ff (.Nc....(.Nc....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
kmalloc include/linux/slab.h:458
region_chg+0x2d4/0x6b0 mm/hugetlb.c:398
__vma_reservation_common+0x2c3/0x390 mm/hugetlb.c:1791
vma_needs_reservation mm/hugetlb.c:1813
alloc_huge_page+0x19e/0xc70 mm/hugetlb.c:1845
hugetlb_no_page mm/hugetlb.c:3543
hugetlb_fault+0x7a1/0x1250 mm/hugetlb.c:3717
follow_hugetlb_page+0x339/0xc70 mm/hugetlb.c:3880
__get_user_pages+0x542/0xf30 mm/gup.c:497
populate_vma_page_range+0xde/0x110 mm/gup.c:919
__mm_populate+0x1c7/0x310 mm/gup.c:969
do_mlock+0x291/0x360 mm/mlock.c:637
SYSC_mlock2 mm/mlock.c:658
SyS_mlock2+0x4b/0x70 mm/mlock.c:648
Dmitry identified a potential memory leak in the routine region_chg,
where a region descriptor is not free'ed on an error path.
However, the root cause for the above memory leak resides in region_del.
In this specific case, a "placeholder" entry is created in region_chg.
The associated page allocation fails, and the placeholder entry is left
in the reserve map. This is "by design" as the entry should be deleted
when the map is released. The bug is in the region_del routine which is
used to delete entries within a specific range (and when the map is
released). region_del did not handle the case where a placeholder entry
exactly matched the start of the range to be deleted. In this
case, the entry would not be deleted and leaked. The fix is to take
these special placeholder entries into account in region_del.
The region_chg error path leak is also fixed.
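A sketch of the region_del() change: a placeholder entry has
from == to, and one sitting exactly at the start of the deleted range
must not be skipped.

list_for_each_entry_safe(rg, trg, head, link) {
    /* Skip regions entirely before the range, but treat a
     * placeholder at f (rg->from == rg->to == f) as in-range.
     */
    if (rg->to <= f && (rg->to != rg->from || rg->to != f))
        continue;
    /* ... existing delete/truncate logic ... */
}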
Fixes: feba16e25a ("mm/hugetlb: add region_del() to delete a specific range of entries")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: <stable@vger.kernel.org> [4.3+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>