Commit graph

10074 commits

Vlastimil Babka
9ea305710e mm/page_alloc: prevent merging between isolated and other pageblocks
Hanjun Guo has reported that a CMA stress test causes broken accounting of
CMA and free pages:

> Before the test, I got:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:          195044 kB
>
>
> After running the test:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:         6602584 kB
>
> So the freed CMA memory is more than total..
>
> Also the MemFree is more than mem total:
>
> -bash-4.3# cat /proc/meminfo
> MemTotal:       16342016 kB
> MemFree:        22367268 kB
> MemAvailable:   22370528 kB

Laura Abbott has confirmed the issue and suspected the freepage accounting
rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
pageblocks:

> CMA isolates MAX_ORDER aligned blocks, but, during the process,
> partially isolated block exists. If MAX_ORDER is 11 and
> pageblock_order is 9, two pageblocks make up MAX_ORDER
> aligned block and I can think following scenario because pageblock
> (un)isolation would be done one by one.
>
> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
> MIGRATE_ISOLATE, respectively.)
>
> CC -> IC -> II (Isolation)
> II -> CI -> CC (Un-isolation)
>
> If some pages are freed at this intermediate state such as IC or CI,
> that page could be merged to the other page that is resident on
> different type of pageblock and it will cause wrong freepage count.

This was supposed to be prevented by CMA operating on MAX_ORDER blocks, but
since it doesn't hold the zone->lock between pageblocks, a race window does
exist.

It's also likely that unexpected merging can occur between MIGRATE_ISOLATE
and non-CMA pageblocks. This should be prevented in __free_one_page() since
commit 3c605096d3 ("mm/page_alloc: restrict max order of merging on isolated
pageblock"). However, we only check the migratetype of the pageblock where
buddy merging has been initiated, not the migratetype of the buddy pageblock
(or group of pageblocks) which can be MIGRATE_ISOLATE.

Joonsoo has suggested checking for buddy migratetype as part of
page_is_buddy(), but that would add extra checks in allocator hotpath and
bloat-o-meter has shown significant code bloat (the function is inline).

This patch reduces the bloat at some expense of more complicated code. The
buddy-merging while-loop in __free_one_page() is initially bounded to
pageblock_order and without any migratetype checks. The checks are placed
outside, bumping the max_order if merging is allowed, and returning to the
while-loop with a statement which can't be possibly considered harmful.
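
For illustration, a simplified sketch of the control flow described above
(not the actual diff; find_buddy() and merge_with_buddy() are assumed
stand-ins for the page allocator's internal helpers of that era):

	unsigned int max_order =
		min_t(unsigned int, MAX_ORDER, pageblock_order + 1);

continue_merging:
	while (order < max_order - 1) {
		buddy = find_buddy(page, order);	/* assumed helper */
		if (!page_is_buddy(page, buddy, order))
			goto done_merging;
		/* absorb the buddy and move up one order */
		page = merge_with_buddy(page, buddy, order); /* assumed helper */
		order++;
	}
	if (max_order < MAX_ORDER) {
		/*
		 * Crossing a pageblock boundary: keep merging only if neither
		 * side is MIGRATE_ISOLATE, so isolated freepages are never
		 * merged with (and miscounted against) other pageblocks.
		 */
		int buddy_mt =
			get_pageblock_migratetype(find_buddy(page, order));

		if (migratetype != buddy_mt &&
		    (is_migrate_isolate(migratetype) ||
		     is_migrate_isolate(buddy_mt)))
			goto done_merging;
		max_order++;
		goto continue_merging;
	}
done_merging:
	set_page_order(page, order);
	/* ... place the page on the free list of its final order ... */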

This fixes the accounting bug and also removes the arguably weird state in the
original commit 3c605096d3 where buddies could be left unmerged.

Fixes: 3c605096d3 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
Link: https://lkml.org/lkml/2016/3/2/280
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Debugged-by: Laura Abbott <labbott@redhat.com>
Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: <stable@vger.kernel.org>	[3.18+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I768a9c4886aa3fe2e827aba682f67bac2dba6f71
Git-commit: d9dddbf556674bf125ecd925b24e43a5cf2a568a
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
[vinemenon@codeaurora.org: fix trivial merge conflicts]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-06-07 16:05:12 -07:00
Vinayak Menon
6cc2fdb17c mm: zcache: fix merge issues
Fix 4.4 merge issues in zero page support, and add the
missing label.

Change-Id: I4bed7add011e0c9b0e148d1b44132ba1873cf607
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-06-07 11:57:44 -07:00
Sarangdhar Joshi
7ab05c20ad arm64: Add support for app specific settings
Add support to provide an interface that can be used from
userspace to decide whether app specific settings need to
be applied / cleared when particular processes are running.

CRs-Fixed: 981519 997757
Change-Id: Id81f8b70de64f291a8586150f4d2c7c8f8b4420f
Signed-off-by: Sarangdhar Joshi <spjoshi@codeaurora.org>
[satyap@codeaurora.org: trivial merge conflict resolution and pull
fixes for CR: 997757]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2016-06-07 11:53:27 -07:00
Alex Shi
cf5ba83bb2 Merge branch 'lsk-v4.4-android' of git://android.git.linaro.org/kernel/linaro-android into linux-linaro-lsk-v4.4-android 2016-06-02 17:59:02 +08:00
Vinayak Menon
9e6c849ebb mm: zcache: remove __GFP_NO_KSWAPD
Remove __GFP_NO_KSWAPD. It no longer exists.

Change-Id: I8b50a06bdae050b3a3c47b80e21d0d2edf18b7c5
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-05-31 15:27:23 -07:00
Bob Liu
5248c3b4e4 mm: add WasActive page flag
Zcache could be ineffective if the compressed memory pool is full of
compressed inactive file pages, most of which will never be used again.

So we pick up pages from the active file list only; those pages would
probably be accessed again. Compressing them in memory can reduce the
latency significantly compared with rereading from disk.

When a file page is shrunk from the active file list to the inactive
file list, the PageActive flag is also cleared.
So add an extra WasActive page flag for zcache to know whether the
file page was shrunk from the active list.

Change-Id: Ida1f4db17075d1f6f825ef7ce2b3bae4eb799e3f
Signed-off-by: Bob Liu <bob.liu@oracle.com>
Patch-mainline: linux-mm @ 2013-08-06 11:36:17
[vinmenon@codeaurora.org: trivial merge conflict fixes, checkpatch fixes,
fix the definitions of was_active page flag so that it does not create
compile time errors with CONFIG_CLEANCACHE disabled. Also remove the
unnecessary use of PG_was_active in PAGE_FLAGS_CHECK_AT_PREP. Since
was_active is a requirement for zcache, make the definitions dependent on
CONFIG_ZCACHE rather than CONFIG_CLEANCACHE.]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-05-31 15:27:11 -07:00
Shiraz Hashim
9345b8f94b mm/memblock: disable local irqs while late memblock changes
There is a possibility of deadlock while doing late
memblock configuration: only preemption is disabled, so
an irq can be serviced while the seqlock is held, and
memblock_is_memory can in turn be called from irq context,
trying to claim the seqlock again. The following call stack
was observed:

[<c02136d4>] memblock_search+0x1c
[<c021487c>] memblock_is_memory+0x10
[<c01e4684>] free_kmem_pages+0x44
[<c0121c04>] free_task+0x28
[<c0178b30>] rcu_process_callbacks+0x488
[<c0127e30>] __do_softirq+0x150
[<c0128284>] irq_exit+0x84
[<c010c11c>] handle_IPI+0x12c
[<c0100588>] gic_handle_irq+0x70
[<c0e9efc0>] __irq_svc+0x40
[<c0214a8c>] memblock_region_resize_late_end+0xc
[<c057010c>] removed_alloc+0x110
[<c04ab2c4>] pil_boot+0x2b0
[<c04b7700>] __subsystem_get+0xe0
[<c04b79cc>] subsys_device_open+0x74
[<c0229f20>] chrdev_open+0x12c
[<c02246e4>] do_dentry_open+0x280
[<c0232698>] do_last+0x9a4
[<c0232b8c>] path_openat+0x23c
[<c0233bf0>] do_filp_open+0x2c

Fix it by disabling irqs during late memblock
configuration. It is a one-time operation which changes
memblock-related data structures and carries no
performance impact.
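
A minimal sketch of the idea (the memblock_region_resize_late_*() names
come from the call stack above; the seqlock itself is hypothetical):

static DEFINE_SEQLOCK(late_memblock_seqlock);

void memblock_region_resize_late_begin(void)
{
	/*
	 * Take the write side with hard irqs disabled, not just preemption,
	 * so an interrupt handler cannot call memblock_is_memory() and spin
	 * on the same seqlock from this CPU.
	 */
	write_seqlock_irq(&late_memblock_seqlock);
}

void memblock_region_resize_late_end(void)
{
	write_sequnlock_irq(&late_memblock_seqlock);
}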

CRs-Fixed: 1003890
Change-Id: I3ff1894f0c80580920b1971cda357915665b5054
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-05-31 15:26:50 -07:00
Vinayak Menon
350c68c124 mm: swap_ratio: bail out if there aren't any other swap devices
It is pointless to calculate the swap ratio when there is only
one swap device in the group. Moreover the existing code would
result in a spinlock recursion because of not taking this into
consideration. Interestingly, this check is already performed
in swap_ratio_slow by this piece of code

if (&(*si)->avail_list == plist_last(&swap_avail_head)) {
	/* just to make skip work */
	n = *si;
	ret = -ENODEV;
	goto skip;
}

But there is a window where we drop the swap_avail_lock before
invoking swap_ratio() and take it back again in swap_ratio_slow.
In this period the si can get removed from swap_avail_head,
resulting in the failure of the above logic. So recheck again.

Similarly, bail out from swap_ratio() if the sysctl is disabled,
thus avoiding the overhead of taking unnecessary locks.

Change-Id: I81a9dd61d24b7da55d5341c48a1f71d2b4b1978d
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-05-31 15:23:38 -07:00
Vinayak Menon
44bd107fc9 mm: vmscan: fix the page state calculation in too_many_isolated
It is observed that sometimes multiple tasks get blocked in
the congestion_wait loop below, in shrink_inactive_list.

(__schedule) from [<c0a03328>]
(schedule_timeout) from [<c0a04940>]
(io_schedule_timeout) from [<c01d585c>]
(congestion_wait) from [<c01cc9d8>]
(shrink_inactive_list) from [<c01cd034>]
(shrink_zone) from [<c01cdd08>]
(try_to_free_pages) from [<c01c442c>]
(__alloc_pages_nodemask) from [<c01f1884>]
(new_slab) from [<c09fcf60>]
(__slab_alloc) from [<c01f1a6c>]

In one such instance, zone_page_state(zone, NR_ISOLATED_FILE)
had returned 14, zone_page_state(zone, NR_INACTIVE_FILE)
returned 92, and the gfp_flag was GFP_KERNEL, which caused
too_many_isolated to return true. But one of the CPU pageset
vmstat diffs had NR_ISOLATED_FILE as -14. As there weren't any
more updates to the per-cpu pageset, the threshold wasn't met,
and the tasks remained blocked in the congestion wait.

This patch uses zone_page_state_snapshot instead, but restricts
its usage to avoid a performance penalty.
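
A sketch of the shape such a check can take (a simplification, not the
exact patch): pay for the precise per-cpu snapshot only when the cheap
counters already suggest throttling.

	isolated = zone_page_state(zone, NR_ISOLATED_FILE);
	inactive = zone_page_state(zone, NR_INACTIVE_FILE);

	/*
	 * The cheap counters can be off by the per-cpu diffs (e.g. the -14
	 * seen above).  Re-read with the exact snapshot only when we are
	 * about to throttle, so the common path stays fast.
	 */
	if (unlikely(isolated > inactive)) {
		isolated = zone_page_state_snapshot(zone, NR_ISOLATED_FILE);
		inactive = zone_page_state_snapshot(zone, NR_INACTIVE_FILE);
	}

	return isolated > inactive;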

Change-Id: Iec767a548e524729c7ed79a92fe4718cdd08ce69
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-05-31 15:23:28 -07:00
Alex Shi
023861726f Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-05-20 12:16:40 +08:00
Dmitry Shmidt
52a20402ae Revert "mm: vmscan: Add a debug file for shrinkers"
Kernel panic when typing "cat /sys/kernel/debug/shrinker":

Unable to handle kernel paging request at virtual address 0af37d40
pgd = d4dec000
[0af37d40] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[<c0bb8f24>] (_raw_spin_lock) from [<c020aa08>] (list_lru_count_one+0x14/0x28)
[<c020aa08>] (list_lru_count_one) from [<c02309a8>] (super_cache_count+0x40/0xa0)
[<c02309a8>] (super_cache_count) from [<c01f6ab0>] (debug_shrinker_show+0x50/0x90)
[<c01f6ab0>] (debug_shrinker_show) from [<c024fa5c>] (seq_read+0x1ec/0x48c)
[<c024fa5c>] (seq_read) from [<c022e8f8>] (__vfs_read+0x20/0xd0)
[<c022e8f8>] (__vfs_read) from [<c022f0d0>] (vfs_read+0x7c/0x104)
[<c022f0d0>] (vfs_read) from [<c022f974>] (SyS_read+0x44/0x9c)
[<c022f974>] (SyS_read) from [<c0107580>] (ret_fast_syscall+0x0/0x3c)
Code: e1a04000 e3a00001 ebd66b39 f594f000 (e1943f9f)
---[ end trace 60c74014a63a9688 ]---
Kernel panic - not syncing: Fatal exception

shrink_control.nid is used but not initialized; same for
shrink_control.memcg.

This reverts commit b0e7a582b2.

Change-Id: I108de88fa4baaef99a53c4e4c6a1d8c4b4804157
Reported-by: Xiaowen Liu <xiaowen.liu@freescale.com>
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2016-05-19 12:35:13 +05:30
Sergey Senozhatsky
1d77f0a51c zsmalloc: fix zs_can_compact() integer overflow
commit 44f43e99fe70833058482d183e99fdfd11220996 upstream.

zs_can_compact() has two race conditions in its core calculation:

unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
				zs_stat_get(class, OBJ_USED);

1) classes are not locked, so the numbers of allocated and used
   objects can change by the concurrent ops happening on other CPUs
2) shrinker invokes it from preemptible context

Depending on the circumstances, thus, OBJ_ALLOCATED can become
less than OBJ_USED, which can result in either very high or
negative `total_scan' value calculated later in do_shrink_slab().

do_shrink_slab() has some logic to prevent those cases:

 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
 vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62

However, due to the way `total_scan' is calculated, not every
shrinker->count_objects() overflow can be spotted and handled.
To demonstrate the latter, I added some debugging code to do_shrink_slab()
(x86_64) and the results were:

 vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
 vmscan: but total_scan > 0: 92679974445502
 vmscan: resulting total_scan: 92679974445502
[..]
 vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
 vmscan: but total_scan > 0: 22634041808232578
 vmscan: resulting total_scan: 22634041808232578

Even though shrinker->count_objects() has returned an overflowed value,
the resulting `total_scan' is positive, and, what is more worrisome, it
is insanely huge. This value is getting used later on in
shrinker->scan_objects() loop:

        while (total_scan >= batch_size ||
               total_scan >= freeable) {
                unsigned long ret;
                unsigned long nr_to_scan = min(batch_size, total_scan);

                shrinkctl->nr_to_scan = nr_to_scan;
                ret = shrinker->scan_objects(shrinker, shrinkctl);
                if (ret == SHRINK_STOP)
                        break;
                freed += ret;

                count_vm_events(SLABS_SCANNED, nr_to_scan);
                total_scan -= nr_to_scan;

                cond_resched();
        }

`total_scan >= batch_size' is true for a very-very long time and
'total_scan >= freeable' is also true for quite some time, because
`freeable < 0' and `total_scan' is large enough, for example,
22634041808232578. The only break condition, in the given scheme of
things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.

To fix the issue, take a pool stat snapshot and use it instead of
racy zs_stat_get() calls.
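
A sketch consistent with that description (simplified): read each stat
once into a local snapshot and bail out if the two reads crossed.

static unsigned long zs_can_compact(struct size_class *class)
{
	unsigned long obj_wasted;
	/* single snapshot of both stats instead of repeated racy reads */
	unsigned long obj_allocated = zs_stat_get(class, OBJ_ALLOCATED);
	unsigned long obj_used = zs_stat_get(class, OBJ_USED);

	/* concurrent ops may still race the two reads; never go negative */
	if (obj_allocated <= obj_used)
		return 0;

	obj_wasted = obj_allocated - obj_used;
	/* ... convert wasted objects into reclaimable zspages as before ... */
	return obj_wasted;
}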

Link: http://lkml.kernel.org/r/20160509140052.3389-1-sergey.senozhatsky@gmail.com
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-18 17:06:44 -07:00
Liam Mark
2ef8479273 mm: cma: sleep between retries in cma_alloc
Port support for retrying cma allocations from 3.10
to 3.18 to help resolve cma allocation failures.

It was observed that CMA pages sometimes get pinned
down by BG processes scheduled out in their exit
path. Since BG processes have lower priority, they get
a smaller time slice from the scheduler, thereby taking
more time to free up CMA pages.

Also when a process is being forked copy_one_pte
may create copy-on-write mappings, when this is done
the page _count and page _mapcount are each
incremented sequentially. If the process is context
switched out after incrementing the _count but before
incrementing the _mapcount then the page will appear
temporarily pinned.

So instead of failing to allocate and directly
returning an error on the CMA allocation path we do 2
retries, with sleeps, to give the system an opportunity
to unpin any pinned pages.

Change-Id: I022a9341f8ee44f281c7cb34769695843e97d684
Signed-off-by: Susheel Khiani <skhiani@codeaurora.org>
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2016-05-18 13:38:15 -07:00
Alex Shi
b3f09bff3f Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-05-12 12:20:40 +08:00
Alex Shi
334ca3ed18 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-05-12 09:27:18 +08:00
Howard Cochran
4bc9468f16 writeback: Fix performance regression in wb_over_bg_thresh()
commit 74d369443325063a5f0260e63971decb950fd8fa upstream.

Commit 947e9762a8 ("writeback: update wb_over_bg_thresh() to use
wb_domain aware operations") unintentionally changed this function's
meaning from "are there more dirty pages than the background writeback
threshold" to "are there more dirty pages than the writeback threshold".
The background writeback threshold is typically half of the writeback
threshold, so this had the effect of raising the number of dirty pages
required to cause a writeback worker to perform background writeout.

This can cause a very severe performance regression when a BDI uses
BDI_CAP_STRICTLIMIT because balance_dirty_pages() and the writeback worker
can now disagree on whether writeback should be initiated.

For example, in a system having 1GB of RAM, a single spinning disk, and a
"pass-through" FUSE filesystem mounted over the disk, application code
mmapped a 128MB file on the disk and was randomly dirtying pages in that
mapping.

Because FUSE uses strictlimit and has a default max_ratio of only 1%, in
balance_dirty_pages, thresh is ~200, bg_thresh is ~100, and the
dirty_freerun_ceiling is the average of those, ~150. So, it pauses the
dirtying processes when we have 151 dirty pages and wakes up a background
writeback worker. But the worker tests the wrong threshold (200 instead of
100), so it does not initiate writeback and just returns.

Thus, balance_dirty_pages keeps looping, sleeping and then waking up the
worker who will do nothing. It remains stuck in this state until the few
dirty pages that we have finally expire and we write them back for that
reason. Then the whole process repeats, resulting in near-zero throughput
through the FUSE BDI.

The fix is to call the parameterized variant of wb_calc_thresh, so that the
worker will do writeback if the bg_thresh is exceeded which was the
behavior before the referenced commit.
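
Inside wb_over_bg_thresh() the restored comparison looks roughly like
this (a sketch, assuming that kernel's dirty_throttle_control local gdtc):

	/* compare reclaimable pages against the *background* threshold,
	 * computed for this wb via the parameterized helper */
	if (wb_stat(wb, WB_RECLAIMABLE) >
	    wb_calc_thresh(gdtc->wb, gdtc->bg_thresh))
		return true;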

Fixes: 947e9762a8 ("writeback: update wb_over_bg_thresh() to use wb_domain aware operations")
Signed-off-by: Howard Cochran <hcochran@kernelspring.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:18 +02:00
Jason Baron
24b8a175a6 mm: update min_free_kbytes from khugepaged after core initialization
commit bc22af74f271ef76b2e6f72f3941f91f0da3f5f8 upstream.

Khugepaged attempts to raise min_free_kbytes if it's set too low.
However, on boot khugepaged sets min_free_kbytes first from
subsys_initcall(), and then the mm 'core' over-rides min_free_kbytes
afterwards from init_per_zone_wmark_min(), via a module_init() call.

Khugepaged used to use a late_initcall() to set min_free_kbytes (such
that it occurred after the core initialization), however this was
removed when the initialization of min_free_kbytes was integrated into
the starting of the khugepaged thread.

The fix here is simply to invoke the core initialization using a
core_initcall() instead of module_init(), such that the previous
initialization ordering is restored.  I didn't restore the
late_initcall() since start_stop_khugepaged() already sets
min_free_kbytes via set_recommended_min_free_kbytes().
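
The ordering change amounts to something like this in mm/page_alloc.c
(sketch):

static int __init init_per_zone_wmark_min(void)
{
	/* recalculates min_free_kbytes and the per-zone watermarks */
	return 0;
}
/*
 * Previously registered with module_init().  core_initcall() runs before
 * khugepaged's subsys_initcall(), so khugepaged's raised min_free_kbytes
 * is no longer overwritten afterwards.
 */
core_initcall(init_per_zone_wmark_min)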

This was noticed when we had a number of page allocation failures when
moving a workload to a kernel with this new initialization ordering.  On
an 8GB system this restores min_free_kbytes back to 67584 from 11365
when CONFIG_TRANSPARENT_HUGEPAGE=y is set and either
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y or
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.

Fixes: 79553da293 ("thp: cleanup khugepaged startup")
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:17 +02:00
Dan Streetman
851375cc49 mm/zswap: provide unique zpool name
commit 32a4e169039927bfb6ee9f0ccbbe3a8aaf13a4bc upstream.

Instead of using "zswap" as the name for all zpools created, add an
atomic counter and use "zswap%x" with the counter number for each zpool
created, to provide a unique name for each new zpool.
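
A sketch of the naming scheme (the wrapper function here is hypothetical;
the real code does this inside zswap's pool-creation path):

static atomic_t zswap_pools_count = ATOMIC_INIT(0);

static struct zpool *zswap_create_zpool(char *type, gfp_t gfp)
{
	char name[38];	/* "zswap" + hex counter + '\0'; generously sized */

	/* unique name for each pool, as zsmalloc expects */
	snprintf(name, sizeof(name), "zswap%x",
		 atomic_inc_return(&zswap_pools_count));
	return zpool_create_pool(type, name, gfp, NULL);
}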

As zsmalloc, one of the zpool implementations, requires/expects a unique
name for each pool created, zswap should provide a unique name.  The
zsmalloc pool creation does not fail if a new pool with a conflicting
name is created, unless CONFIG_ZSMALLOC_STAT is enabled; in that case,
zsmalloc pool creation fails with -ENOMEM.  Then zswap will be unable to
change its compressor parameter if its zpool is zsmalloc; it also will
be unable to change its zpool parameter back to zsmalloc, if it has any
existing old zpool using zsmalloc with page(s) in it.  Attempts to
change the parameters will result in failure to create the zpool.  This
changes zswap to provide a unique name for each zpool creation.

Fixes: f1c54846ee ("zswap: dynamic pool creation")
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <dan.streetman@canonical.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:14 +02:00
Hugh Dickins
d27e2ddc40 mm, cma: prevent nr_isolated_* counters from going negative
commit 14af4a5e9b26ad251f81c174e8a43f3e179434a5 upstream.

/proc/sys/vm/stat_refresh warns nr_isolated_anon and nr_isolated_file go
increasingly negative under compaction: which would add delay when there
should be none, or no delay when there should be some.  The bug in compaction
was due to a recent mmotm patch, but much older instance of the bug was
also noticed in isolate_migratepages_range() which is used for CMA and
gigantic hugepage allocations.

The bug is caused by putback_movable_pages() in an error path
decrementing the isolated counters without them being previously
incremented by acct_isolated().  Fix isolate_migratepages_range() by
removing the error-path putback, thus reaching acct_isolated() with
migratepages still isolated, and leaving putback to caller like most
other places do.

Fixes: edc2ca6124 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
[vbabka@suse.cz: expanded the changelog]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:14 +02:00
Trilok Soni
1ca0992ef1 lib: memtest: Add MEMTEST_ENABLE_DEFAULT option
As of now memtest remains disabled until we specify the patterns
through the kernel command line. Some platforms have two
different configuration files (one for debug and another for
product) which can use the configuration option to enable the
memtest by default (in the debug configuration file).

CRs-Fixed: 1007344
Change-Id: I0bf7b33c3584f3d6cf5ef58dfe72be46212041da
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-05-05 15:05:52 -07:00
Minchan Kim
36abe7272a mm/hwpoison: fix wrong num_poisoned_pages accounting
commit d7e69488bd04de165667f6bc741c1c0ec6042ab9 upstream.

Currently, the migration code increases num_poisoned_pages for a *failed*
migration page as well as for a successfully migrated one at the trial of
memory-failure.  This makes the stat wrong.  It also marks the page as
PG_HWPoison even if the migration trial failed, which means we cannot
recover the corrupted page using the memory-failure facility.

This patch fixes it.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
Minchan Kim
87c855f150 mm: vmscan: reclaim highmem zone if buffer_heads is over limit
commit 7bf52fb891b64b8d61caf0b82060adb9db761aec upstream.

We used to reclaim the highmem zone if buffer_heads was over the limit,
but commit 6b4f7799c6 ("mm: vmscan: invoke slab shrinkers from
shrink_zone()") changed the behavior so that the highmem zone is no
longer reclaimed even when buffer_heads is over the limit.  This patch
restores the logic.
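
The restored check is essentially (a sketch of the shrink_zones()-era
logic):

	/*
	 * If buffer_heads are over the limit, allow reclaim to scan highmem
	 * zones too, so the lowmem pinned by those buffer_heads can be freed.
	 */
	if (buffer_heads_over_limit)
		sc->gfp_mask |= __GFP_HIGHMEM;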

Fixes: 6b4f7799c6 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
Gerald Schaefer
e513b90a9a numa: fix /proc/<pid>/numa_maps for THP
commit 28093f9f34cedeaea0f481c58446d9dac6dd620f upstream.

In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture.  On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.

On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available.  In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped.  On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.

This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
Konstantin Khlebnikov
be591a683e mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
commit 3486b85a29c1741db99d0c522211c82d2b7a56d0 upstream.

Khugepaged detects its own VMAs by checking vm_file and vm_ops, but this
way it cannot distinguish private /dev/zero mappings from other special
mappings like /dev/hpet, which has no vm_ops and populates PTEs in mmap.

This fixes false-positive VM_BUG_ON and prevents installing THP where
they are not expected.

Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com
Fixes: 78f11a2557 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups")
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
Tejun Heo
52526076a5 memcg: relocate charge moving from ->attach to ->post_attach
commit 264a0ae164bc0e9144bebcd25ff030d067b1a878 upstream.

Hello,

So, this ended up a lot simpler than I originally expected.  I tested
it lightly and it seems to work fine.  Petr, can you please test these
two patches w/o the lru drain drop patch and see whether the problem
is gone?

Thanks.
------ 8< ------
If charge moving is used, memcg performs relabeling of the affected
pages from its ->attach callback which is called under
cgroup_threadgroup_rwsem and thus can't create new kthreads.  This is
fragile as various operations may depend on workqueues making forward
progress which relies on the ability to create new kthreads.

There's no reason to perform charge moving from ->attach which is deep
in the task migration path.  Move it to ->post_attach which is called
after the actual migration is finished and cgroup_threadgroup_rwsem is
dropped.

* move_charge_struct->mm is added and ->can_attach is now responsible
  for pinning and recording the target mm.  mem_cgroup_clear_mc() is
  updated accordingly.  This also simplifies mem_cgroup_move_task().

* mem_cgroup_move_task() is now called from ->post_attach instead of
  ->attach.
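
The net effect on the subsystem definition is roughly (sketch):

struct cgroup_subsys memory_cgrp_subsys = {
	/* ... */
	.can_attach	= mem_cgroup_can_attach,   /* pins + records target mm */
	.cancel_attach	= mem_cgroup_cancel_attach,
	.post_attach	= mem_cgroup_move_task,    /* was .attach */
	/* ... */
};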

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Debugged-and-tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Fixes: 1ed1328792 ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
Jesper Dangaard Brouer
a4e25ff311 slub: clean up code for kmem cgroup support to kmem_cache_free_bulk
commit 376bf125ac781d32e202760ed7deb1ae4ed35d31 upstream.

This change is primarily an attempt to make it easier to realize the
optimizations the compiler performs in-case CONFIG_MEMCG_KMEM is not
enabled.

Performance wise, even when CONFIG_MEMCG_KMEM is compiled in, the
overhead is zero.  This is because, as long as no process has enabled
kmem cgroup accounting, the assignment is replaced by asm-NOP
operations.  This is possible because memcg_kmem_enabled() uses a
static_key_false() construct.

It also helps readability as it avoids accessing the p[] array like
p[size - 1], which "exposes" that the array is processed backwards
inside the helper function build_detached_freelist().

Lastly, this also makes the code more robust in error cases like
passing NULL pointers in the array, which were previously handled
before commit 033745189b ("slub: add missing kmem cgroup support to
kmem_cache_free_bulk").

Fixes: 033745189b ("slub: add missing kmem cgroup support to kmem_cache_free_bulk")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
Trilok Soni
bf6ceddd56 mm/page_owner: ask users about default setting of PAGE_OWNER
Commit 48c96a3685 ("mm/page_owner: keep track of page owners")
doesn't enable page_owner by default even when CONFIG_PAGE_OWNER
is enabled.

Add the configuration option CONFIG_PAGE_OWNER_ENABLE_DEFAULT to
allow users to enable it by default through the defconfig file.

CRs-Fixed: 1006743
Change-Id: I9b565a34e2068bf575974eaf3dc9f7820bdd7a96
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-05-03 15:53:55 -07:00
Christian Borntraeger
9194c460b8 mm/debug_pagealloc: ask users for default setting of debug_pagealloc
Since commit 031bc5743f ("mm/debug-pagealloc: make debug-pagealloc
boottime configurable") CONFIG_DEBUG_PAGEALLOC is by default not adding
any page debugging.

This resulted in several unnoticed bugs, e.g.

    https://lkml.kernel.org/g/<569F5E29.3090107@de.ibm.com>
or
    https://lkml.kernel.org/g/<56A20F30.4050705@de.ibm.com>

as this behaviour change was not even documented in Kconfig.

Let's provide a new Kconfig symbol that allows changing the default
back to enabled, e.g.  for debug kernels.  This also makes the change
obvious to kernel packagers.

Let's also change the Kconfig description for CONFIG_DEBUG_PAGEALLOC, to
indicate that there are two stages of overhead.
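
In code terms the new symbol just seeds the existing boot-time switch,
roughly (a sketch; the CONFIG name is an assumption based on the
commit's intent):

bool _debug_pagealloc_enabled __read_mostly
		= IS_ENABLED(CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT);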

CRs-Fixed: 1006743
Change-Id: I52c36765837cc873877b9398371ffd840d485a81
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: ea6eabb05b26bd3d6f60b29b77a03bc61479fc0f
Git-repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-04-29 14:40:10 -07:00
Vinayak Menon
b6fb81015e mm: fix compile time error with !CONFIG_CMA
Fix compile-time failures caused by CMA-related elements
not being protected with CONFIG_CMA.

Change-Id: I930b7c0ffdce0f1bfc4f8a582a698be16ed44d1f
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-04-26 14:38:03 -07:00
Alex Shi
bab1564182 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android
Conflicts:
	d_canonical_path in include/linux/dcache.h
2016-04-21 14:08:44 +08:00
Xishi Qiu
fb4cfc6e0a mm: fix invalid node in alloc_migrate_target()
commit 6f25a14a7053b69917e2ebea0d31dd444cd31fd5 upstream.

It is incorrect to use next_node to find a target node; it will return
MAX_NUMNODES or an invalid node.  This will lead to a crash in buddy
system allocation.
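
One way to express the needed behaviour (a sketch, not necessarily the
exact upstream helper): wrap around to the first online node instead of
running past the end of the nodemask.

/* never hand MAX_NUMNODES (or an offline node) to the allocator */
static int next_valid_migrate_node(int nid)
{
	int next = next_node(nid, node_online_map);

	if (next >= MAX_NUMNODES)
		next = first_node(node_online_map);
	return next;
}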

Fixes: c8721bbbdd ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Laura Abbott" <lauraa@codeaurora.org>
Cc: Hui Zhu <zhuhui@xiaomi.com>
Cc: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:41:53 +09:00
Liam Mark
50050f2a1d mm: add cma pcp list
Add a cma pcp list in order to increase cma memory utilization.

Increased cma memory utilization will improve overall memory
utilization because free cma pages are ignored when memory reclaim
is done with gfp mask GFP_KERNEL.

Since most memory reclaim is done by kswapd, which uses a gfp mask
of GFP_KERNEL, by increasing cma memory utilization we are therefore
ensuring that less aggressive memory reclaim takes place.

Increased cma memory utilization will improve performance,
for example it will increase app concurrency.

Change-Id: I809589a25c6abca51f1c963f118adfc78e955cf9
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2016-04-13 11:11:40 -07:00
Heesub Shin
d491cf59f0 cma: redirect page allocation to CMA
CMA pages are designed to be used as fallback for movable allocations
and cannot be used for non-movable allocations. If CMA pages are
utilized poorly, non-movable allocations may end up getting starved if
all regular movable pages are allocated and the only pages left are
CMA. Always using CMA pages first creates unacceptable performance
problems. As a midway alternative, use CMA pages for certain
userspace allocations. The userspace pages can be migrated or dropped
quickly, which gives decent utilization.

Change-Id: I6165dda01b705309eebabc6dfa67146b7a95c174
CRs-Fixed: 452508
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Heesub Shin <heesub.shin@samsung.com>
[lauraa@codeaurora.org: Missing CONFIG_CMA guards, add commit text]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
[lmark@codeaurora.org: resolve conflicts relating to
 MIGRATE_HIGHATOMIC and some other trivial merge conflicts]
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2016-04-13 11:11:30 -07:00
Liam Mark
1426d1f8d9 lowmemorykiller: Don't count swap cache pages twice
The lowmem_shrink function discounts all the swap cache pages from
the file cache count. The zone aware code also discounts all file
cache pages from a certain zone.  This results in some swap cache
pages being discounted twice, which can result in the low memory
killer being unnecessarily aggressive.

Fix the low memory killer to only discount the swap cache pages
once.

Change-Id: I650bbfbf0fbbabd01d82bdb3502b57ff59c3e14f
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-04-13 11:11:01 -07:00
Liam Mark
92c1fefed5 android/lowmemorykiller: Selectively count free CMA pages
In certain memory configurations there can be a large number of
CMA pages which are not suitable to satisfy certain memory
requests.

This large number of unsuitable pages can cause the
lowmemorykiller to not kill any tasks because the
lowmemorykiller counts all free pages.
In order to ensure the lowmemorykiller properly evaluates the
free memory, only count the free CMA pages if they are suitable
for satisfying the memory request.

Change-Id: I7f06d53e2d8cfe7439e5561fe6e5209ce73b1c90
CRs-fixed: 437016
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2016-04-13 11:09:54 -07:00
Alex Shi
08562bfcb8 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-04-13 12:02:21 +08:00
Vlastimil Babka
5dc7e939b6 mm/page_alloc: prevent merging between isolated and other pageblocks
commit d9dddbf556674bf125ecd925b24e43a5cf2a568a upstream.

Hanjun Guo has reported that a CMA stress test causes broken accounting of
CMA and free pages:

> Before the test, I got:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:          195044 kB
>
>
> After running the test:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:         6602584 kB
>
> So the freed CMA memory is more than total..
>
> Also the MemFree is more than mem total:
>
> -bash-4.3# cat /proc/meminfo
> MemTotal:       16342016 kB
> MemFree:        22367268 kB
> MemAvailable:   22370528 kB

Laura Abbott has confirmed the issue and suspected the freepage accounting
rewrite around 3.18/4.0 by Joonsoo Kim.  Joonsoo had a theory that this is
caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
pageblocks:

> CMA isolates MAX_ORDER aligned blocks, but, during the process,
> partially isolated block exists. If MAX_ORDER is 11 and
> pageblock_order is 9, two pageblocks make up MAX_ORDER
> aligned block and I can think following scenario because pageblock
> (un)isolation would be done one by one.
>
> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
> MIGRATE_ISOLATE, respectively.)
>
> CC -> IC -> II (Isolation)
> II -> CI -> CC (Un-isolation)
>
> If some pages are freed at this intermediate state such as IC or CI,
> that page could be merged to the other page that is resident on
> different type of pageblock and it will cause wrong freepage count.

This was supposed to be prevented by CMA operating on MAX_ORDER blocks,
but since it doesn't hold the zone->lock between pageblocks, a race
window does exist.

It's also likely that unexpected merging can occur between
MIGRATE_ISOLATE and non-CMA pageblocks.  This should be prevented in
__free_one_page() since commit 3c605096d3 ("mm/page_alloc: restrict
max order of merging on isolated pageblock").  However, we only check
the migratetype of the pageblock where buddy merging has been initiated,
not the migratetype of the buddy pageblock (or group of pageblocks)
which can be MIGRATE_ISOLATE.

Joonsoo has suggested checking for buddy migratetype as part of
page_is_buddy(), but that would add extra checks in allocator hotpath
and bloat-o-meter has shown significant code bloat (the function is
inline).

This patch reduces the bloat at some expense of more complicated code.
The buddy-merging while-loop in __free_one_page() is initially bounded
to pageblock_order and without any migratetype checks.  The checks are
placed outside, bumping the max_order if merging is allowed, and
returning to the while-loop with a statement which can't be possibly
considered harmful.

This fixes the accounting bug and also removes the arguably weird state
in the original commit 3c605096d3 where buddies could be left
unmerged.

Fixes: 3c605096d3 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
Link: https://lkml.org/lkml/2016/3/2/280
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Debugged-by: Laura Abbott <labbott@redhat.com>
Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:05 -07:00
Johannes Weiner
0ccab5b139 mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage
commit b6e6edcfa40561e9c8abe5eecf1c96f8e5fd9c6f upstream.

Setting the original memory.limit_in_bytes hardlimit is subject to a
race condition when the desired value is below the current usage.  The
code tries a few times to first reclaim and then see if the usage has
dropped to where we would like it to be, but there is no locking, and
the workload is free to continue making new charges up to the old limit.
Thus, attempting to shrink a workload relies on pure luck and hope that
the workload happens to cooperate.

To fix this in the cgroup2 memory.max knob, do it the other way round:
set the limit first, then try enforcement.  And if reclaim is not able
to succeed, trigger OOM kills in the group.  Keep going until the new
limit is met, we run out of OOM victims and there's only unreclaimable
memory left, or the task writing to memory.max is killed.  This allows
users to shrink groups reliably, and the behavior is consistent with
what happens when new charges are attempted in excess of memory.max.
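
A condensed sketch of the enforcement loop this describes, as it would
sit in memory_max_write() (assuming locals max, the new limit, and
nr_reclaims, a bounded retry budget):

	xchg(&memcg->memory.limit, max);	/* publish the new limit first */

	for (;;) {
		unsigned long nr_pages = page_counter_read(&memcg->memory);

		if (nr_pages <= max)
			break;			/* limit met */
		if (signal_pending(current))
			break;			/* the writer was killed */

		if (nr_reclaims) {
			/* try direct reclaim a bounded number of times */
			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
							  GFP_KERNEL, true))
				nr_reclaims--;
			continue;
		}

		/* reclaim failed: OOM-kill inside the group until usage fits */
		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
			break;			/* no more OOM victims */
	}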

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:54 -07:00
Johannes Weiner
8b42fc47e1 mm: memcontrol: reclaim when shrinking memory.high below usage
commit 588083bb37a3cea8533c392370a554417c8f29cb upstream.

When setting memory.high below usage, nothing happens until the next
charge comes along, and then it will only reclaim its own charge and not
the now potentially huge excess of the new memory.high.  This can cause
groups to stay in excess of their memory.high indefinitely.

To fix that, when shrinking memory.high, kick off a reclaim cycle that
goes after the delta.
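
Sketch of the write-handler change this implies (inside
memory_high_write()):

	memcg->high = high;	/* publish the lower limit first */

	nr_pages = page_counter_read(&memcg->memory);
	if (nr_pages > high)
		/* reclaim the delta now instead of waiting for the next
		 * charge to notice the excess */
		try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
					     GFP_KERNEL, true);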

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:53 -07:00
Guenter Roeck
a7b7a225c1 mm: Export do_munmap
The 0-day build bot reports the following build error, seen if SDCARD_FS
is built as a module.

ERROR: "do_munmap" undefined!

Fixes: 84a1b7d3d3 ("Included sdcardfs source code for kernel 3.0")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
2016-04-07 16:50:04 +05:30
Prasad Sodagudi
8567b31169 Revert "debug-pagealloc: Panic on pagealloc corruption"
This reverts commit 022c1f3696f2 ("debug-pagealloc:
Panic on pagealloc corruption"). Kernel panic is seen
on MSM8937 in 32-bit mode; revert this patch till the
root cause is identified.

Change-Id: I66f1bab7f8c836b8b7167ec05141656f34c3702c
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2016-03-25 16:04:09 -07:00
Vinayak Menon
e74a8a432e mm: vmstat: add pageoutclean
vmstat events currently count pgpgout, but that includes
only the writebacks, and not the reclaim of clean
pages. Add an event to count clean page evictions. This is
helpful to evaluate page thrashing cases.

Change-Id: Icfb797877a544a58c289074bdc290dfbc1384514
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-25 16:03:59 -07:00
Prasad Sodagudi
d099e491b8 debug-pagealloc: print physical address for detected corruption
It's sometimes useful to know the physical address which
has been corrupted, especially in systems with multiple
bus masters and DMA engines with the capability of writing
to memory. It may also be useful for identifying the
location of failures of memory cells in cases of
device-specific corruption. So print the physical
start address of the page to help in these scenarios.
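
For illustration, the report could carry the physical address along
these lines (a hypothetical sketch, not the actual diff):

static void report_pagealloc_corruption(struct page *page)
{
	phys_addr_t start = page_to_phys(page);	/* physical start of page */

	pr_err("pagealloc: corruption detected in page %p (phys %pa)\n",
	       page, &start);
}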

Change-Id: I081edd8b1c06913c0057a6cb9dda18077cfbdc30
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2016-03-25 16:03:55 -07:00
Vinayak Menon
2e21911abe mm: do not activate swap write failed pages
Some time back a piece of code was added to activate
pages in pageout which failed to write back. This was
done for the case of a failed write to zram, with the
intention of reducing further zram writes. But this
does not make much sense because there can anyway be
other pages which the reclaim path can pick to swap
out.
And this particular logic has a problem. When a write
fails, the page is unlocked. It's locked again before
activating the page, but the page, which is now in the
swapcache, can be brought back with its original mapping
through a fault, which can happen during this period.
This can result in random bugs, e.g. when shrink_page_list
tries to do try_to_free_swap. Here is one such case,
in which PageSwapCache was cleared by the fault path.

"
zram: Error allocating memory for compressed page: 91433, size=4096
Write-error on swap-device (254:0:731464)
page:de866e80 count:3 mapcount:1 mapping:d5368941 index:0xb2ce5
flags: 0x80018(uptodate|dirty|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageSwapCache(page))
"

CRs-Fixed: 988207
Change-Id: I26738d0f8dd3e2dfdb24c25edac24a7d968eeba0
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-25 16:03:35 -07:00
Prasad Sodagudi
4eace7df06 debug-pagealloc: Panic on pagealloc corruption
Currently, we just print the pagealloc corruption warnings and
proceed. Sometimes, we are getting multiple errors printed down
the line. It will be good to get the device state as early as
possible when we get the first pagealloc error.

Change-Id: I79155ac8a039b30a3a98d5dd1384d3923082712f
Signed-off-by: Subbaraman Narayanamurthy <subbaram@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2016-03-25 16:03:35 -07:00
Vinayak Menon
30781083ea mm: zbud: prevent softirq during zbud alloc, free and reclaim
The following deadlock is observed.

Core 2 waiting on mapping->tree_lock which is taken by core 6

do_raw_spin_lock
raw_spin_lock_irq
atomic_cmpxchg
page_freeze_refs
__remove_mapping
shrink_page_list
shrink_inactive_list
shrink_list
shrink_lruvec
shrink_zone
shrink_zones
do_try_to_free_pages
try_to_free_pages(?, ?, ?, ?)
__perform_reclaim
__alloc_pages_direct_reclaim
__alloc_pages_slowpath
__alloc_pages_nodemask
alloc_kmem_pages_node
alloc_thread_info_node
dup_task_struct
copy_process.part.56
do_fork
sys_clone
el0_svc_naked

Core 6 after taking mapping->tree_lock is waiting on zbud pool lock
which is held by core 5

zbud_alloc
zcache_store_page
__cleancache_put_page
cleancache_put_page
__delete_from_page_cache
spin_unlock_irq
__remove_mapping
shrink_page_list
shrink_inactive_list
shrink_list
shrink_lruvec
shrink_zone
bitmap_zero
__nodes_clear
kswapd_shrink_zone.constprop.58
balance_pgdat
kswapd_try_to_sleep
kswapd
kthread
ret_from_fork

Core 5 after taking zbud pool lock from zbud_free received an IRQ, and
after IRQ exit, softirqs were scheduled and end_page_writeback tried to
lock on mapping->tree_lock which is already held by Core 6. Deadlock.

do_raw_spin_lock
raw_spin_lock_irqsave
test_clear_page_writeba
end_page_writeback
ext4_finish_bio
ext4_end_bio
bio_endio
blk_update_request
end_clone_bio
bio_endio
blk_update_request
blk_update_bidi_request
blk_end_bidi_request
blk_end_request
mmc_blk_cmdq_complete_r
mmc_cmdq_softirq_done
blk_done_softirq
static_key_count
static_key_false
trace_softirq_exit
__do_softirq()
tick_irq_exit
irq_exit()
set_irq_regs
__handle_domain_irq
gic_handle_irq
el1_irq
exception
__list_del_entry
list_del
zbud_free
zcache_load_page
__cleancache_get_page(?

So protect zbud_alloc/free/reclaim with spin_lock_bh.
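
That is, the zbud entry points take the pool lock with the _bh variants
so softirqs cannot run on this CPU while it is held (sketch):

	/* in zbud_alloc()/zbud_free()/zbud_reclaim_page() */
	spin_lock_bh(&pool->lock);
	/* ... manipulate the pool's buddied/unbuddied lists ... */
	spin_unlock_bh(&pool->lock);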

CRs-Fixed: 986783
Change-Id: Ib0605b38e7371c29316ed81e43549a0b9503d531
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-25 16:03:17 -07:00
Shiraz Hashim
4c2cff20a2 mm: zcache: fix locking sequence
A deadlock is observed in zcache reclaim paths due to
an inconsistent locking sequence.

Core#0:				    Core#1:
 |spin_bug()                         |do_raw_write_lock()
 |do_raw_spin_lock()                 |_raw_write_lock_irqsave()
 |_raw_spin_lock_irqsave()           |zcache_rbnode_isolate()
 |zcache_flush_inode()               |zcache_load_delete_zaddr()
 |__cleancache_invalidate_inode()    |zcache_evict_zpage()
 |truncate_inode_pages_range()       |zbud_reclaim_page()
 |truncate_inode_pages()             |zcache_scan()
 |truncate_inode_pages_final()       |shrink_slab_node()
 |ext4_evict_inode()                 |shrink_slab()
 |evict()                            |try_to_free_pages()
 |dispose_list()                     |__alloc_pages_nodemask()
 |prune_icache_sb()                  |alloc_kmem_pages_node()
 |super_cache_scan()                 |copy_process.part.52()
 |shrink_slab_node()                 |do_fork()
 |shrink_slab()                      |sys_clone()
 |kswapd_shrink_zone.constprop       |el0_svc()
 |balance_pgdat()
 |kswapd()
 |kthread()
 |ret_from_fork()

The deadlock happens because the two code paths take the
following locks in a different order:
 zpool->rb_lock  (protects the zpool rb tree), and
 rbnode->ra_lock (protects the radix tree maintained by the rbtree node)

Fix the order in which the locks are taken to avoid the deadlock.

Change-Id: I32db23268f63eb8eb5aee30e4462c190e2e02f48
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-03-25 16:03:08 -07:00
Shiraz Hashim
1b7778354b mm: zbud: initialize object to 0 on GFP_ZERO
If zbud_alloc returns a free object from the pool, it must
also initialize it to 0 when asked to do so. The same is
already taken care of when a fresh object is allocated.

CRs-fixed: 979234
Change-Id: Id171edf131df321385fcdcd7660d06da97689e3e
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-03-25 16:02:56 -07:00
Andrey Markovytch
b61ac21fb0 mm + fs: extends support for cache dropping
Exposes drop_pagecache_sb (required by eCryptfs cache wiping)
Adds truncate_inode_pages_fill_zero (required by eCryptfs cache wiping),
which not only truncates pages but also fills them with 0, so that the
cached data can no longer be retrieved.

Change-Id: Icfc18a2c8cdc922e71ee17add6459a1355e77ba6
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
[gbroner@codeaurora.org: fix merge conflict]
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-23 21:24:12 -07:00
Christoph Lameter
8cc0e37a56 vmstat: Remove BUG_ON from vmstat_update
If we detect that there is nothing to do just set the flag and do not
check if it was already set before.  Races really do not matter.  If the
flag is set by any code then the shepherd will start dealing with the
situation and reenable the vmstat workers when necessary again.

Since commit 0eb77e988032 ("vmstat: make vmstat_updater deferrable again
and shut down on idle") quiet_vmstat might update cpu_stat_off and mark
a particular cpu to be handled by vmstat_shepherd.  This might trigger a
VM_BUG_ON in vmstat_update because the work item might have been
sleeping during the idle period and see the cpu_stat_off updated after
the wake up.  The VM_BUG_ON is therefore misleading and no more
appropriate.  Moreover it doesn't really suite any protection from real
bugs because vmstat_shepherd will simply reschedule the vmstat_work
anytime it sees a particular cpu set or vmstat_update would do the same
from the worker context directly.  Even when the two would race the
result wouldn't be incorrect as the counters update is fully idempotent.

Change-Id: I4b46e471024ff4cac2b32234dffb3dfcf91713b6
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: 587198ba5206cdf0d30855f7361af950a4172cd6
[shashim@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-03-23 21:22:15 -07:00