Commit graph

10410 commits

Jason Baron
24b8a175a6 mm: update min_free_kbytes from khugepaged after core initialization
commit bc22af74f271ef76b2e6f72f3941f91f0da3f5f8 upstream.

Khugepaged attempts to raise min_free_kbytes if it's set too low.
However, on boot khugepaged sets min_free_kbytes first from
subsys_initcall(), and then the mm 'core' overrides min_free_kbytes
afterwards from init_per_zone_wmark_min(), via a module_init() call.

Khugepaged used to use a late_initcall() to set min_free_kbytes (such
that it occurred after the core initialization), however this was
removed when the initialization of min_free_kbytes was integrated into
the starting of the khugepaged thread.

The fix here is simply to invoke the core initialization using a
core_initcall() instead of module_init(), such that the previous
initialization ordering is restored.  I didn't restore the
late_initcall() since start_stop_khugepaged() already sets
min_free_kbytes via set_recommended_min_free_kbytes().

This was noticed when we had a number of page allocation failures when
moving a workload to a kernel with this new initialization ordering.  On
an 8GB system this restores min_free_kbytes back to 67584 from 11365
when CONFIG_TRANSPARENT_HUGEPAGE=y is set and either
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y or
CONFIG_TRANSPARENT_HUGEPAGE_MADVISE=y.

Fixes: 79553da293 ("thp: cleanup khugepaged startup")
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:17 +02:00
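A minimal sketch of the fix as described, assuming the 4.4-era
mm/page_alloc.c (illustrative, not the verbatim upstream diff):

    /* Register the core watermark init at core_initcall level so it
     * runs before khugepaged's subsys_initcall()-time adjustment. */
    core_initcall(init_per_zone_wmark_min)
    /* was: module_init(init_per_zone_wmark_min) */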
Dan Streetman
851375cc49 mm/zswap: provide unique zpool name
commit 32a4e169039927bfb6ee9f0ccbbe3a8aaf13a4bc upstream.

Instead of using "zswap" as the name for all zpools created, add an
atomic counter and use "zswap%x" with the counter number for each zpool
created, to provide a unique name for each new zpool.

As zsmalloc, one of the zpool implementations, requires/expects a unique
name for each pool created, zswap should provide a unique name.  The
zsmalloc pool creation does not fail if a new pool with a conflicting
name is created, unless CONFIG_ZSMALLOC_STAT is enabled; in that case,
zsmalloc pool creation fails with -ENOMEM.  Then zswap will be unable to
change its compressor parameter if its zpool is zsmalloc; it also will
be unable to change its zpool parameter back to zsmalloc, if it has any
existing old zpool using zsmalloc with page(s) in it.  Attempts to
change the parameters will result in failure to create the zpool.  This
changes zswap to provide a unique name for each zpool creation.

Fixes: f1c54846ee ("zswap: dynamic pool creation")
Signed-off-by: Dan Streetman <ddstreet@ieee.org>
Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Dan Streetman <dan.streetman@canonical.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:14 +02:00
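A sketch of the naming scheme described above, assuming the 4.4-era
zswap_pool_create() (illustrative):

    static atomic_t zswap_pools_count = ATOMIC_INIT(0);

    	/* unique name for each pool, specifically required by zsmalloc */
    	snprintf(name, sizeof(name), "zswap%x",
    		 atomic_inc_return(&zswap_pools_count));
    	pool->zpool = zpool_create_pool(type, name, gfp, &zswap_zpool_ops);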
Hugh Dickins
d27e2ddc40 mm, cma: prevent nr_isolated_* counters from going negative
commit 14af4a5e9b26ad251f81c174e8a43f3e179434a5 upstream.

/proc/sys/vm/stat_refresh warns that nr_isolated_anon and
nr_isolated_file go increasingly negative under compaction, which
would add delay when there should be none, or no delay when there
should be some.  The bug in compaction was due to a recent mmotm
patch, but a much older instance of the bug was also noticed in
isolate_migratepages_range(), which is used for CMA and gigantic
hugepage allocations.

The bug is caused by putback_movable_pages() in an error path
decrementing the isolated counters without them being previously
incremented by acct_isolated().  Fix isolate_migratepages_range() by
removing the error-path putback, thus reaching acct_isolated() with
migratepages still isolated, and leaving putback to caller like most
other places do.

Fixes: edc2ca6124 ("mm, compaction: move pageblock checks up from isolate_migratepages_range()")
[vbabka@suse.cz: expanded the changelog]
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:14 +02:00
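A sketch of the fixed error path, assuming the 4.4-era
isolate_migratepages_range() flow (illustrative):

    	for (; pfn < end_pfn; pfn = block_end_pfn,
    				block_end_pfn += pageblock_nr_pages) {
    		pfn = isolate_migratepages_block(cc, pfn, block_end_pfn,
    						 ISOLATE_UNEVICTABLE);
    		if (!pfn)
    			break;	/* was: putback_movable_pages(&cc->migratepages) */
    		if (cc->nr_migratepages == COMPACT_CLUSTER_MAX)
    			break;
    	}
    	/* reached with migratepages still isolated, so counters stay sane */
    	acct_isolated(cc->zone, cc);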
Trilok Soni
1ca0992ef1 lib: memtest: Add MEMTEST_ENABLE_DEFAULT option
As of now, memtest remains disabled until the patterns are specified
through the kernel command line.  Some platforms have two different
configuration files (one for debug and another for product) and can
use this configuration option to enable memtest by default (in the
debug configuration file).

CRs-Fixed: 1007344
Change-Id: I0bf7b33c3584f3d6cf5ef58dfe72be46212041da
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-05-05 15:05:52 -07:00
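A hypothetical sketch of how such a default could be consumed in
mm/memtest.c; the Kconfig symbol matches the subject line, but the
default pattern count shown is an assumption (a memtest= command-line
parameter would still override it):

    #ifdef CONFIG_MEMTEST_ENABLE_DEFAULT
    static int memtest_pattern __initdata = 4;	/* assumed default count */
    #else
    static int memtest_pattern __initdata;	/* off until memtest= is set */
    #endif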
Minchan Kim
36abe7272a mm/hwpoison: fix wrong num_poisoned_pages accounting
commit d7e69488bd04de165667f6bc741c1c0ec6042ab9 upstream.

Currently, migration code increases num_poisoned_pages for *failed*
migration pages as well as successfully migrated ones during a
memory-failure trial, which makes the stat wrong.  It also marks the
page as PG_HWPoison even if the migration trial failed, which would
mean we cannot recover the corrupted page using the memory-failure
facility.

This patch fixes it.

Signed-off-by: Minchan Kim <minchan@kernel.org>
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
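A sketch of the corrected accounting in unmap_and_move(), assuming the
4.4-era mm/migrate.c (illustrative):

    	/* poison-mark and count the source page only on success */
    	if (reason == MR_MEMORY_FAILURE && rc == MIGRATEPAGE_SUCCESS) {
    		put_page(page);
    		if (!test_set_page_hwpoison(page))
    			num_poisoned_pages_inc();
    	}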
Minchan Kim
87c855f150 mm: vmscan: reclaim highmem zone if buffer_heads is over limit
commit 7bf52fb891b64b8d61caf0b82060adb9db761aec upstream.

We used to reclaim the highmem zone if buffer_heads were over the
limit, but commit 6b4f7799c6 ("mm: vmscan: invoke slab shrinkers from
shrink_zone()") changed the behavior so the highmem zone is no longer
reclaimed even when buffer_heads are over the limit.  This patch
restores that logic.

Fixes: 6b4f7799c6 ("mm: vmscan: invoke slab shrinkers from shrink_zone()")
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
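A sketch of the restored logic, assuming the 4.4-era shrink_zones()
(illustrative):

    	/* highmem pages can pin lowmem pages storing buffer_heads, so
    	 * widen the reclaim mask when buffer_heads are over the limit */
    	orig_mask = sc->gfp_mask;
    	if (buffer_heads_over_limit)
    		sc->gfp_mask |= __GFP_HIGHMEM;
    	/* ... zone iteration and shrinking ... */
    	sc->gfp_mask = orig_mask;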
Gerald Schaefer
e513b90a9a numa: fix /proc/<pid>/numa_maps for THP
commit 28093f9f34cedeaea0f481c58446d9dac6dd620f upstream.

In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture.  On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.

On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available.  In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped.  On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.

This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.

Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
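A sketch of the new pmd-aware helper, assuming the shape described
above; vm_normal_page_pmd() is the commit's new variant (illustrative):

    static struct page *can_gather_numa_stats_pmd(pmd_t pmd,
    		struct vm_area_struct *vma, unsigned long addr)
    {
    	struct page *page;
    	int nid;

    	if (!pmd_present(pmd))
    		return NULL;

    	/* pmd-typed lookup: no fake pte, so no bogus pte_special() */
    	page = vm_normal_page_pmd(vma, addr, pmd);
    	if (!page || PageReserved(page))
    		return NULL;

    	nid = page_to_nid(page);
    	if (!node_isset(nid, node_states[N_MEMORY]))
    		return NULL;

    	return page;
    }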
Konstantin Khlebnikov
be591a683e mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
commit 3486b85a29c1741db99d0c522211c82d2b7a56d0 upstream.

Khugepaged detects its own VMAs by checking vm_file and vm_ops, but
this way it cannot distinguish private /dev/zero mappings from other
special mappings like /dev/hpet, which has no vm_ops and populates
PTEs in mmap.

This fixes a false-positive VM_BUG_ON and prevents installing THPs
where they are not expected.

Link: http://lkml.kernel.org/r/CACT4Y+ZmuZMV5CjSFOeXviwQdABAgT7T+StKfTqan9YDtgEi5g@mail.gmail.com
Fixes: 78f11a2557 ("mm: thp: fix /dev/zero MAP_PRIVATE and vm_flags cleanups")
Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
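A sketch of the check that replaces the assertion, assuming the
4.4-era khugepaged code in mm/huge_memory.c (illustrative):

    static bool hugepage_vma_check(struct vm_area_struct *vma)
    {
    	if ((!(vma->vm_flags & VM_HUGEPAGE) && !khugepaged_always()) ||
    	    (vma->vm_flags & VM_NOHUGEPAGE))
    		return false;
    	if (!vma->anon_vma || vma->vm_ops)
    		return false;
    	if (is_vma_temporary_stack(vma))
    		return false;
    	/* was: VM_BUG_ON_VMA(vma->vm_flags & VM_NO_THP, vma); return true; */
    	return !(vma->vm_flags & VM_NO_THP);
    }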
Tejun Heo
52526076a5 memcg: relocate charge moving from ->attach to ->post_attach
commit 264a0ae164bc0e9144bebcd25ff030d067b1a878 upstream.

Hello,

So, this ended up a lot simpler than I originally expected.  I tested
it lightly and it seems to work fine.  Petr, can you please test these
two patches w/o the lru drain drop patch and see whether the problem
is gone?

Thanks.
------ 8< ------
If charge moving is used, memcg performs relabeling of the affected
pages from its ->attach callback, which is called under
cgroup_threadgroup_rwsem and thus can't create new kthreads.  This is
fragile, as various operations may depend on workqueues making forward
progress, which in turn relies on the ability to create new kthreads.

There's no reason to perform charge moving from ->attach which is deep
in the task migration path.  Move it to ->post_attach which is called
after the actual migration is finished and cgroup_threadgroup_rwsem is
dropped.

* move_charge_struct->mm is added and ->can_attach is now responsible
  for pinning and recording the target mm.  mem_cgroup_clear_mc() is
  updated accordingly.  This also simplifies mem_cgroup_move_task().

* mem_cgroup_move_task() is now called from ->post_attach instead of
  ->attach.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@kernel.org>
Debugged-and-tested-by: Petr Mladek <pmladek@suse.com>
Reported-by: Cyril Hrubis <chrubis@suse.cz>
Reported-by: Johannes Weiner <hannes@cmpxchg.org>
Fixes: 1ed1328792 ("sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
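A sketch of the callback rewiring in mm/memcontrol.c, per the
description above (fields abridged; illustrative):

    struct cgroup_subsys memory_cgrp_subsys = {
    	/* ... */
    	.can_attach	= mem_cgroup_can_attach,	/* pins + records target mm */
    	.cancel_attach	= mem_cgroup_cancel_attach,
    	.post_attach	= mem_cgroup_move_task,		/* was .attach */
    	/* ... */
    };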
Jesper Dangaard Brouer
a4e25ff311 slub: clean up code for kmem cgroup support to kmem_cache_free_bulk
commit 376bf125ac781d32e202760ed7deb1ae4ed35d31 upstream.

This change is primarily an attempt to make it easier to realize the
optimizations the compiler performs in case CONFIG_MEMCG_KMEM is not
enabled.

Performance wise, even when CONFIG_MEMCG_KMEM is compiled in, the
overhead is zero.  This is because, as long as no process has enabled
kmem cgroups accounting, the assignment is replaced by asm-NOP
operations.  This is possible because memcg_kmem_enabled() uses a
static_key_false() construct.

It also helps readability as it avoids accessing the p[] array like
p[size - 1], which "exposes" that the array is processed backwards
inside the helper function build_detached_freelist().

Lastly, this also makes the code more robust in error cases, like
passing NULL pointers in the array, which were previously handled
before commit 033745189b ("slub: add missing kmem cgroup support to
kmem_cache_free_bulk").

Fixes: 033745189b ("slub: add missing kmem cgroup support to kmem_cache_free_bulk")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:49 -07:00
Trilok Soni
bf6ceddd56 mm/page_owner: ask users about default setting of PAGE_OWNER
Commit 48c96a3685 ("mm/page_owner: keep track of page owners")
doesn't enable page_owner by default even when CONFIG_PAGE_OWNER is
enabled.

Add the configuration option CONFIG_PAGE_OWNER_ENABLE_DEFAULT to
allow users to enable it by default through the defconfig file.

CRs-Fixed: 1006743
Change-Id: I9b565a34e2068bf575974eaf3dc9f7820bdd7a96
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-05-03 15:53:55 -07:00
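A minimal sketch, assuming the new symbol follows the usual kernel
pattern in mm/page_owner.c (the page_owner= early param still
overrides; illustrative):

    static bool page_owner_disabled =
    	!IS_ENABLED(CONFIG_PAGE_OWNER_ENABLE_DEFAULT);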
Christian Borntraeger
9194c460b8 mm/debug_pagealloc: ask users for default setting of debug_pagealloc
Since commit 031bc5743f ("mm/debug-pagealloc: make debug-pagealloc
boottime configurable"), CONFIG_DEBUG_PAGEALLOC by default no longer
adds any page debugging.

This resulted in several unnoticed bugs, e.g.

    https://lkml.kernel.org/g/<569F5E29.3090107@de.ibm.com>
or
    https://lkml.kernel.org/g/<56A20F30.4050705@de.ibm.com>

as this behaviour change was not even documented in Kconfig.

Let's provide a new Kconfig symbol that allows changing the default
back to enabled, e.g. for debug kernels.  This also makes the change
obvious to kernel packagers.

Let's also change the Kconfig description for CONFIG_DEBUG_PAGEALLOC, to
indicate that there are two stages of overhead.

CRs-Fixed: 1006743
Change-Id: I52c36765837cc873877b9398371ffd840d485a81
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: ea6eabb05b26bd3d6f60b29b77a03bc61479fc0f
Git-repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
2016-04-29 14:40:10 -07:00
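A sketch of the described change, assuming the upstream shape in
mm/page_alloc.c (a debug_pagealloc= command-line setting still wins;
illustrative):

    static bool _debug_pagealloc_enabled __read_mostly =
    	IS_ENABLED(CONFIG_DEBUG_PAGEALLOC_ENABLE_DEFAULT);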
Vinayak Menon
b6fb81015e mm: fix compile time error with !CONFIG_CMA
Fix compile-time failures caused by CMA-related elements not being
protected with CONFIG_CMA.

Change-Id: I930b7c0ffdce0f1bfc4f8a582a698be16ed44d1f
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-04-26 14:38:03 -07:00
Dmitry Shmidt
ad95c12f66 Revert "mm: vmscan: Add a debug file for shrinkers"
Kernel panic when typing "cat /sys/kernel/debug/shrinker":

Unable to handle kernel paging request at virtual address 0af37d40
pgd = d4dec000
[0af37d40] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[<c0bb8f24>] (_raw_spin_lock) from [<c020aa08>] (list_lru_count_one+0x14/0x28)
[<c020aa08>] (list_lru_count_one) from [<c02309a8>] (super_cache_count+0x40/0xa0)
[<c02309a8>] (super_cache_count) from [<c01f6ab0>] (debug_shrinker_show+0x50/0x90)
[<c01f6ab0>] (debug_shrinker_show) from [<c024fa5c>] (seq_read+0x1ec/0x48c)
[<c024fa5c>] (seq_read) from [<c022e8f8>] (__vfs_read+0x20/0xd0)
[<c022e8f8>] (__vfs_read) from [<c022f0d0>] (vfs_read+0x7c/0x104)
[<c022f0d0>] (vfs_read) from [<c022f974>] (SyS_read+0x44/0x9c)
[<c022f974>] (SyS_read) from [<c0107580>] (ret_fast_syscall+0x0/0x3c)
Code: e1a04000 e3a00001 ebd66b39 f594f000 (e1943f9f)
---[ end trace 60c74014a63a9688 ]---
Kernel panic - not syncing: Fatal exception

shrink_control.nid is used but not initialized; the same goes for
shrink_control.memcg.

This reverts commit b0e7a582b2.

Change-Id: I108de88fa4baaef99a53c4e4c6a1d8c4b4804157
Reported-by: Xiaowen Liu <xiaowen.liu@freescale.com>
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2016-04-26 12:45:11 -07:00
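The crash above comes from count_objects() being invoked with an
uninitialized shrink_control; a minimal sketch of what NUMA- and
memcg-aware shrinkers expect to find filled in (illustrative):

    	struct shrink_control sc = {
    		.gfp_mask = GFP_KERNEL,
    		.nid	  = 0,		/* valid node id for per-node LRU lists */
    		.memcg	  = NULL,	/* root for memcg-aware shrinkers */
    	};

    	unsigned long count = shrinker->count_objects(shrinker, &sc);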
Alex Shi
bab1564182 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android
Conflicts:
	d_canonical_path in include/linux/dcache.h
2016-04-21 14:08:44 +08:00
Xishi Qiu
fb4cfc6e0a mm: fix invalid node in alloc_migrate_target()
commit 6f25a14a7053b69917e2ebea0d31dd444cd31fd5 upstream.

It is incorrect to use next_node to find a target node; it can return
MAX_NUMNODES or an invalid node.  This will lead to a crash in buddy
system allocation.

Fixes: c8721bbbdd ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "Laura Abbott" <lauraa@codeaurora.org>
Cc: Hui Zhu <zhuhui@xiaomi.com>
Cc: Wang Xiaoqiang <wangxq10@lzu.edu.cn>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:41:53 +09:00
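A sketch of the wrap-around pattern the fix needs; next_node() returns
MAX_NUMNODES past the last set bit, so wrap to the first online node
(the helper name is illustrative, not the upstream diff):

    static int next_online_node_wrap(int nid)
    {
    	nid = next_node(nid, node_online_map);
    	if (nid == MAX_NUMNODES)
    		nid = first_node(node_online_map);
    	return nid;
    }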
Liam Mark
50050f2a1d mm: add cma pcp list
Add a cma pcp list in order to increase cma memory utilization.

Increased cma memory utilization will improve overall memory
utilization because free cma pages are ignored when memory reclaim
is done with gfp mask GFP_KERNEL.

Since most memory reclaim is done by kswapd, which uses a gfp mask of
GFP_KERNEL, increasing cma memory utilization therefore ensures that
less aggressive memory reclaim takes place.

Increased cma memory utilization will improve performance,
for example it will increase app concurrency.

Change-Id: I809589a25c6abca51f1c963f118adfc78e955cf9
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2016-04-13 11:11:40 -07:00
Heesub Shin
d491cf59f0 cma: redirect page allocation to CMA
CMA pages are designed to be used as fallback for movable allocations
and cannot be used for non-movable allocations.  If CMA pages are
utilized poorly, non-movable allocations may end up getting starved if
all regular movable pages are allocated and the only pages left are
CMA.  Always using CMA pages first creates unacceptable performance
problems.  As a midway alternative, use CMA pages for certain
userspace allocations.  These userspace pages can be migrated or
dropped quickly, which gives decent utilization.

Change-Id: I6165dda01b705309eebabc6dfa67146b7a95c174
CRs-Fixed: 452508
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Heesub Shin <heesub.shin@samsung.com>
[lauraa@codeaurora.org: Missing CONFIG_CMA guards, add commit text]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
[lmark@codeaurora.org: resolve conflicts relating to
 MIGRATE_HIGHATOMIC and some other trivial merge conflicts]
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2016-04-13 11:11:30 -07:00
Liam Mark
1426d1f8d9 lowmemorykiller: Don't count swap cache pages twice
The lowmem_shrink function discounts all the swap cache pages from
the file cache count. The zone aware code also discounts all file
cache pages from a certain zone.  This results in some swap cache
pages being discounted twice, which can result in the low memory
killer being unnecessarily aggressive.

Fix the low memory killer to only discount the swap cache pages
once.

Change-Id: I650bbfbf0fbbabd01d82bdb3502b57ff59c3e14f
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-04-13 11:11:01 -07:00
Liam Mark
92c1fefed5 android/lowmemorykiller: Selectively count free CMA pages
In certain memory configurations there can be a large number of
CMA pages which are not suitable to satisfy certain memory
requests.

This large number of unsuitable pages can cause the lowmemorykiller
to not kill any tasks, because the lowmemorykiller counts all free
pages.  To ensure that the lowmemorykiller properly evaluates free
memory, only count the free CMA pages if they are suitable for
satisfying the memory request.

Change-Id: I7f06d53e2d8cfe7439e5561fe6e5209ce73b1c90
CRs-fixed: 437016
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2016-04-13 11:09:54 -07:00
Alex Shi
08562bfcb8 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-04-13 12:02:21 +08:00
Vlastimil Babka
5dc7e939b6 mm/page_alloc: prevent merging between isolated and other pageblocks
commit d9dddbf556674bf125ecd925b24e43a5cf2a568a upstream.

Hanjun Guo has reported that a CMA stress test causes broken accounting of
CMA and free pages:

> Before the test, I got:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:          195044 kB
>
>
> After running the test:
> -bash-4.3# cat /proc/meminfo | grep Cma
> CmaTotal:         204800 kB
> CmaFree:         6602584 kB
>
> So the freed CMA memory is more than total..
>
> Also the MemFree is more than mem total:
>
> -bash-4.3# cat /proc/meminfo
> MemTotal:       16342016 kB
> MemFree:        22367268 kB
> MemAvailable:   22370528 kB

Laura Abbott has confirmed the issue and suspected the freepage accounting
rewrite around 3.18/4.0 by Joonsoo Kim.  Joonsoo had a theory that this is
caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
pageblocks:

> CMA isolates MAX_ORDER aligned blocks, but, during the process,
> partially isolated blocks exist. If MAX_ORDER is 11 and
> pageblock_order is 9, two pageblocks make up a MAX_ORDER
> aligned block and I can think of the following scenario because
> pageblock (un)isolation would be done one by one.
>
> (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
> MIGRATE_ISOLATE, respectively.)
>
> CC -> IC -> II (Isolation)
> II -> CI -> CC (Un-isolation)
>
> If some pages are freed at this intermediate state such as IC or CI,
> that page could be merged to the other page that is resident on
> different type of pageblock and it will cause wrong freepage count.

This was supposed to be prevented by CMA operating on MAX_ORDER blocks,
but since it doesn't hold the zone->lock between pageblocks, a race
window does exist.

It's also likely that unexpected merging can occur between
MIGRATE_ISOLATE and non-CMA pageblocks.  This should be prevented in
__free_one_page() since commit 3c605096d3 ("mm/page_alloc: restrict
max order of merging on isolated pageblock").  However, we only check
the migratetype of the pageblock where buddy merging has been initiated,
not the migratetype of the buddy pageblock (or group of pageblocks)
which can be MIGRATE_ISOLATE.

Joonsoo has suggested checking for buddy migratetype as part of
page_is_buddy(), but that would add extra checks in allocator hotpath
and bloat-o-meter has shown significant code bloat (the function is
inline).

This patch reduces the bloat at some expense of more complicated code.
The buddy-merging while-loop in __free_one_page() is initially bounded
to pageblock_border and without any migratetype checks.  The checks are
placed outside, bumping the max_order if merging is allowed, and
returning to the while-loop with a statement which can't be possibly
considered harmful.

This fixes the accounting bug and also removes the arguably weird state
in the original commit 3c605096d3 where buddies could be left
unmerged.

Fixes: 3c605096d3 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
Link: https://lkml.org/lkml/2016/3/2/280
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Tested-by: Hanjun Guo <guohanjun@huawei.com>
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Debugged-by: Laura Abbott <labbott@redhat.com>
Debugged-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:09:05 -07:00
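A sketch of the restructured merge loop in __free_one_page(), assuming
the upstream shape (abridged; illustrative):

    continue_merging:
    	while (order < max_order - 1) {
    		/* ordinary buddy merging, no migratetype checks */
    	}
    	if (max_order < MAX_ORDER) {
    		if (unlikely(has_isolate_pageblock(zone))) {
    			int buddy_mt = get_pageblock_migratetype(buddy);

    			/* never merge isolated with non-isolated pageblocks */
    			if (migratetype != buddy_mt &&
    			    (is_migrate_isolate(migratetype) ||
    			     is_migrate_isolate(buddy_mt)))
    				goto done_merging;
    		}
    		max_order++;
    		goto continue_merging;
    	}
    done_merging:
    	set_page_order(page, order);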
Johannes Weiner
0ccab5b139 mm: memcontrol: reclaim and OOM kill when shrinking memory.max below usage
commit b6e6edcfa40561e9c8abe5eecf1c96f8e5fd9c6f upstream.

Setting the original memory.limit_in_bytes hardlimit is subject to a
race condition when the desired value is below the current usage.  The
code tries a few times to first reclaim and then see if the usage has
dropped to where we would like it to be, but there is no locking, and
the workload is free to continue making new charges up to the old limit.
Thus, attempting to shrink a workload relies on pure luck and hope that
the workload happens to cooperate.

To fix this in the cgroup2 memory.max knob, do it the other way round:
set the limit first, then try enforcement.  And if reclaim is not able
to succeed, trigger OOM kills in the group.  Keep going until the new
limit is met, we run out of OOM victims and there's only unreclaimable
memory left, or the task writing to memory.max is killed.  This allows
users to shrink groups reliably, and the behavior is consistent with
what happens when new charges are attempted in excess of memory.max.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:54 -07:00
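A sketch of the new enforcement loop in memory_max_write(), assuming
the upstream shape (abridged; illustrative):

    	xchg(&memcg->memory.limit, max);	/* set the limit first */

    	for (;;) {
    		unsigned long nr_pages = page_counter_read(&memcg->memory);

    		if (nr_pages <= max)
    			break;
    		if (signal_pending(current))	/* writer killed: give up */
    			break;
    		if (nr_reclaims) {
    			if (!try_to_free_mem_cgroup_pages(memcg,
    					nr_pages - max, GFP_KERNEL, true))
    				nr_reclaims--;	/* reclaim made no progress */
    			continue;
    		}
    		mem_cgroup_events(memcg, MEMCG_OOM, 1);
    		if (!mem_cgroup_out_of_memory(memcg, GFP_KERNEL, 0))
    			break;			/* out of OOM victims */
    	}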
Johannes Weiner
8b42fc47e1 mm: memcontrol: reclaim when shrinking memory.high below usage
commit 588083bb37a3cea8533c392370a554417c8f29cb upstream.

When setting memory.high below usage, nothing happens until the next
charge comes along, and then it will only reclaim its own charge and not
the now potentially huge excess of the new memory.high.  This can cause
groups to stay in excess of their memory.high indefinitely.

To fix that, when shrinking memory.high, kick off a reclaim cycle that
goes after the delta.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:53 -07:00
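The corresponding memory_high_write() change, sketched under the same
assumptions (illustrative):

    	memcg->high = high;

    	nr_pages = page_counter_read(&memcg->memory);
    	if (nr_pages > high)	/* go after the delta right away */
    		try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
    					     GFP_KERNEL, true);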
Guenter Roeck
a7b7a225c1 mm: Export do_munmap
The 0-day build bot reports the following build error, seen if
SDCARD_FS is built as a module.

ERROR: "do_munmap" undefined!

Fixes: 84a1b7d3d3 ("Included sdcardfs source code for kernel 3.0")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
2016-04-07 16:50:04 +05:30
Guenter Roeck
de679b3bbe mm: Export do_munmap
The 0-day build bot reports the following build error, seen if
SDCARD_FS is built as a module.

ERROR: "do_munmap" undefined!

Fixes: 84a1b7d3d3 ("Included sdcardfs source code for kernel 3.0")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
2016-03-29 02:04:52 +00:00
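Given the modpost error quoted in both entries above, the likely shape
of the fix is a one-line export in mm/mmap.c (illustrative):

    EXPORT_SYMBOL(do_munmap);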
Prasad Sodagudi
8567b31169 Revert "debug-pagealloc: Panic on pagealloc corruption"
This reverts commit 022c1f3696f2 ("debug-pagealloc:
Panic on pagealloc corruption").  A kernel panic is seen
on MSM8937 in 32-bit mode; revert this patch until the
root cause is identified.

Change-Id: I66f1bab7f8c836b8b7167ec05141656f34c3702c
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2016-03-25 16:04:09 -07:00
Vinayak Menon
e74a8a432e mm: vmstat: add pageoutclean
vmstat events currently count pgpgout, but that includes only
writebacks, not the reclaim of clean pages.  Add an event to count
clean page evictions.  This is helpful for evaluating page thrashing
cases.

Change-Id: Icfb797877a544a58c289074bdc290dfbc1384514
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-25 16:03:59 -07:00
Prasad Sodagudi
d099e491b8 debug-pagealloc: print physical address for detected corruption
It's sometimes useful to know the physical address which has been
corrupted, especially in systems with multiple bus masters and DMA
engines with the capability of writing to memory.  It may also be
useful for identifying the location of failures of memory cells in
cases of device-specific corruption.  So print the physical start
address of the page to help in these scenarios.

Change-Id: I081edd8b1c06913c0057a6cb9dda18077cfbdc30
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2016-03-25 16:03:55 -07:00
Vinayak Menon
2e21911abe mm: do not activate swap write failed pages
Some time back, a piece of code was added to activate pages in
pageout which failed to be written back.  This was done for the case
of failed writes to zram, with the intention of reducing further zram
writes.  But this does not make much sense, because there can anyway
be other pages which the reclaim path can pick to swap out.

And this particular logic has a problem.  When a write fails, the page
is unlocked.  It's locked again before activating the page, but the
page, which is now in the swap cache, can be brought back with its
original mapping through a fault during this period.  This can result
in random bugs, e.g. when shrink_page_list tries to do
try_to_free_swap.  Here is one such case, where PageSwapCache was
cleared by the fault path.

"
zram: Error allocating memory for compressed page: 91433, size=4096
Write-error on swap-device (254:0:731464)
page:de866e80 count:3 mapcount:1 mapping:d5368941 index:0xb2ce5
flags: 0x80018(uptodate|dirty|swapbacked)
page dumped because: VM_BUG_ON_PAGE(!PageSwapCache(page))
"

CRs-Fixed: 988207
Change-Id: I26738d0f8dd3e2dfdb24c25edac24a7d968eeba0
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-25 16:03:35 -07:00
Prasad Sodagudi
4eace7df06 debug-pagealloc: Panic on pagealloc corruption
Currently, we just print the pagealloc corruption warnings and
proceed.  Sometimes we get multiple errors printed down the line.  It
is better to capture the device state as early as possible, at the
first pagealloc error.

Change-Id: I79155ac8a039b30a3a98d5dd1384d3923082712f
Signed-off-by: Subbaraman Narayanamurthy <subbaram@codeaurora.org>
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
2016-03-25 16:03:35 -07:00
Vinayak Menon
30781083ea mm: zbud: prevent softirq during zbud alloc, free and reclaim
The following deadlock is observed.

Core 2 waiting on mapping->tree_lock which is taken by core 6

do_raw_spin_lock
raw_spin_lock_irq
atomic_cmpxchg
page_freeze_refs
__remove_mapping
shrink_page_list
shrink_inactive_list
shrink_list
shrink_lruvec
shrink_zone
shrink_zones
do_try_to_free_pages
try_to_free_pages(?, ?, ?, ?)
__perform_reclaim
__alloc_pages_direct_reclaim
__alloc_pages_slowpath
__alloc_pages_nodemask
alloc_kmem_pages_node
alloc_thread_info_node
dup_task_struct
copy_process.part.56
do_fork
sys_clone
el0_svc_naked

Core 6 after taking mapping->tree_lock is waiting on zbud pool lock
which is held by core 5

zbud_alloc
zcache_store_page
__cleancache_put_page
cleancache_put_page
__delete_from_page_cache
spin_unlock_irq
__remove_mapping
shrink_page_list
shrink_inactive_list
shrink_list
shrink_lruvec
shrink_zone
bitmap_zero
__nodes_clear
kswapd_shrink_zone.constprop.58
balance_pgdat
kswapd_try_to_sleep
kswapd
kthread
ret_from_fork

Core 5 after taking zbud pool lock from zbud_free received an IRQ, and
after IRQ exit, softirqs were scheduled and end_page_writeback tried to
lock on mapping->tree_lock which is already held by Core 6. Deadlock.

do_raw_spin_lock
raw_spin_lock_irqsave
test_clear_page_writeba
end_page_writeback
ext4_finish_bio
ext4_end_bio
bio_endio
blk_update_request
end_clone_bio
bio_endio
blk_update_request
blk_update_bidi_request
blk_end_bidi_request
blk_end_request
mmc_blk_cmdq_complete_r
mmc_cmdq_softirq_done
blk_done_softirq
static_key_count
static_key_false
trace_softirq_exit
__do_softirq()
tick_irq_exit
irq_exit()
set_irq_regs
__handle_domain_irq
gic_handle_irq
el1_irq
exception
__list_del_entry
list_del
zbud_free
zcache_load_page
__cleancache_get_page(?

So protect zbud_alloc/free/reclaim with spin_lock_bh.

CRs-Fixed: 986783
Change-Id: Ib0605b38e7371c29316ed81e43549a0b9503d531
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-25 16:03:17 -07:00
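A sketch of the locking change described above: the pool lock is taken
with the _bh variants so block-completion softirqs cannot run on this
CPU while it is held (illustrative):

    	spin_lock_bh(&pool->lock);
    	/* ... zbud alloc / free / reclaim work on the pool ... */
    	spin_unlock_bh(&pool->lock);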
Shiraz Hashim
4c2cff20a2 mm: zcache: fix locking sequence
A deadlock is observed in zcache reclaim paths due to inconsistent
lock ordering.

Core#0:				    Core#1:
 |spin_bug()                         |do_raw_write_lock()
 |do_raw_spin_lock()                 |_raw_write_lock_irqsave()
 |_raw_spin_lock_irqsave()           |zcache_rbnode_isolate()
 |zcache_flush_inode()               |zcache_load_delete_zaddr()
 |__cleancache_invalidate_inode()    |zcache_evict_zpage()
 |truncate_inode_pages_range()       |zbud_reclaim_page()
 |truncate_inode_pages()             |zcache_scan()
 |truncate_inode_pages_final()       |shrink_slab_node()
 |ext4_evict_inode()                 |shrink_slab()
 |evict()                            |try_to_free_pages()
 |dispose_list()                     |__alloc_pages_nodemask()
 |prune_icache_sb()                  |alloc_kmem_pages_node()
 |super_cache_scan()                 |copy_process.part.52()
 |shrink_slab_node()                 |do_fork()
 |shrink_slab()                      |sys_clone()
 |kswapd_shrink_zone.constprop       |el0_svc()
 |balance_pgdat()
 |kswapd()
 |kthread()
 |ret_from_fork()

The deadlock happens because alternate sequences are followed while
taking
 zpool->rb_lock  (protects the zpool rb tree), and
 rbnode->ra_lock (protects the radix tree maintained by the rbtree node)

Fix the order in which the locks are taken to avoid the deadlock.

Change-Id: I32db23268f63eb8eb5aee30e4462c190e2e02f48
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-03-25 16:03:08 -07:00
Shiraz Hashim
1b7778354b mm: zbud: initialize object to 0 on GFP_ZERO
If zbud_alloc returns a free object from the pool, it must also
initialize it to 0 when asked to do so.  This is already taken care of
when a fresh object is allocated.

CRs-fixed: 979234
Change-Id: Id171edf131df321385fcdcd7660d06da97689e3e
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-03-25 16:02:56 -07:00
Andrey Markovytch
b61ac21fb0 mm + fs: extends support for cache dropping
Exposes drop_pagecache_sb (required by eCryptfs cache wiping).
Adds truncate_inode_pages_fill_zero (also required by eCryptfs cache
wiping), which not only truncates pages but also fills them with 0, so
that the cached data can no longer be retrieved.

Change-Id: Icfc18a2c8cdc922e71ee17add6459a1355e77ba6
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
[gbroner@codeaurora.org: fix merge conflict]
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-23 21:24:12 -07:00
Christoph Lameter
8cc0e37a56 vmstat: Remove BUG_ON from vmstat_update
If we detect that there is nothing to do, just set the flag and do not
check if it was already set before.  Races really do not matter: if
the flag is set by any code, then the shepherd will start dealing with
the situation and re-enable the vmstat workers when necessary again.

Since commit 0eb77e988032 ("vmstat: make vmstat_updater deferrable
again and shut down on idle") quiet_vmstat might update cpu_stat_off
and mark a particular cpu to be handled by vmstat_shepherd.  This
might trigger a VM_BUG_ON in vmstat_update because the work item might
have been sleeping during the idle period and see the cpu_stat_off
update after the wake up.  The VM_BUG_ON is therefore misleading and
no longer appropriate.  Moreover, it doesn't really provide any
protection from real bugs, because vmstat_shepherd will simply
reschedule the vmstat_work anytime it sees a particular cpu set, or
vmstat_update would do the same from the worker context directly.
Even when the two race, the result wouldn't be incorrect, as the
counter update is fully idempotent.

Change-Id: I4b46e471024ff4cac2b32234dffb3dfcf91713b6
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: 587198ba5206cdf0d30855f7361af950a4172cd6
[shashim@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-03-23 21:22:15 -07:00
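A sketch of the resulting vmstat_update() tail, assuming the 4.4-era
mm/vmstat.c internals (illustrative):

    	if (refresh_cpu_vm_stats(true)) {
    		/* counters changed: reschedule this cpu's update worker */
    		schedule_delayed_work(this_cpu_ptr(&vmstat_work),
    			round_jiffies_relative(sysctl_stat_interval));
    	} else {
    		/* nothing to do: just set the flag; racing with
    		 * quiet_vmstat() is harmless, folding is idempotent */
    		cpumask_set_cpu(smp_processor_id(), cpu_stat_off);
    	}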
Christoph Lameter
8fcc9882c3 vmstat: make vmstat_updater deferrable again and shut down on idle
Currently the vmstat updater is not deferrable as a result of commit
ba4877b9ca ("vmstat: do not use deferrable delayed work for
vmstat_update").  This in turn can cause multiple interruptions of the
applications because the vmstat updater may run at any time.

Make vmstat_update deferrable again and provide a function that folds
the differentials when the processor is going into idle mode, thus
addressing the issue of the above commit in a clean way.

Note that the shepherd thread will continue scanning the differentials
from another processor and will re-enable the vmstat workers if it
detects any changes.

Change-Id: Idf256cfacb40b4dc8dbb6795cf06b34e8fec7a06
Fixes: ba4877b9ca ("vmstat: do not use deferrable delayed work for vmstat_update")
Signed-off-by: Christoph Lameter <cl@linux.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Git-commit: 0eb77e9880321915322d42913c3b53241739c8aa
[shashim@codeaurora.org: resolve minor merge conflicts]
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
[satyap: resolve trivial merge conflicts]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2016-03-23 21:22:14 -07:00
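A sketch of the idle-time fold this adds, assuming the upstream shape
of quiet_vmstat() (illustrative):

    void quiet_vmstat(void)
    {
    	if (system_state != SYSTEM_RUNNING)
    		return;

    	/* hand monitoring of this cpu over to the shepherd thread */
    	if (!cpumask_test_and_set_cpu(smp_processor_id(), cpu_stat_off))
    		cancel_delayed_work(this_cpu_ptr(&vmstat_work));

    	/* fold any pending differentials into the global counters */
    	refresh_cpu_vm_stats(false);
    }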
Shiraz Hashim
8cda113324 drivers: dma-contiguous: remove cma regions from kmemleak scan
cma regions can be allocated for secure use cases where the virtual
mapping is removed.  The kmemleak scan can try to scan through this
mapping, causing mmu faults.  Hence remove cma regions from the
kmemleak scan.

Change-Id: I2cc66423ec9ec539905cc8d07bbc40fa43e695bd
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-03-23 21:20:39 -07:00
Shiraz Hashim
2f879b6dcb mm/memblock: fix a race between search and remove
The no-map-fixup feature for dma-removed reserved regions does a late
memblock remove.  This late removal is a slight deviation from the
conventional memblock flow, and it leads to a race between
pfn_valid->memblock_search and memblock_remove.

To fix this, use a read seqlock in memblock_search, which ensures
minimum overhead in the search path, and export two APIs to let code
doing a late memblock remove take the write seqlock.  The write
seqlock ensures that the search retries if the list is updated
concurrently.

The two exported APIs, which should be called before and after
modifying memblock regions late in boot, are
   - memblock_region_resize_late_begin
   - memblock_region_resize_late_end

The code that alters memblock regions should be guarded by these
APIs.

CRs-fixed: 967728
Change-Id: I6a10c3e980002048aafeaf829a16119848c6a099
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-03-23 21:19:51 -07:00
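A minimal sketch of the scheme described above; the seqlock name and
the inner search helper are assumptions (illustrative):

    static seqlock_t memblock_seq = __SEQLOCK_UNLOCKED(memblock_seq);

    static int memblock_search(struct memblock_type *type, phys_addr_t addr)
    {
    	unsigned int seq;
    	int ret;

    	do {	/* retry if a late region resize raced with us */
    		seq = read_seqbegin(&memblock_seq);
    		ret = __memblock_search(type, addr);	/* assumed helper */
    	} while (unlikely(read_seqretry(&memblock_seq, seq)));

    	return ret;
    }

    void memblock_region_resize_late_begin(void)
    {
    	write_seqlock(&memblock_seq);
    }

    void memblock_region_resize_late_end(void)
    {
    	write_sequnlock(&memblock_seq);
    }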
Vinayak Menon
91212fa8e0 mm: zcache: fix use after free in zcache_store_page
There is a chance of a zbud handle being used after a free:
Unable to handle kernel paging request at virtual address ffffffc05be72040
PC is at zcache_store_page+0x59c/0x618
LR is at zcache_store_page+0x59c/0x618
[<ffffffc00019c99c>] zcache_store_page+0x59c/0x618
[<ffffffc0001a70c4>] __cleancache_put_page+0x94/0xcc
[<ffffffc00015da4c>] __delete_from_page_cache+0xc0/0x2cc
[<ffffffc00016d230>] __remove_mapping+0xe4/0x128
[<ffffffc00016e750>] shrink_page_list+0x634/0x95c
[<ffffffc00016f32c>] shrink_inactive_list+0x41c/0x67c
[<ffffffc00016fc14>] shrink_lruvec+0x364/0x510
[<ffffffc00016fe10>] shrink_zone+0x50/0x12c
[<ffffffc000170278>] try_to_free_pages+0x38c/0x56c
[<ffffffc000164e4c>] __alloc_pages_nodemask+0x5e0/0x994
[<ffffffc000165214>] __get_free_pages+0x14/0x60

CRs-Fixed: 968859
Change-Id: I24f6cf8ccbac956d4c3114e70a9f94f5e3bfa1c8
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:19:12 -07:00
Vinayak Menon
ad6f486284 mm: swap: swap ratio support
Add support to receive a static ratio from userspace to divide the
swap pages between ZRAM and disk-based swap devices.  The existing
infrastructure allows keeping the same priority for multiple swap
devices, which results in round-robin distribution of pages.  With
this patch, a ratio can be defined.

CRs-fixed: 968416
Change-Id: I54f54489db84cabb206569dd62d61a8a7a898991
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:19:04 -07:00
Shiraz Hashim
38a6f0388b mm/memblock: call kmemleak only for logically mapped region
On 32-bit, memblock_alloc_range_nid can return a block of memory from
highmem.  Ensure kmemleak is called only for the logically mapped
region.

Change-Id: I4491f199437d8401071ba5f98a8b0eedbaf47b00
Signed-off-by: Shiraz Hashim <shashim@codeaurora.org>
2016-03-23 21:18:18 -07:00
Vinayak Menon
df756d1f8b mm: zcache: reduce prints on store error
A zcache store can fail under low-memory conditions.  Do not emit
unnecessary prints on such events, as they can cause a stall.

Change-Id: I8fba6938f75ef5b6054ee079951b7409279e1c02
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:16:46 -07:00
Vinayak Menon
a360ae6982 mm: zcache: fix race between store and evict
The following race is possible:

CPU 1				CPU2
zcache_store_page
zbud_alloc
				zcache_evict_zpage
				zpool = zhandle->zpool;
				CRASH

zcache_store_page
zhandle->zpool = zpool

Fix this by properly initializing the zhandle and validating it in
zcache_evict_zpage.

Change-Id: I02328220b30f415fa1f171236eab3a2e40072fd9
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:16:45 -07:00
Vinayak Menon
d7749d7f8e mm: memory: reduce fault_around_bytes
Mapping multiple pages on a fault results in page_check_references
hitting a larger number of referenced inactive pages, which increases
pressure on reclaim.  Reduce fault_around_bytes to the lowest possible
value.  Reduced kswapd wakeups are observed with this change.

Change-Id: I03c6cac9f28fa328abab7b40f5f01144084a147c
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:16:02 -07:00
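A sketch of the tuning described, assuming the 4.4-era default in
mm/memory.c (illustrative):

    /* was: rounddown_pow_of_two(65536); one page is the lowest useful value */
    static unsigned long fault_around_bytes __read_mostly = PAGE_SIZE;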
Vinayak Menon
4d09f895a3 mm: zcache: clear zcache when low on file pages
When file pages are very low, it is better to clear off zcache pages,
since the freed memory can be used to sustain an application in the
foreground.  Moreover, when file pages are too low, we don't gain much
by holding a few zcache pages.

Change-Id: I88dd295d24b7de18fb3bc0788e0baeb6bfdb2f6d
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:15:20 -07:00
Vinayak Menon
90a22ec53a mm: zcache: add zero page support
Zero pages need not be compressed and stored unnecessarily.  Rather,
insert a special handle into the radix tree to identify a zero page.

Change-Id: Ic92321c4753401a90d69a6e8c61b5119168c9df7
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:15:19 -07:00
Vinayak Menon
347c419ead mm: zcache: do not wake up kswapd for zcache allocs
zcache allocations happen during page reclaim, and waking up kswapd
at this time hinders system performance.  During tests it was seen
that without this patch there were a lot of kswapd wakeups, resulting
in bad launches.

Change-Id: Ic0f0240b8fdad6b3fe142b2bdc0366cfd870635e
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:15:17 -07:00
Vinayak Menon
5ef8272721 mm: zcache: fix accounting of pool pages
zcache_pool_pages is supposed to store the total pages in all the
pools used by zcache.  But at present, zcache_pool_pages is assigned
the page count of whichever single pool was last touched by a load,
store, or any other operation which modifies the pool size.  This
matters for external clients which depend on the zcache pool size,
like the lowmemorykiller.

Change-Id: Ifdaab8646c40f1fec71dfa5903658fbdc6b3cce5
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:15:17 -07:00
Vinayak Menon
aefb461740 mm: zcache: shrink zcache on memory pressure
When file pages drop, at some point zcache pages can outnumber the
file pages.  Further, when we have to reclaim the maximum possible
number of file pages to launch an application, we need a way to
reclaim even the zcache pages, at least to an extent which makes them
match the number of file pages.  This can help provide better
foreground headroom.

Change-Id: I481bfb9961ed5cee47ebeae08eb910bb269b644c
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:15:16 -07:00