Commit graph

9967 commits

Author SHA1 Message Date
Dmitry Vyukov
97bf1066e3 UPSTREAM: kasan: resched in quarantine_remove_cache()
We see reported stalls/lockups in quarantine_remove_cache() on machines
with large amounts of RAM.  quarantine_remove_cache() needs to scan
whole quarantine in order to take out all objects belonging to the
cache.  Quarantine is currently 1/32-th of RAM, e.g.  on a machine with
256GB of memory that will be 8GB.  Moreover quarantine scanning is a
walk over uncached linked list, which is slow.

Add cond_resched() after scanning of each non-empty batch of objects.
Batches are specifically kept of reasonable size for quarantine_put().
On a machine with 256GB of RAM we should have ~512 non-empty batches,
each with 16MB of objects.

Link: http://lkml.kernel.org/r/20170308154239.25440-1-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 68fd814a3391c7e017ae6ace8855788748edd981)
Change-Id: I8a38466a9b9544bb303202c94bfba6201251e3f3
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:25:05 -08:00
Ingo Molnar
16a34d5679 BACKPORT: kasan, sched/headers: Uninline kasan_enable/disable_current()
<linux/kasan.h> is a low level header that is included early
in affected kernel headers. But it includes <linux/sched.h>
which complicates the cleanup of sched.h dependencies.

But kasan.h has almost no need for sched.h: its only use of
scheduler functionality is in two inline functions which are
not used very frequently - so uninline kasan_enable_current()
and kasan_disable_current().

Also add a <linux/sched.h> dependency to a .c file that depended
on kasan.h including it.

This paves the way to remove the <linux/sched.h> include from kasan.h.

Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Bug: 64145065
(cherry-picked from af8601ad420f6afa6445c927ad9f36d9700d96d6)
Change-Id: I13fd2d3927f663d694ea0d5bf44f18e2c62ae013
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:25:01 -08:00
Greg Thelen
c0e5ed2db4 BACKPORT: kasan: drain quarantine of memcg slab objects
Per memcg slab accounting and kasan have a problem with kmem_cache
destruction.
 - kmem_cache_create() allocates a kmem_cache, which is used for
   allocations from processes running in root (top) memcg.
 - Processes running in non root memcg and allocating with either
   __GFP_ACCOUNT or from a SLAB_ACCOUNT cache use a per memcg
   kmem_cache.
 - Kasan catches use-after-free by having kfree() and kmem_cache_free()
   defer freeing of objects. Objects are placed in a quarantine.
 - kmem_cache_destroy() destroys root and non root kmem_caches. It takes
   care to drain the quarantine of objects from the root memcg's
   kmem_cache, but ignores objects associated with non root memcg. This
   causes leaks because quarantined per memcg objects refer to per memcg
   kmem cache being destroyed.

To see the problem:

 1) create a slab cache with kmem_cache_create(,,,SLAB_ACCOUNT,)
 2) from non root memcg, allocate and free a few objects from cache
 3) dispose of the cache with kmem_cache_destroy() kmem_cache_destroy()
    will trigger a "Slab cache still has objects" warning indicating
    that the per memcg kmem_cache structure was leaked.

Fix the leak by draining kasan quarantined objects allocated from non
root memcg.

Racing memcg deletion is tricky, but handled.  kmem_cache_destroy() =>
shutdown_memcg_caches() => __shutdown_memcg_cache() => shutdown_cache()
flushes per memcg quarantined objects, even if that memcg has been
rmdir'd and gone through memcg_deactivate_kmem_caches().

This leak only affects destroyed SLAB_ACCOUNT kmem caches when kasan is
enabled.  So I don't think it's worth patching stable kernels.

Link: http://lkml.kernel.org/r/1482257462-36948-1-git-send-email-gthelen@google.com
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from f9fa1d919c696e90c887d8742198023e7639d139)
Change-Id: Ie054d9cde7fb1ce62e65776bff5a70f72925d037
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:24:57 -08:00
Dmitry Vyukov
9e84d95fd7 UPSTREAM: kasan: eliminate long stalls during quarantine reduction
Currently we dedicate 1/32 of RAM for quarantine and then reduce it by
1/4 of total quarantine size.  This can be a significant amount of
memory.  For example, with 4GB of RAM total quarantine size is 128MB and
it is reduced by 32MB at a time.  With 128GB of RAM total quarantine
size is 4GB and it is reduced by 1GB.  This leads to several problems:

 - freeing 1GB can take tens of seconds, causes rcu stall warnings and
   just introduces unexpected long delays at random places
 - if kmalloc() is called under a mutex, other threads stall on that
   mutex while a thread reduces quarantine
 - threads wait on quarantine_lock while one thread grabs a large batch
   of objects to evict
 - we walk the uncached list of object to free twice which makes all of
   the above worse
 - when a thread frees objects, they are already not accounted against
   global_quarantine.bytes; as the result we can have quarantine_size
   bytes in quarantine + unbounded amount of memory in large batches in
   threads that are in process of freeing

Reduce size of quarantine in smaller batches to reduce the delays.  The
only reason to reduce it in batches is amortization of overheads, the
new batch size of 1MB should be well enough to amortize spinlock
lock/unlock and few function calls.

Plus organize quarantine as a FIFO array of batches.  This allows to not
walk the list in quarantine_reduce() under quarantine_lock, which in
turn reduces contention and is just faster.

This improves performance of heavy load (syzkaller fuzzing) by ~20% with
4 CPUs and 32GB of RAM.  Also this eliminates frequent (every 5 sec)
drops of CPU consumption from ~400% to ~100% (one thread reduces
quarantine while others are waiting on a mutex).

Some reference numbers:
1. Machine with 4 CPUs and 4GB of memory. Quarantine size 128MB.
   Currently we free 32MB at at time.
   With new code we free 1MB at a time (1024 batches, ~128 are used).
2. Machine with 32 CPUs and 128GB of memory. Quarantine size 4GB.
   Currently we free 1GB at at time.
   With new code we free 8MB at a time (1024 batches, ~512 are used).
3. Machine with 4096 CPUs and 1TB of memory. Quarantine size 32GB.
   Currently we free 8GB at at time.
   With new code we free 4MB at a time (16K batches, ~8K are used).

Link: http://lkml.kernel.org/r/1478756952-18695-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 64abdcb24351a27bed6e2b6a3c27348fe532c73f)
Change-Id: Idf73cb292453ceffc437121b7a5e152cde1901ff
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:24:52 -08:00
Dmitry Vyukov
418991b870 UPSTREAM: kasan: support panic_on_warn
If user sets panic_on_warn, he wants kernel to panic if there is
anything barely wrong with the kernel.  KASAN-detected errors are
definitely not less benign than an arbitrary kernel WARNING.

Panic after KASAN errors if panic_on_warn is set.

We use this for continuous fuzzing where we want kernel to stop and
reboot on any error.

Link: http://lkml.kernel.org/r/1476694764-31986-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 5c5c1f36cedfb51ec291181e71817f7fe7e03ee2)
Change-Id: Iee7cbc4ffbce8eb8d827447fdf960a6520d10b00
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:24:47 -08:00
Josh Poimboeuf
915047a1de UPSTREAM: x86/suspend: fix false positive KASAN warning on suspend/resume
Resuming from a suspend operation is showing a KASAN false positive
warning:

  BUG: KASAN: stack-out-of-bounds in unwind_get_return_address+0x11d/0x130 at addr ffff8803867d7878
  Read of size 8 by task pm-suspend/7774
  page:ffffea000e19f5c0 count:0 mapcount:0 mapping:          (null) index:0x0
  flags: 0x2ffff0000000000()
  page dumped because: kasan: bad access detected
  CPU: 0 PID: 7774 Comm: pm-suspend Tainted: G    B           4.9.0-rc7+ #8
  Hardware name: Gigabyte Technology Co., Ltd. Z170X-UD5/Z170X-UD5-CF, BIOS F5 03/07/2016
  Call Trace:
    dump_stack+0x63/0x82
    kasan_report_error+0x4b4/0x4e0
    ? acpi_hw_read_port+0xd0/0x1ea
    ? kfree_const+0x22/0x30
    ? acpi_hw_validate_io_request+0x1a6/0x1a6
    __asan_report_load8_noabort+0x61/0x70
    ? unwind_get_return_address+0x11d/0x130
    unwind_get_return_address+0x11d/0x130
    ? unwind_next_frame+0x97/0xf0
    __save_stack_trace+0x92/0x100
    save_stack_trace+0x1b/0x20
    save_stack+0x46/0xd0
    ? save_stack_trace+0x1b/0x20
    ? save_stack+0x46/0xd0
    ? kasan_kmalloc+0xad/0xe0
    ? kasan_slab_alloc+0x12/0x20
    ? acpi_hw_read+0x2b6/0x3aa
    ? acpi_hw_validate_register+0x20b/0x20b
    ? acpi_hw_write_port+0x72/0xc7
    ? acpi_hw_write+0x11f/0x15f
    ? acpi_hw_read_multiple+0x19f/0x19f
    ? memcpy+0x45/0x50
    ? acpi_hw_write_port+0x72/0xc7
    ? acpi_hw_write+0x11f/0x15f
    ? acpi_hw_read_multiple+0x19f/0x19f
    ? kasan_unpoison_shadow+0x36/0x50
    kasan_kmalloc+0xad/0xe0
    kasan_slab_alloc+0x12/0x20
    kmem_cache_alloc_trace+0xbc/0x1e0
    ? acpi_get_sleep_type_data+0x9a/0x578
    acpi_get_sleep_type_data+0x9a/0x578
    acpi_hw_legacy_wake_prep+0x88/0x22c
    ? acpi_hw_legacy_sleep+0x3c7/0x3c7
    ? acpi_write_bit_register+0x28d/0x2d3
    ? acpi_read_bit_register+0x19b/0x19b
    acpi_hw_sleep_dispatch+0xb5/0xba
    acpi_leave_sleep_state_prep+0x17/0x19
    acpi_suspend_enter+0x154/0x1e0
    ? trace_suspend_resume+0xe8/0xe8
    suspend_devices_and_enter+0xb09/0xdb0
    ? printk+0xa8/0xd8
    ? arch_suspend_enable_irqs+0x20/0x20
    ? try_to_freeze_tasks+0x295/0x600
    pm_suspend+0x6c9/0x780
    ? finish_wait+0x1f0/0x1f0
    ? suspend_devices_and_enter+0xdb0/0xdb0
    state_store+0xa2/0x120
    ? kobj_attr_show+0x60/0x60
    kobj_attr_store+0x36/0x70
    sysfs_kf_write+0x131/0x200
    kernfs_fop_write+0x295/0x3f0
    __vfs_write+0xef/0x760
    ? handle_mm_fault+0x1346/0x35e0
    ? do_iter_readv_writev+0x660/0x660
    ? __pmd_alloc+0x310/0x310
    ? do_lock_file_wait+0x1e0/0x1e0
    ? apparmor_file_permission+0x18/0x20
    ? security_file_permission+0x73/0x1c0
    ? rw_verify_area+0xbd/0x2b0
    vfs_write+0x149/0x4a0
    SyS_write+0xd9/0x1c0
    ? SyS_read+0x1c0/0x1c0
    entry_SYSCALL_64_fastpath+0x1e/0xad
  Memory state around the buggy address:
   ffff8803867d7700: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   ffff8803867d7780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  >ffff8803867d7800: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f4
                                                                  ^
   ffff8803867d7880: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
   ffff8803867d7900: 00 00 00 f1 f1 f1 f1 04 f4 f4 f4 f3 f3 f3 f3 00

KASAN instrumentation poisons the stack when entering a function and
unpoisons it when exiting the function.  However, in the suspend path,
some functions never return, so their stack never gets unpoisoned,
resulting in stale KASAN shadow data which can cause later false
positive warnings like the one above.

Reported-by: Scott Bauer <scott.bauer@intel.com>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Bug: 64145065
(cherry-picked from b53f40db59b27b62bc294c30506b02a0cae47e0b)
Change-Id: Iafbf1f7f19bb7db9a49316cf70050a3dae576f15
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:24:41 -08:00
Dmitry Vyukov
1d16d8b51e UPSTREAM: kasan: support use-after-scope detection
Gcc revision 241896 implements use-after-scope detection.  Will be
available in gcc 7.  Support it in KASAN.

Gcc emits 2 new callbacks to poison/unpoison large stack objects when
they go in/out of scope.  Implement the callbacks and add a test.

[dvyukov@google.com: v3]
  Link: http://lkml.kernel.org/r/1479998292-144502-1-git-send-email-dvyukov@google.com
Link: http://lkml.kernel.org/r/1479226045-145148-1-git-send-email-dvyukov@google.com
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: <stable@vger.kernel.org>	[4.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 828347f8f9a558cf1af2faa46387a26564f2ac3e)
Change-Id: Ib9cb585efbe98ba11a7efbd233ebd97cb4214a92
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:24:13 -08:00
Dmitry Vyukov
2cb9e02424 BACKPORT: kprobes: Unpoison stack in jprobe_return() for KASAN
I observed false KSAN positives in the sctp code, when
sctp uses jprobe_return() in jsctp_sf_eat_sack().

The stray 0xf4 in shadow memory are stack redzones:

[     ] ==================================================================
[     ] BUG: KASAN: stack-out-of-bounds in memcmp+0xe9/0x150 at addr ffff88005e48f480
[     ] Read of size 1 by task syz-executor/18535
[     ] page:ffffea00017923c0 count:0 mapcount:0 mapping:          (null) index:0x0
[     ] flags: 0x1fffc0000000000()
[     ] page dumped because: kasan: bad access detected
[     ] CPU: 1 PID: 18535 Comm: syz-executor Not tainted 4.8.0+ #28
[     ] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
[     ]  ffff88005e48f2d0 ffffffff82d2b849 ffffffff0bc91e90 fffffbfff10971e8
[     ]  ffffed000bc91e90 ffffed000bc91e90 0000000000000001 0000000000000000
[     ]  ffff88005e48f480 ffff88005e48f350 ffffffff817d3169 ffff88005e48f370
[     ] Call Trace:
[     ]  [<ffffffff82d2b849>] dump_stack+0x12e/0x185
[     ]  [<ffffffff817d3169>] kasan_report+0x489/0x4b0
[     ]  [<ffffffff817d31a9>] __asan_report_load1_noabort+0x19/0x20
[     ]  [<ffffffff82d49529>] memcmp+0xe9/0x150
[     ]  [<ffffffff82df7486>] depot_save_stack+0x176/0x5c0
[     ]  [<ffffffff817d2031>] save_stack+0xb1/0xd0
[     ]  [<ffffffff817d27f2>] kasan_slab_free+0x72/0xc0
[     ]  [<ffffffff817d05b8>] kfree+0xc8/0x2a0
[     ]  [<ffffffff85b03f19>] skb_free_head+0x79/0xb0
[     ]  [<ffffffff85b0900a>] skb_release_data+0x37a/0x420
[     ]  [<ffffffff85b090ff>] skb_release_all+0x4f/0x60
[     ]  [<ffffffff85b11348>] consume_skb+0x138/0x370
[     ]  [<ffffffff8676ad7b>] sctp_chunk_put+0xcb/0x180
[     ]  [<ffffffff8676ae88>] sctp_chunk_free+0x58/0x70
[     ]  [<ffffffff8677fa5f>] sctp_inq_pop+0x68f/0xef0
[     ]  [<ffffffff8675ee36>] sctp_assoc_bh_rcv+0xd6/0x4b0
[     ]  [<ffffffff8677f2c1>] sctp_inq_push+0x131/0x190
[     ]  [<ffffffff867bad69>] sctp_backlog_rcv+0xe9/0xa20
[ ... ]
[     ] Memory state around the buggy address:
[     ]  ffff88005e48f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ]  ffff88005e48f400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ] >ffff88005e48f480: f4 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ]                    ^
[     ]  ffff88005e48f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ]  ffff88005e48f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[     ] ==================================================================

KASAN stack instrumentation poisons stack redzones on function entry
and unpoisons them on function exit. If a function exits abnormally
(e.g. with a longjmp like jprobe_return()), stack redzones are left
poisoned. Later this leads to random KASAN false reports.

Unpoison stack redzones in the frames we are going to jump over
before doing actual longjmp in jprobe_return().

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: kasan-dev@googlegroups.com
Cc: surovegin@google.com
Cc: rostedt@goodmis.org
Link: http://lkml.kernel.org/r/1476454043-101898-1-git-send-email-dvyukov@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Bug: 64145065
(cherry-picked from 9f7d416c36124667c406978bcb39746589c35d7f)
Change-Id: I84e4fac44265a69f615601266b3415147dade633
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:22:14 -08:00
Alexander Potapenko
bfe4adad69 UPSTREAM: kasan: remove the unnecessary WARN_ONCE from quarantine.c
It's quite unlikely that the user will so little memory that the per-CPU
quarantines won't fit into the given fraction of the available memory.
Even in that case he won't be able to do anything with the information
given in the warning.

Link: http://lkml.kernel.org/r/1470929182-101413-1-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from bcbf0d566b6e59a6e873bfe415cc415111a819e2)
Change-Id: I1230018140c32fab7ea1d1dc1d54471aa48ae45f
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:22:09 -08:00
Alexander Potapenko
a9674f4e53 UPSTREAM: kasan: avoid overflowing quarantine size on low memory systems
If the total amount of memory assigned to quarantine is less than the
amount of memory assigned to per-cpu quarantines, |new_quarantine_size|
may overflow.  Instead, set it to zero.

[akpm@linux-foundation.org: cleanup: use WARN_ONCE return value]
Link: http://lkml.kernel.org/r/1470063563-96266-1-git-send-email-glider@google.com
Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation")
Signed-off-by: Alexander Potapenko <glider@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from c3cee372282cb6bcdf19ac1457581d5dd5ecb554)
Change-Id: I8a647e5ee5d9494698aa2a31d50d587d6ff8b65c
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:22:05 -08:00
Andrey Ryabinin
2771cfaf23 UPSTREAM: kasan: improve double-free reports
Currently we just dump stack in case of double free bug.
Let's dump all info about the object that we have.

[aryabinin@virtuozzo.com: change double free message per Alexander]
  Link: http://lkml.kernel.org/r/1470153654-30160-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1470062715-14077-6-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 7e088978933ee186533355ae03a9dc1de99cf6c7)
Change-Id: I733bf6272d44597907bcf01f1d13695b8e9f8cb4
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:22:00 -08:00
Joe Perches
1a44264ae8 BACKPORT: mm: coalesce split strings
Kernel style prefers a single string over split strings when the string is
'user-visible'.

Miscellanea:

 - Add a missing newline
 - Realign arguments

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Tejun Heo <tj@kernel.org>	[percpu]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 756a025f00091918d9d09ca3229defb160b409c0)
Change-Id: I377fb1542980c15d2f306924656227ad17b02b5e
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:21:34 -08:00
Andrey Ryabinin
508ad7fe89 BACKPORT: mm/kasan: get rid of ->state in struct kasan_alloc_meta
The state of object currently tracked in two places - shadow memory, and
the ->state field in struct kasan_alloc_meta.  We can get rid of the
latter.  The will save us a little bit of memory.  Also, this allow us
to move free stack into struct kasan_alloc_meta, without increasing
memory consumption.  So now we should always know when the last time the
object was freed.  This may be useful for long delayed use-after-free
bugs.

As a side effect this fixes following UBSAN warning:
	UBSAN: Undefined behaviour in mm/kasan/quarantine.c:102:13
	member access within misaligned address ffff88000d1efebc for type 'struct qlist_node'
	which requires 8 byte alignment

Link: http://lkml.kernel.org/r/1470062715-14077-5-git-send-email-aryabinin@virtuozzo.com
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from b3cbd9bf77cd1888114dbee1653e79aa23fd4068)
Change-Id: Iaa4959a78ffd2e49f9060099df1fb32483df3085
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:21:29 -08:00
Andrey Ryabinin
1f93a556df UPSTREAM: mm/kasan: get rid of ->alloc_size in struct kasan_alloc_meta
Size of slab object already stored in cache->object_size.

Note, that kmalloc() internally rounds up size of allocation, so
object_size may be not equal to alloc_size, but, usually we don't need
to know the exact size of allocated object.  In case if we need that
information, we still can figure it out from the report.  The dump of
shadow memory allows to identify the end of allocated memory, and
thereby the exact allocation size.

Link: http://lkml.kernel.org/r/1470062715-14077-4-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 47b5c2a0f021e90a79845d1a1353780e5edd0bce)
Change-Id: I76b555f9a8469f685607ca50f6c51b2e0ad1b4ab
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:21:25 -08:00
Andrey Ryabinin
30d8b5b402 UPSTREAM: mm: kasan: remove unused 'reserved' field from struct kasan_alloc_meta
Commit cd11016e5f52 ("mm, kasan: stackdepot implementation.  Enable
stackdepot for SLAB") added 'reserved' field, but never used it.

Link: http://lkml.kernel.org/r/1464021054-2307-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 9725759a96efb1ce56a1b93455ac0ab1901c5327)
Change-Id: I34d5d28a6f6e1014d234f38c23b6e4aa408d3e84
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:21:02 -08:00
Andrey Ryabinin
101a6c23d8 UPSTREAM: mm/kasan, slub: don't disable interrupts when object leaves quarantine
SLUB doesn't require disabled interrupts to call ___cache_free().

Link: http://lkml.kernel.org/r/1470062715-14077-3-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from f7376aed6c032aab820fa36806a89e16e353a0d9)
Change-Id: I9c8ae37791ab10c746414322a672bdf0ebd1ed9f
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:20:57 -08:00
Andrey Ryabinin
f3bc37a781 UPSTREAM: mm/kasan: don't reduce quarantine in atomic contexts
Currently we call quarantine_reduce() for ___GFP_KSWAPD_RECLAIM (implied
by __GFP_RECLAIM) allocation.  So, basically we call it on almost every
allocation.  quarantine_reduce() sometimes is heavy operation, and
calling it with disabled interrupts may trigger hard LOCKUP:

 NMI watchdog: Watchdog detected hard LOCKUP on cpu 2irq event stamp: 1411258
 Call Trace:
  <NMI>   dump_stack+0x68/0x96
   watchdog_overflow_callback+0x15b/0x190
   __perf_event_overflow+0x1b1/0x540
   perf_event_overflow+0x14/0x20
   intel_pmu_handle_irq+0x36a/0xad0
   perf_event_nmi_handler+0x2c/0x50
   nmi_handle+0x128/0x480
   default_do_nmi+0xb2/0x210
   do_nmi+0x1aa/0x220
   end_repeat_nmi+0x1a/0x1e
  <<EOE>>   __kernel_text_address+0x86/0xb0
   print_context_stack+0x7b/0x100
   dump_trace+0x12b/0x350
   save_stack_trace+0x2b/0x50
   set_track+0x83/0x140
   free_debug_processing+0x1aa/0x420
   __slab_free+0x1d6/0x2e0
   ___cache_free+0xb6/0xd0
   qlist_free_all+0x83/0x100
   quarantine_reduce+0x177/0x1b0
   kasan_kmalloc+0xf3/0x100

Reduce the quarantine_reduce iff direct reclaim is allowed.

Fixes: 55834c59098d("mm: kasan: initial memory quarantine implementation")
Link: http://lkml.kernel.org/r/1470062715-14077-2-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 4b3ec5a3f4b1d5c9d64b9ab704042400d050d432)
Change-Id: I7e6ad29acabc2091f98a8aac54ed041b574b5e7e
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:20:52 -08:00
Andrey Ryabinin
13134b919b UPSTREAM: mm/kasan: fix corruptions and false positive reports
Once an object is put into quarantine, we no longer own it, i.e.  object
could leave the quarantine and be reallocated.  So having set_track()
call after the quarantine_put() may corrupt slab objects.

 BUG kmalloc-4096 (Not tainted): Poison overwritten
 -----------------------------------------------------------------------------
 Disabling lock debugging due to kernel taint
 INFO: 0xffff8804540de850-0xffff8804540de857. First byte 0xb5 instead of 0x6b
...
 INFO: Freed in qlist_free_all+0x42/0x100 age=75 cpu=3 pid=24492
  __slab_free+0x1d6/0x2e0
  ___cache_free+0xb6/0xd0
  qlist_free_all+0x83/0x100
  quarantine_reduce+0x177/0x1b0
  kasan_kmalloc+0xf3/0x100
  kasan_slab_alloc+0x12/0x20
  kmem_cache_alloc+0x109/0x3e0
  mmap_region+0x53e/0xe40
  do_mmap+0x70f/0xa50
  vm_mmap_pgoff+0x147/0x1b0
  SyS_mmap_pgoff+0x2c7/0x5b0
  SyS_mmap+0x1b/0x30
  do_syscall_64+0x1a0/0x4e0
  return_from_SYSCALL_64+0x0/0x7a
 INFO: Slab 0xffffea0011503600 objects=7 used=7 fp=0x          (null) flags=0x8000000000004080
 INFO: Object 0xffff8804540de848 @offset=26696 fp=0xffff8804540dc588
 Redzone ffff8804540de840: bb bb bb bb bb bb bb bb                          ........
 Object ffff8804540de848: 6b 6b 6b 6b 6b 6b 6b 6b b5 52 00 00 f2 01 60 cc  kkkkkkkk.R....`.

Similarly, poisoning after the quarantine_put() leads to false positive
use-after-free reports:

 BUG: KASAN: use-after-free in anon_vma_interval_tree_insert+0x304/0x430 at addr ffff880405c540a0
 Read of size 8 by task trinity-c0/3036
 CPU: 0 PID: 3036 Comm: trinity-c0 Not tainted 4.7.0-think+ #9
 Call Trace:
   dump_stack+0x68/0x96
   kasan_report_error+0x222/0x600
   __asan_report_load8_noabort+0x61/0x70
   anon_vma_interval_tree_insert+0x304/0x430
   anon_vma_chain_link+0x91/0xd0
   anon_vma_clone+0x136/0x3f0
   anon_vma_fork+0x81/0x4c0
   copy_process.part.47+0x2c43/0x5b20
   _do_fork+0x16d/0xbd0
   SyS_clone+0x19/0x20
   do_syscall_64+0x1a0/0x4e0
   entry_SYSCALL64_slow_path+0x25/0x25

Fix this by putting an object in the quarantine after all other
operations.

Fixes: 80a9201a5965 ("mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB")
Link: http://lkml.kernel.org/r/1470062715-14077-1-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Sasha Levin <alexander.levin@verizon.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 4a3d308d6674fabf213bce9c1a661ef43a85e515)
Change-Id: Iaa699c447b97f8cb04afdd2d6a5f572bea439185
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:20:46 -08:00
Paul Lawrence
85f8b42430 BACKPORT: mm, kasan: switch SLUB to stackdepot, enable memory quarantine for SLUB
For KASAN builds:
 - switch SLUB allocator to using stackdepot instead of storing the
   allocation/deallocation stacks in the objects;
 - change the freelist hook so that parts of the freelist can be put
   into the quarantine.

[aryabinin@virtuozzo.com: fixes]
  Link: http://lkml.kernel.org/r/1468601423-28676-1-git-send-email-aryabinin@virtuozzo.com
Link: http://lkml.kernel.org/r/1468347165-41906-3-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 80a9201a5965f4715d5c09790862e0df84ce0614)
Change-Id: I2b59c6d50d0db62d3609edfdc7be54e48f8afa5c
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:20:37 -08:00
Joonsoo Kim
850627ab0f UPSTREAM: kasan/quarantine: fix bugs on qlist_move_cache()
There are two bugs on qlist_move_cache().  One is that qlist's tail
isn't set properly.  curr->next can be NULL since it is singly linked
list and NULL value on tail is invalid if there is one item on qlist.
Another one is that if cache is matched, qlist_put() is called and it
will set curr->next to NULL.  It would cause to stop the loop
prematurely.

These problems come from complicated implementation so I'd like to
re-implement it completely.  Implementation in this patch is really
simple.  Iterate all qlist_nodes and put them to appropriate list.

Unfortunately, I got this bug sometime ago and lose oops message.  But,
the bug looks trivial and no need to attach oops.

Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation")
Link: http://lkml.kernel.org/r/1467766348-22419-1-git-send-email-iamjoonsoo.kim@lge.com
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Kuthonuzo Luruo <poll.stdin@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 0ab686d8c8303069e80300663b3be6201a8697fb)
Change-Id: Ifca87bd938c74ff18e7fc2680afb15070cc7019f
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:20:33 -08:00
Andrey Ryabinin
3240b4d663 UPSTREAM: mm: mempool: kasan: don't poot mempool objects in quarantine
Currently we may put reserved by mempool elements into quarantine via
kasan_kfree().  This is totally wrong since quarantine may really free
these objects.  So when mempool will try to use such element,
use-after-free will happen.  Or mempool may decide that it no longer
need that element and double-free it.

So don't put object into quarantine in kasan_kfree(), just poison it.
Rename kasan_kfree() to kasan_poison_kfree() to respect that.

Also, we shouldn't use kasan_slab_alloc()/kasan_krealloc() in
kasan_unpoison_element() because those functions may update allocation
stacktrace.  This would be wrong for the most of the remove_element call
sites.

(The only call site where we may want to update alloc stacktrace is
 in mempool_alloc(). Kmemleak solves this by calling
 kmemleak_update_trace(), so we could make something like that too.
 But this is out of scope of this patch).

Fixes: 55834c59098d ("mm: kasan: initial memory quarantine implementation")
Link: http://lkml.kernel.org/r/575977C3.1010905@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: Kuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 9b75a867cc9ddbafcaf35029358ac500f2635ff3)
Change-Id: Idb6c152dae8f8f2975dbe6acb7165315be8b465b
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:20:25 -08:00
Shuah Khan
7a85d04507 UPSTREAM: kasan: change memory hot-add error messages to info messages
Change the following memory hot-add error messages to info messages.
There is no need for these to be errors.

   kasan: WARNING: KASAN doesn't support memory hot-add
   kasan: Memory hot-add will be disabled

Link: http://lkml.kernel.org/r/1464794430-5486-1-git-send-email-shuahkh@osg.samsung.com
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 91a4c272145652d798035c17e1c02c91001d3f51)
Change-Id: I6ac2acf71cb04f18d25c3e4cbf7317055d130f74
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:20:21 -08:00
Andrey Ryabinin
ec4cb91ee5 BACKPORT: mm/kasan: add API to check memory regions
Memory access coded in an assembly won't be seen by KASAN as a compiler
can instrument only C code.  Add kasan_check_[read,write]() API which is
going to be used to check a certain memory range.

Link: http://lkml.kernel.org/r/1462538722-1574-3-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 64f8ebaf115bcddc4aaa902f981c57ba6506bc42)
Change-Id: I3e75c7c22e77d390c55ca1b86ec58a6d6ea1da87
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:20:16 -08:00
Andrey Ryabinin
d8688d3bac UPSTREAM: mm/kasan: print name of mem[set,cpy,move]() caller in report
When bogus memory access happens in mem[set,cpy,move]() it's usually
caller's fault.  So don't blame mem[set,cpy,move]() in bug report, blame
the caller instead.

Before:
  BUG: KASAN: out-of-bounds access in memset+0x23/0x40 at <address>
After:
  BUG: KASAN: out-of-bounds access in <memset_caller> at <address>

Link: http://lkml.kernel.org/r/1462538722-1574-2-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 936bb4bbbb832f81055328b84e5afe1fc7246a8d)
Change-Id: I3a480d017054abc387b0bee8ca664a8e62cc57d3
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:20:12 -08:00
Alexander Potapenko
8009eecae3 UPSTREAM: mm: kasan: initial memory quarantine implementation
Quarantine isolates freed objects in a separate queue.  The objects are
returned to the allocator later, which helps to detect use-after-free
errors.

When the object is freed, its state changes from KASAN_STATE_ALLOC to
KASAN_STATE_QUARANTINE.  The object is poisoned and put into quarantine
instead of being returned to the allocator, therefore every subsequent
access to that object triggers a KASAN error, and the error handler is
able to say where the object has been allocated and deallocated.

When it's time for the object to leave quarantine, its state becomes
KASAN_STATE_FREE and it's returned to the allocator.  From now on the
allocator may reuse it for another allocation.  Before that happens,
it's still possible to detect a use-after free on that object (it
retains the allocation/deallocation stacks).

When the allocator reuses this object, the shadow is unpoisoned and old
allocation/deallocation stacks are wiped.  Therefore a use of this
object, even an incorrect one, won't trigger ASan warning.

Without the quarantine, it's not guaranteed that the objects aren't
reused immediately, that's why the probability of catching a
use-after-free is lower than with quarantine in place.

Quarantine isolates freed objects in a separate queue.  The objects are
returned to the allocator later, which helps to detect use-after-free
errors.

Freed objects are first added to per-cpu quarantine queues.  When a
cache is destroyed or memory shrinking is requested, the objects are
moved into the global quarantine queue.  Whenever a kmalloc call allows
memory reclaiming, the oldest objects are popped out of the global queue
until the total size of objects in quarantine is less than 3/4 of the
maximum quarantine size (which is a fraction of installed physical
memory).

As long as an object remains in the quarantine, KASAN is able to report
accesses to it, so the chance of reporting a use-after-free is
increased.  Once the object leaves quarantine, the allocator may reuse
it, in which case the object is unpoisoned and KASAN can't detect
incorrect accesses to it.

Right now quarantine support is only enabled in SLAB allocator.
Unification of KASAN features in SLAB and SLUB will be done later.

This patch is based on the "mm: kasan: quarantine" patch originally
prepared by Dmitry Chernenkov.  A number of improvements have been
suggested by Andrey Ryabinin.

[glider@google.com: v9]
  Link: http://lkml.kernel.org/r/1462987130-144092-1-git-send-email-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 55834c59098d0c5a97b0f3247e55832b67facdcf)
Change-Id: Ib808d72a40f2e5137961d93dad540e85f8bbd2c4
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:20:07 -08:00
Alexander Potapenko
e9741edda6 UPSTREAM: mm, kasan: fix compilation for CONFIG_SLAB
Add the missing argument to set_track().

Fixes: cd11016e5f52 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB")
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Konstantin Serebryany <kcc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 0b355eaaaae9bb8bb08b563ef55ecb23a4d743da)
Change-Id: I9bed7d2bef5bd7c6e6d774c347ef63678a7d76be
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:19:52 -08:00
Alexander Potapenko
7336ab7b7c BACKPORT: mm, kasan: stackdepot implementation. Enable stackdepot for SLAB
Implement the stack depot and provide CONFIG_STACKDEPOT.  Stack depot
will allow KASAN store allocation/deallocation stack traces for memory
chunks.  The stack traces are stored in a hash table and referenced by
handles which reside in the kasan_alloc_meta and kasan_free_meta
structures in the allocated memory chunks.

IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
duplication.

Right now stackdepot support is only enabled in SLAB allocator.  Once
KASAN features in SLAB are on par with those in SLUB we can switch SLUB
to stackdepot as well, thus removing the dependency on SLUB stack
bookkeeping, which wastes a lot of memory.

This patch is based on the "mm: kasan: stack depots" patch originally
prepared by Dmitry Chernenkov.

Joonsoo has said that he plans to reuse the stackdepot code for the
mm/page_owner.c debugging facility.

[akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
[aryabinin@virtuozzo.com: comment style fixes]
Signed-off-by: Alexander Potapenko <glider@google.com>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from cd11016e5f5212c13c0cec7384a525edc93b4921)
Change-Id: Ic804318410823b95d84e264a6334e018f21ef943
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:19:45 -08:00
Alexander Potapenko
a9683c505b BACKPORT: mm, kasan: add GFP flags to KASAN API
Add GFP flags to KASAN hooks for future patches to use.

This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 505f5dcb1c419e55a9621a01f83eb5745d8d7398)
Change-Id: I7c5539f59e6969e484a6ff4f104dce2390669cfd
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:19:40 -08:00
Alexander Potapenko
5451a4a87d UPSTREAM: mm, kasan: SLAB support
Add KASAN hooks to SLAB allocator.

This patch is based on the "mm: kasan: unified support for SLUB and SLAB
allocators" patch originally prepared by Dmitry Chernenkov.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrey Konovalov <adech.fo@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 7ed2f9e663854db313f177a511145630e398b402)
Change-Id: I131fdafc1c27a25732475f5bbd1653b66954e1b7
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:19:10 -08:00
Joonsoo Kim
09c23a8024 UPSTREAM: mm/slab: align cache size first before determination of OFF_SLAB candidate
Finding suitable OFF_SLAB candidate is more related to aligned cache
size rather than original size.  Same reasoning can be applied to the
debug pagealloc candidate.  So, this patch moves up alignment fixup to
proper position.  From that point, size is aligned so we can remove some
alignment fixups.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 832a15d209cd260180407bde1af18965b21623f3)
Change-Id: I8338d647da4a6eb6402c6fe4e2402b7db45ea5a5
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:46 -08:00
Joonsoo Kim
118cb473e1 UPSTREAM: mm/slab: use more appropriate condition check for debug_pagealloc
debug_pagealloc debugging is related to SLAB_POISON flag rather than
FORCED_DEBUG option, although FORCED_DEBUG option will enable
SLAB_POISON.  Fix it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 40323278b557a5909bbecfa181c91a3af7afbbe3)
Change-Id: I4a7dbd40b6a2eb777439f271fa600b201e6db1d3
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:42 -08:00
Joonsoo Kim
511f370eed UPSTREAM: mm/slab: factor out debugging initialization in cache_init_objs()
cache_init_objs() will be changed in following patch and current form
doesn't fit well for that change.  So, before doing it, this patch
separates debugging initialization.  This would cause two loop iteration
when debugging is enabled, but, this overhead seems too light than debug
feature itself so effect may not be visible.  This patch will greatly
simplify changes in cache_init_objs() in following patch.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 10b2e9e8e808bd30e1f4018a36366d07b0abd12f)
Change-Id: I9904974a674f17fc7d57daf0fe351742db67c006
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:37 -08:00
Joonsoo Kim
106caadcea UPSTREAM: mm/slab: remove object status buffer for DEBUG_SLAB_LEAK
Now, we don't use object status buffer in any setup. Remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 249247b6f8ee362189a2f2bf598a14ff6c95fb4c)
Change-Id: I60b1fb030da5d44aff2627f57dffd9dd7436e511
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:32 -08:00
Joonsoo Kim
63fd822604 UPSTREAM: mm/slab: alternative implementation for DEBUG_SLAB_LEAK
DEBUG_SLAB_LEAK is a debug option.  It's current implementation requires
status buffer so we need more memory to use it.  And, it cause
kmem_cache initialization step more complex.

To remove this extra memory usage and to simplify initialization step,
this patch implement this feature with another way.

When user requests to get slab object owner information, it marks that
getting information is started.  And then, all free objects in caches
are flushed to corresponding slab page.  Now, we can distinguish all
freed object so we can know all allocated objects, too.  After
collecting slab object owner information on allocated objects, mark is
checked that there is no free during the processing.  If true, we can be
sure that our information is correct so information is returned to user.

Although this way is rather complex, it has two important benefits
mentioned above.  So, I think it is worth changing.

There is one drawback that it takes more time to get slab object owner
information but it is just a debug option so it doesn't matter at all.

To help review, this patch implements new way only.  Following patch
will remove useless code.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from d31676dfde257cb2b3e52d4e657d8ad2251e4d49)
Change-Id: I204ea0dd5553577d17c93f32f0d5a797ba0304af
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:25 -08:00
Joonsoo Kim
5b5a89ff48 UPSTREAM: mm/slab: clean up DEBUG_PAGEALLOC processing code
Currently, open code for checking DEBUG_PAGEALLOC cache is spread to
some sites.  It makes code unreadable and hard to change.

This patch cleans up this code.  The following patch will change the
criteria for DEBUG_PAGEALLOC cache so this clean-up will help it, too.

[akpm@linux-foundation.org: fix build with CONFIG_DEBUG_PAGEALLOC=n]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 40b44137971c2e5865a78f9f7de274449983ccb5)
Change-Id: I784df4a54b62f77a22f8fa70990387fdf968219f
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:12 -08:00
Joonsoo Kim
44d2bfbf14 UPSTREAM: mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from a307ebd468e0b97c203f5a99a56a6017e4d1991a)
Change-Id: I4407dbf0a5a22ab6f9335219bafc6f28023635a0
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:17:03 -08:00
Mark Rutland
36205b7fa9 UPSTREAM: kasan: add functions to clear stack poison
Functions which the compiler has instrumented for ASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.

In some cases (e.g. hotplug and idle), CPUs may exit the kernel a
number of levels deep in C code.  If there are any instrumented
functions on this critical path, these will leave portions of the idle
thread stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g. a cold
entry), then depending on stack frame layout subsequent calls to
instrumented functions may use regions of the stack with stale poison,
resulting in (spurious) KASAN splats to the console.

Contemporary GCCs always add stack shadow poisoning when ASAN is
enabled, even when asked to not instrument a function [1], so we can't
simply annotate functions on the critical path to avoid poisoning.

Instead, this series explicitly removes any stale poison before it can
be hit.  In the common hotplug case we clear the entire stack shadow in
common code, before a CPU is brought online.

On architectures which perform a cold return as part of cpu idle may
retain an architecture-specific amount of stack contents.  To retain the
poison for this retained context, the arch code must call the core KASAN
code, passing a "watermark" stack pointer value beyond which shadow will
be cleared.  Architectures which don't perform a cold return as part of
idle do not need any additional code.

This patch (of 3):

Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poision prior to returning.

In some cases (e.g.  hotplug and idle), CPUs may exit the kernel a number
of levels deep in C code.  If there are any instrumented functions on this
critical path, these will leave portions of the stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g.  a cold entry),
then depending on stack frame layout subsequent calls to instrumented
functions may use regions of the stack with stale poison, resulting in
(spurious) KASAN splats to the console.

To avoid this, we must clear stale poison from the stack prior to
instrumented functions being called.  This patch adds functions to the
KASAN core for removing poison from (portions of) a task's stack.  These
will be used by subsequent patches to avoid problems with hotplug and
idle.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from e3ae116339f9a0c77523abc95e338fa405946e07)
Change-Id: I9be31b714d5bdaec94a2dad3f0e468c094fe5fa2
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-11 13:06:35 -08:00
Greg Kroah-Hartman
8bc4213be4 This is the 4.4.104 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlomc30ACgkQONu9yGCS
 aT5Hkw/9GJCZ3LyqVfd9mkB97/9OTFduQUjLsqFdgQIRZdByHe6fR8cfJ1ydeiJS
 fcnBtZzJNMBXurEWourqrSWptDkFANoG0EGResza+3aXXC2SuwmuXG0JgR4KdnpE
 1pkqnKJ42rnK1XWuHIp2S0A6S+SW8fA5HrVFpQDLiXUb4NB9xerG2PS6tq94rQN0
 TB2IdJzb5ITWQkCzYvuTphl4jz2XnMJskd53c9osG2izBxVXzA650ygjtW6HrqG6
 AWvc0BK7p/Rz+ty3PbGbSNUJyDicnknS2/UARMz/BlI5SiyL8aAHVC7h3k9dh2CU
 uPVhfebFCSBcZ+iYl4UUuywjiJwDTMk/QwNZFph5Hw5HntB2LAHSG3rrv/8OIoPe
 +bvMRonfWB+rtzDiy0UbCCutswJzf6h9Pj5y/PA+GIFnNmeDQ096EOI7BFGO1MVe
 s/d/krw4xqhDRb97ZlWWX8UqRplmPSe4pWP47Gr0K4qNAPhp4B2tW6VJcMG3z2oo
 0/iV9cZ2CVxpIA2oN7ymWCKSQ764VADTcNSqC9Wd+rDso2XcBPRYUo9pbNKLqryb
 hxQR1Hj0do35qkR1L/e6thdygMV2ZyQ4xBq9fVPjDrwi2d9dO1jfnkQkL+QEa73G
 AaufuvFdJefssFP91FwGZnhhMi8kWIuJSTFM3Pg9OwNEjgt3nxo=
 =eTmV
 -----END PGP SIGNATURE-----

Merge 4.4.104 into android-4.4

Changes in 4.4.104
	netlink: add a start callback for starting a netlink dump
	ipsec: Fix aborted xfrm policy dump crash
	x86/mm/pat: Ensure cpa->pfn only contains page frame numbers
	x86/efi: Hoist page table switching code into efi_call_virt()
	x86/efi: Build our own page table structures
	ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
	x86/efi-bgrt: Fix kernel panic when mapping BGRT data
	x86/efi-bgrt: Replace early_memremap() with memremap()
	mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
	mm/madvise.c: fix madvise() infinite loop under special circumstances
	btrfs: clear space cache inode generation always
	KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk
	KVM: x86: Exit to user-mode on #UD intercept when emulator requires
	KVM: x86: inject exceptions produced by x86_decode_insn
	mmc: core: Do not leave the block driver in a suspended state
	eeprom: at24: check at24_read/write arguments
	bcache: Fix building error on MIPS
	Revert "drm/radeon: dont switch vt on suspend"
	drm/radeon: fix atombios on big endian
	drm/panel: simple: Add missing panel_simple_unprepare() calls
	mtd: nand: Fix writing mtdoops to nand flash.
	NFS: revalidate "." etc correctly on "open".
	drm/i915: Don't try indexed reads to alternate slave addresses
	drm/i915: Prevent zero length "index" write
	nfsd: Make init_open_stateid() a bit more whole
	nfsd: Fix stateid races between OPEN and CLOSE
	nfsd: Fix another OPEN stateid race
	Linux 4.4.104

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-05 11:31:58 +01:00
chenjie
0d05a5593f mm/madvise.c: fix madvise() infinite loop under special circumstances
commit 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 upstream.

MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation.  The calling
convention is quite subtle there.  madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.

It seems this has been broken since introduction.  Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.

[mhocko@suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6@huawei.com>
Signed-off-by: guoxuenan <guoxuenan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: zhangyi (F) <yi.zhang@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:22:50 +01:00
Kirill A. Shutemov
2b7ef6bdd2 mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream.

Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().

We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.

The patch also changes touch_pud() in the same way.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Salvatore Bonaccorso: backport for 3.16:
 - Adjust context
 - Drop specific part for PUD-sized transparent hugepages. Support
   for PUD-sized transparent hugepages was added in v4.11-rc1
]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:22:50 +01:00
Greg Kroah-Hartman
f6ef57faf9 This is the 4.4.102 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAloX89cACgkQONu9yGCS
 aT6SuBAAwPQ8Yh0mfaxFvcgqpvnz+qQD2GG8wMLjI8RsfS+JqaDp8IaCQcKOAJYu
 3/i0spo6LMBu8IGRsq+G69MwXFy2Vt8byBkQ2H6ZvaWxycDWOjd5nSyFmdMOG1F1
 5WLYrMgB1bekjDLu5LlL/eE3znFF+Iqp5SBiRGlL0/RtFbzwlYi6tJNim4cOmKum
 pEgNmgHr6WFkI5Yf9A49rhXOOtYURVYekxv/5x7alcSEQ5hmIplOdAVjIMmpC9vk
 QzmCqp+ST/HjH9wn5Mod88vvkp/Jk4bnl17MyYpdzl6Uct0LGMyGC4aMkKHVU8sj
 TyFf/8N//okjq5f5rEOuYjMGiR9wepCXQjW93arRClz/G2qnlK2xKwvrNmmlx7YN
 HnEBKEEjV9ObHquItdX3Bn1ybbYhdp+Xe59qMtCFnYtni02jjZvMt4/cUpgnC5Qh
 RnLTTqhyDyZdSLx7guHnoKgvKPYfApQ9Ix9epKABcvYmODWDzLfKdmwbNbqIP5Ru
 g3YmB57/RIVKPhciV7xiCVKAgw0PdJGXRbFVqFSq6Kyw+NEQV8IzaB+ynkeMpKWE
 LZx5Md3HGzyKJ/LnQU3Ec4B7l1efCzK0jI9Xq9jROCcbQpb7MJN+F7OO1Rea121o
 5w7vGN6RNn1rKX4fkcIkIUCAtYARZvdgM+W+FR5whR6xV+Xh80U=
 =VPJP
 -----END PGP SIGNATURE-----

Merge 4.4.102 into android-4.4

Changes in 4.4.102
	mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites"
	Linux 4.4.102

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-11-24 11:38:32 +01:00
Michal Hocko
0208fabf72 mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites"
Backport of the upstream commit f86e4271978b ("mm: check the return
value of lookup_page_ext for all call sites") is wrong for hwpoison
pages. I have accidentally negated the condition for bailout. This
basically disables hwpoison pages tracking while the code still
might crash on unusual configurations when struct pages do not have
page_ext allocated. The fix is trivial to invert the condition.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 11:26:29 +01:00
Greg Kroah-Hartman
f0b9d2d0ac This is the 4.4.101 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAloXywoACgkQONu9yGCS
 aT78dxAAoM0uHsL9r+ivJ4uMj81dwBL3Rd/1Lb/PMV5Yblh/LJ2WOcXriq/JgMLt
 +ARoyugEpB8JzAi1Y3bq3Jku2TYcT0o55UmjRgZzQitdX5o8j1g1baNnpRuMz63z
 S/g4Msh5aJyoHmwgxWZ+mWKn3SYdNwHy+r0gGwgtvlUO97iXqwM3nqQ/4tHnIv1B
 sz0NtJ7cgFvWVaneUkZ4z0ZGTlKfxaQg95enyyCRWM7MJ6Be03+KnhmQZ6GEb8vP
 tf9GtXiMEDJdwppDmXjtdjFW5adejBOoCF/grvbQoEdn7XPC47k6/l5Y6A3PYLMj
 kqlC8IbMHbiQXvgwezxp6Mv+oc+LuSjSCVikZW2SGMacs5kF92+0MIUvBtfUwvsA
 FP7q6jUcT3Or4xiG4xLDQW+RLPetidd+1Ms4jia6jaCajbMjU7ZYaBuAplT4qhIl
 koJ9pn1ksna3fUyxnNFJttUN2ulGDzcSBP5EZf3bLWMXkG4daa8Cen7vBkG1VqZE
 tspXCbB/mZ/eGv/rH3b7F2BVfP2RY0YqlUZzmfTXIoCwqcmX1zGi/KMfepcZTH3b
 LOo8CBmTgSYXYh0/16GAUH3ds3QQt8d0oeaCEtf8BaAZnq5R3M8doZzGzTB6LGjG
 Rn1KsUzJPKSqgYis3FTJNU3wmPokvV1ZVXK/ee9zMq5zOtyJyOg=
 =XIwd
 -----END PGP SIGNATURE-----

Merge 4.4.101 into android-4.4

Changes in 4.4.101
	tcp: do not mangle skb->cb[] in tcp_make_synack()
	netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
	bonding: discard lowest hash bit for 802.3ad layer3+4
	vlan: fix a use-after-free in vlan_device_event()
	af_netlink: ensure that NLMSG_DONE never fails in dumps
	sctp: do not peel off an assoc from one netns to another one
	fealnx: Fix building error on MIPS
	net/sctp: Always set scope_id in sctp_inet6_skb_msgname
	ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
	serial: omap: Fix EFR write on RTS deassertion
	arm64: fix dump_instr when PAN and UAO are in use
	nvme: Fix memory order on async queue deletion
	ocfs2: should wait dio before inode lock in ocfs2_setattr()
	ipmi: fix unsigned long underflow
	mm/page_alloc.c: broken deferred calculation
	coda: fix 'kernel memory exposure attempt' in fsync
	mm: check the return value of lookup_page_ext for all call sites
	mm/page_ext.c: check if page_ext is not prepared
	mm/pagewalk.c: report holes in hugetlb ranges
	Linux 4.4.101

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-11-24 08:56:48 +01:00
Jann Horn
a3805b10de mm/pagewalk.c: report holes in hugetlb ranges
commit 373c4557d2aa362702c4c2d41288fb1e54990b7c upstream.

This matters at least for the mincore syscall, which will otherwise copy
uninitialized memory from the page allocator to userspace.  It is
probably also a correctness error for /proc/$pid/pagemap, but I haven't
tested that.

Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has
no effect because the caller already checks for that.

This only reports holes in hugetlb ranges to callers who have specified
a hugetlb_entry callback.

This issue was found using an AFL-based fuzzer.

v2:
 - don't crash on ->pte_hole==NULL (Andrew Morton)
 - add Cc stable (Andrew Morton)

Changed for 4.4/4.9 stable backport:
 - fix up conflict in the huge_pte_offset() call

Fixes: 1e25a271c8 ("mincore: apply page table walker on do_mincore()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:25 +01:00
Jaewon Kim
3630b28019 mm/page_ext.c: check if page_ext is not prepared
commit e492080e640c2d1235ddf3441cae634cfffef7e1 upstream.

online_page_ext() and page_ext_init() allocate page_ext for each
section, but they do not allocate if the first PFN is !pfn_present(pfn)
or !pfn_valid(pfn).  Then section->page_ext remains as NULL.
lookup_page_ext checks NULL only if CONFIG_DEBUG_VM is enabled.  For a
valid PFN, __set_page_owner will try to get page_ext through
lookup_page_ext.  Without CONFIG_DEBUG_VM lookup_page_ext will misuse
NULL pointer as value 0.  This incurrs invalid address access.

This is the panic example when PFN 0x100000 is not valid but PFN
0x13FC00 is being used for page_ext.  section->page_ext is NULL,
get_entry returned invalid page_ext address as 0x1DFA000 for a PFN
0x13FC00.

To avoid this panic, CONFIG_DEBUG_VM should be removed so that page_ext
will be checked at all times.

  Unable to handle kernel paging request at virtual address 01dfa014
  ------------[ cut here ]------------
  Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
  Internal error: Oops: 96000045 [#1] PREEMPT SMP
  Modules linked in:
  PC is at __set_page_owner+0x48/0x78
  LR is at __set_page_owner+0x44/0x78
    __set_page_owner+0x48/0x78
    get_page_from_freelist+0x880/0x8e8
    __alloc_pages_nodemask+0x14c/0xc48
    __do_page_cache_readahead+0xdc/0x264
    filemap_fault+0x2ac/0x550
    ext4_filemap_fault+0x3c/0x58
    __do_fault+0x80/0x120
    handle_mm_fault+0x704/0xbb0
    do_page_fault+0x2e8/0x394
    do_mem_abort+0x88/0x124

Pre-4.7 kernels also need commit f86e4271978b ("mm: check the return
value of lookup_page_ext for all call sites").

Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com
Fixes: eefa864b70 ("mm/page_ext: resurrect struct page extending code for debugging")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:25 +01:00
Yang Shi
e34e744f70 mm: check the return value of lookup_page_ext for all call sites
commit f86e4271978bd93db466d6a95dad4b0fdcdb04f6 upstream.

Per the discussion with Joonsoo Kim [1], we need check the return value
of lookup_page_ext() for all call sites since it might return NULL in
some cases, although it is unlikely, i.e.  memory hotplug.

Tested with ltp with "page_owner=0".

[1] http://lkml.kernel.org/r/20160519002809.GA10245@js1304-P5Q-DELUXE

[akpm@linux-foundation.org: fix build-breaking typos]
[arnd@arndb.de: fix build problems from lookup_page_ext]
  Link: http://lkml.kernel.org/r/6285269.2CksypHdYp@wuerfel
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1464023768-31025-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:25 +01:00
Pavel Tatashin
c1b3703b64 mm/page_alloc.c: broken deferred calculation
commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream.

In reset_deferred_meminit() we determine number of pages that must not
be deferred.  We initialize pages for at least 2G of memory, but also
pages for reserved memory in this node.

The reserved memory is determined in this function:
memblock_reserved_memory_within(), which operates over physical
addresses, and returns size in bytes.  However, reset_deferred_meminit()
assumes that that this function operates with pfns, and returns page
count.

The result is that in the best case machine boots slower than expected
due to initializing more pages than needed in single thread, and in the
worst case panics because fewer than needed pages are initialized early.

Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:25 +01:00
Greg Kroah-Hartman
89074de67a This is the 4.4.94 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnrYxcACgkQONu9yGCS
 aT6VpQ/9GzBA59FGi6ohxZnrUR35+U5Ehuw0IZo4JTUTrlj28QozeV6dBAdgQHLH
 eGcejtzAsD39m7JjmBzkxiBBlCH9nQkq5IaUrJG6q5dYoTCYMLzHwQLgPSbrhbnS
 hCSeHdJ0fevw9tKQELtWlIiG1iOULrWATf4MtpOCHcRmpxxSMRi22yQ4vKD2Vz4y
 TdKb5c8bYjJoEqbtON4wKIiEK1JfyO80E4eZtNK7FXI+XX1WI65pum9/NBiDqB78
 K0sK1t5pSJHvDgMwtOJ7Nxzcwle1cG3xm7NhZhNCfF9OWedCy+ZCc+e48T+TeoF4
 UDHIhEvhOOOf/W3dRBQQj8VElj0zt92I+ivsWxKmheY9JzJdOvq2pTQoPAtLsBMD
 /mChCvMSNEcHTfLYrm6Bjap0e6D10n1oUHX7jgLtq04EcX9Rh2zgYvL9u9QFLjFx
 sAgTp+kmScgj0fi0XgiXJxj8mPc2MpTVmSUjcwAZD+N9Kuafkqbf3ZddZJiGyPfw
 v4ZiAdUAtACdOaIRVPRUcG2fyLfKYqg2bFsif4Z67/0RmNf3C3rJpS9yX+Q36zCo
 f66xbvysN3pRiME0obenrsxBJ0LvIkSVskxV0+5x0UfP5pOdf7jZqqpkr6IFMtLZ
 /o4DYV+Da/qeYZQnmvF0BEvEnnX8GJFIJ+9RSbz9mAWcCWtWxTU=
 =gevA
 -----END PGP SIGNATURE-----

Merge 4.4.94 into android-4.4

Changes in 4.4.94
	percpu: make this_cpu_generic_read() atomic w.r.t. interrupts
	drm/dp/mst: save vcpi with payloads
	MIPS: Fix minimum alignment requirement of IRQ stack
	sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
	bpf/verifier: reject BPF_ALU64|BPF_END
	udpv6: Fix the checksum computation when HW checksum does not apply
	ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header
	net: emac: Fix napi poll list corruption
	packet: hold bind lock when rebinding to fanout hook
	bpf: one perf event close won't free bpf program attached by another perf event
	isdn/i4l: fetch the ppp_write buffer in one shot
	vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit
	l2tp: Avoid schedule while atomic in exit_net
	l2tp: fix race condition in l2tp_tunnel_delete
	tun: bail out from tun_get_user() if the skb is empty
	packet: in packet_do_bind, test fanout with bind_lock held
	packet: only test po->has_vnet_hdr once in packet_snd
	net: Set sk_prot_creator when cloning sockets to the right proto
	tipc: use only positive error codes in messages
	Revert "bsg-lib: don't free job in bsg_prepare_job"
	locking/lockdep: Add nest_lock integrity test
	watchdog: kempld: fix gcc-4.3 build
	irqchip/crossbar: Fix incorrect type of local variables
	mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length
	mac80211: fix power saving clients handling in iwlwifi
	net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
	netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
	iio: adc: xilinx: Fix error handling
	Btrfs: send, fix failure to rename top level inode due to name collision
	f2fs: do not wait for writeback in write_begin
	md/linear: shutup lockdep warnning
	sparc64: Migrate hvcons irq to panicked cpu
	net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
	crypto: xts - Add ECB dependency
	ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
	slub: do not merge cache if slub_debug contains a never-merge flag
	scsi: scsi_dh_emc: return success in clariion_std_inquiry()
	net: mvpp2: release reference to txq_cpu[] entry after unmapping
	i2c: at91: ensure state is restored after suspending
	ceph: clean up unsafe d_parent accesses in build_dentry_path
	uapi: fix linux/rds.h userspace compilation errors
	uapi: fix linux/mroute6.h userspace compilation errors
	target/iscsi: Fix unsolicited data seq_end_offset calculation
	nfsd/callback: Cleanup callback cred on shutdown
	cpufreq: CPPC: add ACPI_PROCESSOR dependency
	Revert "tty: goldfish: Fix a parameter of a call to free_irq"
	Linux 4.4.94

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-22 08:09:11 +02:00
Grygorii Maistrenko
9ac38e30f2 slub: do not merge cache if slub_debug contains a never-merge flag
[ Upstream commit c6e28895a4372992961888ffaadc9efc643b5bfe ]

In case CONFIG_SLUB_DEBUG_ON=n, find_mergeable() gets debug features from
commandline but never checks if there are features from the
SLAB_NEVER_MERGE set.

As a result selected by slub_debug caches are always mergeable if they
have been created without a custom constructor set or without one of the
SLAB_* debug features on.

This moves the SLAB_NEVER_MERGE check below the flags update from
commandline to make sure it won't merge the slab cache if one of the debug
features is on.

Link: http://lkml.kernel.org/r/20170101124451.GA4740@lp-laptop-d
Signed-off-by: Grygorii Maistrenko <grygoriimkd@gmail.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:09:05 +02:00
Jaegeuk Kim
13f002354d f2fs: catch up to v4.14-rc1
This is cherry-picked from upstrea-f2fs-stable-linux-4.4.y.

Changes include:

commit c7fd9e2b4a ("f2fs: hurry up to issue discard after io interruption")
commit 603dde3965 ("f2fs: fix to show correct discard_granularity in sysfs")
...
commit 565f0225f9 ("f2fs: factor out discard command info into discard_cmd_control")
commit c4cc29d19e ("f2fs: remove batched discard in f2fs_trim_fs")

Change-Id: Icd8a85ac0c19a8aa25cd2591a12b4e9b85bdf1c5
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2017-10-03 17:19:14 -07:00