Commit graph

9938 commits

Author SHA1 Message Date
Joonsoo Kim
09c23a8024 UPSTREAM: mm/slab: align cache size first before determination of OFF_SLAB candidate
Finding suitable OFF_SLAB candidate is more related to aligned cache
size rather than original size.  Same reasoning can be applied to the
debug pagealloc candidate.  So, this patch moves up alignment fixup to
proper position.  From that point, size is aligned so we can remove some
alignment fixups.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 832a15d209cd260180407bde1af18965b21623f3)
Change-Id: I8338d647da4a6eb6402c6fe4e2402b7db45ea5a5
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:46 -08:00
Joonsoo Kim
118cb473e1 UPSTREAM: mm/slab: use more appropriate condition check for debug_pagealloc
debug_pagealloc debugging is related to SLAB_POISON flag rather than
FORCED_DEBUG option, although FORCED_DEBUG option will enable
SLAB_POISON.  Fix it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 40323278b557a5909bbecfa181c91a3af7afbbe3)
Change-Id: I4a7dbd40b6a2eb777439f271fa600b201e6db1d3
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:42 -08:00
Joonsoo Kim
511f370eed UPSTREAM: mm/slab: factor out debugging initialization in cache_init_objs()
cache_init_objs() will be changed in following patch and current form
doesn't fit well for that change.  So, before doing it, this patch
separates debugging initialization.  This would cause two loop iteration
when debugging is enabled, but, this overhead seems too light than debug
feature itself so effect may not be visible.  This patch will greatly
simplify changes in cache_init_objs() in following patch.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 10b2e9e8e808bd30e1f4018a36366d07b0abd12f)
Change-Id: I9904974a674f17fc7d57daf0fe351742db67c006
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:37 -08:00
Joonsoo Kim
106caadcea UPSTREAM: mm/slab: remove object status buffer for DEBUG_SLAB_LEAK
Now, we don't use object status buffer in any setup. Remove it.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 249247b6f8ee362189a2f2bf598a14ff6c95fb4c)
Change-Id: I60b1fb030da5d44aff2627f57dffd9dd7436e511
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:32 -08:00
Joonsoo Kim
63fd822604 UPSTREAM: mm/slab: alternative implementation for DEBUG_SLAB_LEAK
DEBUG_SLAB_LEAK is a debug option.  It's current implementation requires
status buffer so we need more memory to use it.  And, it cause
kmem_cache initialization step more complex.

To remove this extra memory usage and to simplify initialization step,
this patch implement this feature with another way.

When user requests to get slab object owner information, it marks that
getting information is started.  And then, all free objects in caches
are flushed to corresponding slab page.  Now, we can distinguish all
freed object so we can know all allocated objects, too.  After
collecting slab object owner information on allocated objects, mark is
checked that there is no free during the processing.  If true, we can be
sure that our information is correct so information is returned to user.

Although this way is rather complex, it has two important benefits
mentioned above.  So, I think it is worth changing.

There is one drawback that it takes more time to get slab object owner
information but it is just a debug option so it doesn't matter at all.

To help review, this patch implements new way only.  Following patch
will remove useless code.

Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from d31676dfde257cb2b3e52d4e657d8ad2251e4d49)
Change-Id: I204ea0dd5553577d17c93f32f0d5a797ba0304af
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:25 -08:00
Joonsoo Kim
5b5a89ff48 UPSTREAM: mm/slab: clean up DEBUG_PAGEALLOC processing code
Currently, open code for checking DEBUG_PAGEALLOC cache is spread to
some sites.  It makes code unreadable and hard to change.

This patch cleans up this code.  The following patch will change the
criteria for DEBUG_PAGEALLOC cache so this clean-up will help it, too.

[akpm@linux-foundation.org: fix build with CONFIG_DEBUG_PAGEALLOC=n]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 40b44137971c2e5865a78f9f7de274449983ccb5)
Change-Id: I784df4a54b62f77a22f8fa70990387fdf968219f
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:18:12 -08:00
Joonsoo Kim
44d2bfbf14 UPSTREAM: mm/slab: activate debug_pagealloc in SLAB when it is actually enabled
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from a307ebd468e0b97c203f5a99a56a6017e4d1991a)
Change-Id: I4407dbf0a5a22ab6f9335219bafc6f28023635a0
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-14 08:17:03 -08:00
Mark Rutland
36205b7fa9 UPSTREAM: kasan: add functions to clear stack poison
Functions which the compiler has instrumented for ASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.

In some cases (e.g. hotplug and idle), CPUs may exit the kernel a
number of levels deep in C code.  If there are any instrumented
functions on this critical path, these will leave portions of the idle
thread stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g. a cold
entry), then depending on stack frame layout subsequent calls to
instrumented functions may use regions of the stack with stale poison,
resulting in (spurious) KASAN splats to the console.

Contemporary GCCs always add stack shadow poisoning when ASAN is
enabled, even when asked to not instrument a function [1], so we can't
simply annotate functions on the critical path to avoid poisoning.

Instead, this series explicitly removes any stale poison before it can
be hit.  In the common hotplug case we clear the entire stack shadow in
common code, before a CPU is brought online.

On architectures which perform a cold return as part of cpu idle may
retain an architecture-specific amount of stack contents.  To retain the
poison for this retained context, the arch code must call the core KASAN
code, passing a "watermark" stack pointer value beyond which shadow will
be cleared.  Architectures which don't perform a cold return as part of
idle do not need any additional code.

This patch (of 3):

Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poision prior to returning.

In some cases (e.g.  hotplug and idle), CPUs may exit the kernel a number
of levels deep in C code.  If there are any instrumented functions on this
critical path, these will leave portions of the stack shadow poisoned.

If a CPU returns to the kernel via a different path (e.g.  a cold entry),
then depending on stack frame layout subsequent calls to instrumented
functions may use regions of the stack with stale poison, resulting in
(spurious) KASAN splats to the console.

To avoid this, we must clear stale poison from the stack prior to
instrumented functions being called.  This patch adds functions to the
KASAN core for removing poison from (portions of) a task's stack.  These
will be used by subsequent patches to avoid problems with hotplug and
idle.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from e3ae116339f9a0c77523abc95e338fa405946e07)
Change-Id: I9be31b714d5bdaec94a2dad3f0e468c094fe5fa2
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-11 13:06:35 -08:00
Greg Kroah-Hartman
8bc4213be4 This is the 4.4.104 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlomc30ACgkQONu9yGCS
 aT5Hkw/9GJCZ3LyqVfd9mkB97/9OTFduQUjLsqFdgQIRZdByHe6fR8cfJ1ydeiJS
 fcnBtZzJNMBXurEWourqrSWptDkFANoG0EGResza+3aXXC2SuwmuXG0JgR4KdnpE
 1pkqnKJ42rnK1XWuHIp2S0A6S+SW8fA5HrVFpQDLiXUb4NB9xerG2PS6tq94rQN0
 TB2IdJzb5ITWQkCzYvuTphl4jz2XnMJskd53c9osG2izBxVXzA650ygjtW6HrqG6
 AWvc0BK7p/Rz+ty3PbGbSNUJyDicnknS2/UARMz/BlI5SiyL8aAHVC7h3k9dh2CU
 uPVhfebFCSBcZ+iYl4UUuywjiJwDTMk/QwNZFph5Hw5HntB2LAHSG3rrv/8OIoPe
 +bvMRonfWB+rtzDiy0UbCCutswJzf6h9Pj5y/PA+GIFnNmeDQ096EOI7BFGO1MVe
 s/d/krw4xqhDRb97ZlWWX8UqRplmPSe4pWP47Gr0K4qNAPhp4B2tW6VJcMG3z2oo
 0/iV9cZ2CVxpIA2oN7ymWCKSQ764VADTcNSqC9Wd+rDso2XcBPRYUo9pbNKLqryb
 hxQR1Hj0do35qkR1L/e6thdygMV2ZyQ4xBq9fVPjDrwi2d9dO1jfnkQkL+QEa73G
 AaufuvFdJefssFP91FwGZnhhMi8kWIuJSTFM3Pg9OwNEjgt3nxo=
 =eTmV
 -----END PGP SIGNATURE-----

Merge 4.4.104 into android-4.4

Changes in 4.4.104
	netlink: add a start callback for starting a netlink dump
	ipsec: Fix aborted xfrm policy dump crash
	x86/mm/pat: Ensure cpa->pfn only contains page frame numbers
	x86/efi: Hoist page table switching code into efi_call_virt()
	x86/efi: Build our own page table structures
	ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
	x86/efi-bgrt: Fix kernel panic when mapping BGRT data
	x86/efi-bgrt: Replace early_memremap() with memremap()
	mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
	mm/madvise.c: fix madvise() infinite loop under special circumstances
	btrfs: clear space cache inode generation always
	KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk
	KVM: x86: Exit to user-mode on #UD intercept when emulator requires
	KVM: x86: inject exceptions produced by x86_decode_insn
	mmc: core: Do not leave the block driver in a suspended state
	eeprom: at24: check at24_read/write arguments
	bcache: Fix building error on MIPS
	Revert "drm/radeon: dont switch vt on suspend"
	drm/radeon: fix atombios on big endian
	drm/panel: simple: Add missing panel_simple_unprepare() calls
	mtd: nand: Fix writing mtdoops to nand flash.
	NFS: revalidate "." etc correctly on "open".
	drm/i915: Don't try indexed reads to alternate slave addresses
	drm/i915: Prevent zero length "index" write
	nfsd: Make init_open_stateid() a bit more whole
	nfsd: Fix stateid races between OPEN and CLOSE
	nfsd: Fix another OPEN stateid race
	Linux 4.4.104

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-05 11:31:58 +01:00
chenjie
0d05a5593f mm/madvise.c: fix madvise() infinite loop under special circumstances
commit 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 upstream.

MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation.  The calling
convention is quite subtle there.  madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.

It seems this has been broken since introduction.  Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.

[mhocko@suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6@huawei.com>
Signed-off-by: guoxuenan <guoxuenan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: zhangyi (F) <yi.zhang@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:22:50 +01:00
Kirill A. Shutemov
2b7ef6bdd2 mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream.

Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().

We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.

The patch also changes touch_pud() in the same way.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Salvatore Bonaccorso: backport for 3.16:
 - Adjust context
 - Drop specific part for PUD-sized transparent hugepages. Support
   for PUD-sized transparent hugepages was added in v4.11-rc1
]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:22:50 +01:00
Greg Kroah-Hartman
f6ef57faf9 This is the 4.4.102 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAloX89cACgkQONu9yGCS
 aT6SuBAAwPQ8Yh0mfaxFvcgqpvnz+qQD2GG8wMLjI8RsfS+JqaDp8IaCQcKOAJYu
 3/i0spo6LMBu8IGRsq+G69MwXFy2Vt8byBkQ2H6ZvaWxycDWOjd5nSyFmdMOG1F1
 5WLYrMgB1bekjDLu5LlL/eE3znFF+Iqp5SBiRGlL0/RtFbzwlYi6tJNim4cOmKum
 pEgNmgHr6WFkI5Yf9A49rhXOOtYURVYekxv/5x7alcSEQ5hmIplOdAVjIMmpC9vk
 QzmCqp+ST/HjH9wn5Mod88vvkp/Jk4bnl17MyYpdzl6Uct0LGMyGC4aMkKHVU8sj
 TyFf/8N//okjq5f5rEOuYjMGiR9wepCXQjW93arRClz/G2qnlK2xKwvrNmmlx7YN
 HnEBKEEjV9ObHquItdX3Bn1ybbYhdp+Xe59qMtCFnYtni02jjZvMt4/cUpgnC5Qh
 RnLTTqhyDyZdSLx7guHnoKgvKPYfApQ9Ix9epKABcvYmODWDzLfKdmwbNbqIP5Ru
 g3YmB57/RIVKPhciV7xiCVKAgw0PdJGXRbFVqFSq6Kyw+NEQV8IzaB+ynkeMpKWE
 LZx5Md3HGzyKJ/LnQU3Ec4B7l1efCzK0jI9Xq9jROCcbQpb7MJN+F7OO1Rea121o
 5w7vGN6RNn1rKX4fkcIkIUCAtYARZvdgM+W+FR5whR6xV+Xh80U=
 =VPJP
 -----END PGP SIGNATURE-----

Merge 4.4.102 into android-4.4

Changes in 4.4.102
	mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites"
	Linux 4.4.102

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-11-24 11:38:32 +01:00
Michal Hocko
0208fabf72 mm, hwpoison: fixup "mm: check the return value of lookup_page_ext for all call sites"
Backport of the upstream commit f86e4271978b ("mm: check the return
value of lookup_page_ext for all call sites") is wrong for hwpoison
pages. I have accidentally negated the condition for bailout. This
basically disables hwpoison pages tracking while the code still
might crash on unusual configurations when struct pages do not have
page_ext allocated. The fix is trivial to invert the condition.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 11:26:29 +01:00
Greg Kroah-Hartman
f0b9d2d0ac This is the 4.4.101 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAloXywoACgkQONu9yGCS
 aT78dxAAoM0uHsL9r+ivJ4uMj81dwBL3Rd/1Lb/PMV5Yblh/LJ2WOcXriq/JgMLt
 +ARoyugEpB8JzAi1Y3bq3Jku2TYcT0o55UmjRgZzQitdX5o8j1g1baNnpRuMz63z
 S/g4Msh5aJyoHmwgxWZ+mWKn3SYdNwHy+r0gGwgtvlUO97iXqwM3nqQ/4tHnIv1B
 sz0NtJ7cgFvWVaneUkZ4z0ZGTlKfxaQg95enyyCRWM7MJ6Be03+KnhmQZ6GEb8vP
 tf9GtXiMEDJdwppDmXjtdjFW5adejBOoCF/grvbQoEdn7XPC47k6/l5Y6A3PYLMj
 kqlC8IbMHbiQXvgwezxp6Mv+oc+LuSjSCVikZW2SGMacs5kF92+0MIUvBtfUwvsA
 FP7q6jUcT3Or4xiG4xLDQW+RLPetidd+1Ms4jia6jaCajbMjU7ZYaBuAplT4qhIl
 koJ9pn1ksna3fUyxnNFJttUN2ulGDzcSBP5EZf3bLWMXkG4daa8Cen7vBkG1VqZE
 tspXCbB/mZ/eGv/rH3b7F2BVfP2RY0YqlUZzmfTXIoCwqcmX1zGi/KMfepcZTH3b
 LOo8CBmTgSYXYh0/16GAUH3ds3QQt8d0oeaCEtf8BaAZnq5R3M8doZzGzTB6LGjG
 Rn1KsUzJPKSqgYis3FTJNU3wmPokvV1ZVXK/ee9zMq5zOtyJyOg=
 =XIwd
 -----END PGP SIGNATURE-----

Merge 4.4.101 into android-4.4

Changes in 4.4.101
	tcp: do not mangle skb->cb[] in tcp_make_synack()
	netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
	bonding: discard lowest hash bit for 802.3ad layer3+4
	vlan: fix a use-after-free in vlan_device_event()
	af_netlink: ensure that NLMSG_DONE never fails in dumps
	sctp: do not peel off an assoc from one netns to another one
	fealnx: Fix building error on MIPS
	net/sctp: Always set scope_id in sctp_inet6_skb_msgname
	ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
	serial: omap: Fix EFR write on RTS deassertion
	arm64: fix dump_instr when PAN and UAO are in use
	nvme: Fix memory order on async queue deletion
	ocfs2: should wait dio before inode lock in ocfs2_setattr()
	ipmi: fix unsigned long underflow
	mm/page_alloc.c: broken deferred calculation
	coda: fix 'kernel memory exposure attempt' in fsync
	mm: check the return value of lookup_page_ext for all call sites
	mm/page_ext.c: check if page_ext is not prepared
	mm/pagewalk.c: report holes in hugetlb ranges
	Linux 4.4.101

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-11-24 08:56:48 +01:00
Jann Horn
a3805b10de mm/pagewalk.c: report holes in hugetlb ranges
commit 373c4557d2aa362702c4c2d41288fb1e54990b7c upstream.

This matters at least for the mincore syscall, which will otherwise copy
uninitialized memory from the page allocator to userspace.  It is
probably also a correctness error for /proc/$pid/pagemap, but I haven't
tested that.

Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has
no effect because the caller already checks for that.

This only reports holes in hugetlb ranges to callers who have specified
a hugetlb_entry callback.

This issue was found using an AFL-based fuzzer.

v2:
 - don't crash on ->pte_hole==NULL (Andrew Morton)
 - add Cc stable (Andrew Morton)

Changed for 4.4/4.9 stable backport:
 - fix up conflict in the huge_pte_offset() call

Fixes: 1e25a271c8 ("mincore: apply page table walker on do_mincore()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:25 +01:00
Jaewon Kim
3630b28019 mm/page_ext.c: check if page_ext is not prepared
commit e492080e640c2d1235ddf3441cae634cfffef7e1 upstream.

online_page_ext() and page_ext_init() allocate page_ext for each
section, but they do not allocate if the first PFN is !pfn_present(pfn)
or !pfn_valid(pfn).  Then section->page_ext remains as NULL.
lookup_page_ext checks NULL only if CONFIG_DEBUG_VM is enabled.  For a
valid PFN, __set_page_owner will try to get page_ext through
lookup_page_ext.  Without CONFIG_DEBUG_VM lookup_page_ext will misuse
NULL pointer as value 0.  This incurrs invalid address access.

This is the panic example when PFN 0x100000 is not valid but PFN
0x13FC00 is being used for page_ext.  section->page_ext is NULL,
get_entry returned invalid page_ext address as 0x1DFA000 for a PFN
0x13FC00.

To avoid this panic, CONFIG_DEBUG_VM should be removed so that page_ext
will be checked at all times.

  Unable to handle kernel paging request at virtual address 01dfa014
  ------------[ cut here ]------------
  Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
  Internal error: Oops: 96000045 [#1] PREEMPT SMP
  Modules linked in:
  PC is at __set_page_owner+0x48/0x78
  LR is at __set_page_owner+0x44/0x78
    __set_page_owner+0x48/0x78
    get_page_from_freelist+0x880/0x8e8
    __alloc_pages_nodemask+0x14c/0xc48
    __do_page_cache_readahead+0xdc/0x264
    filemap_fault+0x2ac/0x550
    ext4_filemap_fault+0x3c/0x58
    __do_fault+0x80/0x120
    handle_mm_fault+0x704/0xbb0
    do_page_fault+0x2e8/0x394
    do_mem_abort+0x88/0x124

Pre-4.7 kernels also need commit f86e4271978b ("mm: check the return
value of lookup_page_ext for all call sites").

Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com
Fixes: eefa864b70 ("mm/page_ext: resurrect struct page extending code for debugging")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:25 +01:00
Yang Shi
e34e744f70 mm: check the return value of lookup_page_ext for all call sites
commit f86e4271978bd93db466d6a95dad4b0fdcdb04f6 upstream.

Per the discussion with Joonsoo Kim [1], we need check the return value
of lookup_page_ext() for all call sites since it might return NULL in
some cases, although it is unlikely, i.e.  memory hotplug.

Tested with ltp with "page_owner=0".

[1] http://lkml.kernel.org/r/20160519002809.GA10245@js1304-P5Q-DELUXE

[akpm@linux-foundation.org: fix build-breaking typos]
[arnd@arndb.de: fix build problems from lookup_page_ext]
  Link: http://lkml.kernel.org/r/6285269.2CksypHdYp@wuerfel
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1464023768-31025-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:25 +01:00
Pavel Tatashin
c1b3703b64 mm/page_alloc.c: broken deferred calculation
commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream.

In reset_deferred_meminit() we determine number of pages that must not
be deferred.  We initialize pages for at least 2G of memory, but also
pages for reserved memory in this node.

The reserved memory is determined in this function:
memblock_reserved_memory_within(), which operates over physical
addresses, and returns size in bytes.  However, reset_deferred_meminit()
assumes that that this function operates with pfns, and returns page
count.

The result is that in the best case machine boots slower than expected
due to initializing more pages than needed in single thread, and in the
worst case panics because fewer than needed pages are initialized early.

Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:25 +01:00
Greg Kroah-Hartman
89074de67a This is the 4.4.94 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnrYxcACgkQONu9yGCS
 aT6VpQ/9GzBA59FGi6ohxZnrUR35+U5Ehuw0IZo4JTUTrlj28QozeV6dBAdgQHLH
 eGcejtzAsD39m7JjmBzkxiBBlCH9nQkq5IaUrJG6q5dYoTCYMLzHwQLgPSbrhbnS
 hCSeHdJ0fevw9tKQELtWlIiG1iOULrWATf4MtpOCHcRmpxxSMRi22yQ4vKD2Vz4y
 TdKb5c8bYjJoEqbtON4wKIiEK1JfyO80E4eZtNK7FXI+XX1WI65pum9/NBiDqB78
 K0sK1t5pSJHvDgMwtOJ7Nxzcwle1cG3xm7NhZhNCfF9OWedCy+ZCc+e48T+TeoF4
 UDHIhEvhOOOf/W3dRBQQj8VElj0zt92I+ivsWxKmheY9JzJdOvq2pTQoPAtLsBMD
 /mChCvMSNEcHTfLYrm6Bjap0e6D10n1oUHX7jgLtq04EcX9Rh2zgYvL9u9QFLjFx
 sAgTp+kmScgj0fi0XgiXJxj8mPc2MpTVmSUjcwAZD+N9Kuafkqbf3ZddZJiGyPfw
 v4ZiAdUAtACdOaIRVPRUcG2fyLfKYqg2bFsif4Z67/0RmNf3C3rJpS9yX+Q36zCo
 f66xbvysN3pRiME0obenrsxBJ0LvIkSVskxV0+5x0UfP5pOdf7jZqqpkr6IFMtLZ
 /o4DYV+Da/qeYZQnmvF0BEvEnnX8GJFIJ+9RSbz9mAWcCWtWxTU=
 =gevA
 -----END PGP SIGNATURE-----

Merge 4.4.94 into android-4.4

Changes in 4.4.94
	percpu: make this_cpu_generic_read() atomic w.r.t. interrupts
	drm/dp/mst: save vcpi with payloads
	MIPS: Fix minimum alignment requirement of IRQ stack
	sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
	bpf/verifier: reject BPF_ALU64|BPF_END
	udpv6: Fix the checksum computation when HW checksum does not apply
	ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header
	net: emac: Fix napi poll list corruption
	packet: hold bind lock when rebinding to fanout hook
	bpf: one perf event close won't free bpf program attached by another perf event
	isdn/i4l: fetch the ppp_write buffer in one shot
	vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit
	l2tp: Avoid schedule while atomic in exit_net
	l2tp: fix race condition in l2tp_tunnel_delete
	tun: bail out from tun_get_user() if the skb is empty
	packet: in packet_do_bind, test fanout with bind_lock held
	packet: only test po->has_vnet_hdr once in packet_snd
	net: Set sk_prot_creator when cloning sockets to the right proto
	tipc: use only positive error codes in messages
	Revert "bsg-lib: don't free job in bsg_prepare_job"
	locking/lockdep: Add nest_lock integrity test
	watchdog: kempld: fix gcc-4.3 build
	irqchip/crossbar: Fix incorrect type of local variables
	mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length
	mac80211: fix power saving clients handling in iwlwifi
	net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
	netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
	iio: adc: xilinx: Fix error handling
	Btrfs: send, fix failure to rename top level inode due to name collision
	f2fs: do not wait for writeback in write_begin
	md/linear: shutup lockdep warnning
	sparc64: Migrate hvcons irq to panicked cpu
	net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
	crypto: xts - Add ECB dependency
	ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
	slub: do not merge cache if slub_debug contains a never-merge flag
	scsi: scsi_dh_emc: return success in clariion_std_inquiry()
	net: mvpp2: release reference to txq_cpu[] entry after unmapping
	i2c: at91: ensure state is restored after suspending
	ceph: clean up unsafe d_parent accesses in build_dentry_path
	uapi: fix linux/rds.h userspace compilation errors
	uapi: fix linux/mroute6.h userspace compilation errors
	target/iscsi: Fix unsolicited data seq_end_offset calculation
	nfsd/callback: Cleanup callback cred on shutdown
	cpufreq: CPPC: add ACPI_PROCESSOR dependency
	Revert "tty: goldfish: Fix a parameter of a call to free_irq"
	Linux 4.4.94

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-22 08:09:11 +02:00
Grygorii Maistrenko
9ac38e30f2 slub: do not merge cache if slub_debug contains a never-merge flag
[ Upstream commit c6e28895a4372992961888ffaadc9efc643b5bfe ]

In case CONFIG_SLUB_DEBUG_ON=n, find_mergeable() gets debug features from
commandline but never checks if there are features from the
SLAB_NEVER_MERGE set.

As a result selected by slub_debug caches are always mergeable if they
have been created without a custom constructor set or without one of the
SLAB_* debug features on.

This moves the SLAB_NEVER_MERGE check below the flags update from
commandline to make sure it won't merge the slab cache if one of the debug
features is on.

Link: http://lkml.kernel.org/r/20170101124451.GA4740@lp-laptop-d
Signed-off-by: Grygorii Maistrenko <grygoriimkd@gmail.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:09:05 +02:00
Jaegeuk Kim
13f002354d f2fs: catch up to v4.14-rc1
This is cherry-picked from upstrea-f2fs-stable-linux-4.4.y.

Changes include:

commit c7fd9e2b4a ("f2fs: hurry up to issue discard after io interruption")
commit 603dde3965 ("f2fs: fix to show correct discard_granularity in sysfs")
...
commit 565f0225f9 ("f2fs: factor out discard command info into discard_cmd_control")
commit c4cc29d19e ("f2fs: remove batched discard in f2fs_trim_fs")

Change-Id: Icd8a85ac0c19a8aa25cd2591a12b4e9b85bdf1c5
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2017-10-03 17:19:14 -07:00
Greg Kroah-Hartman
d68ba9f116 This is the 4.4.89 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnLaLoACgkQONu9yGCS
 aT7hDw/+Ipx/xnjIUJFV/aqo8lTh3XqP/TjD5whoi+yYC8axLEZBLiOSLZceVjsG
 hi2mP22gKn1i7GLXNeWIZ+rMtVzAN+qNg7i8cjWNfFp1fA7cCfFaYvlV0LVrO2tK
 WnvvE8r5kQAKyQG8498ebEjianxwxHVERnNiE5/SDpCNj14DnwCJBTEYM0tEnuXZ
 /jBIIs4xvndVa0fFfUjuAzh65AefAT1BmgsPll4GnFMUFHh30smYdFla5LL0GNIq
 FQGFvIi8Q02disSMg9lFJVOlazc/HUREiFB1qy1DRtGMnS6/Q0HW0kCxeRi/7QEi
 +HN2rLxtbpnuD5P7W4lDJ5/cyCHMIv8SJ8OqUd8uxbTWz31P/QxbM7d35d+w3rq8
 dv3sQ6CMRnuIXGL5dFHh7zYqlzNS9PKjLmxzAw9grDf+nVsDxE4KUfJy00DSN1I1
 Bopi1kCD2nUMOiBrmxkIczN6OOvcGBHh6/TTB2WEKVHn42D0fjLnO66kJVJLMsBm
 vDdKJDDSGM/0HiUa5ydr6R0Ae7My3h5AJZRa5gn0kL/myatX/vsa0B2ZLpHlVipM
 GhODBsDFkI4k4yceONDZPJmhhVab1lewTMuIW5D2KRMsgpQqLmlOyL5gykfH0rTx
 FVnLSoMAHsgm6qVPwRS5BqK/UnXogfqjiB0iXzNNZnkiABWWoUQ=
 =Skkr
 -----END PGP SIGNATURE-----

Merge 4.4.89 into android-4.4

Changes in 4.4.89
	ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()
	ipv6: add rcu grace period before freeing fib6_node
	ipv6: fix sparse warning on rt6i_node
	qlge: avoid memcpy buffer overflow
	Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
	tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
	Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
	Revert "net: fix percpu memory leaks"
	gianfar: Fix Tx flow control deactivation
	ipv6: fix memory leak with multiple tables during netns destruction
	ipv6: fix typo in fib6_net_exit()
	f2fs: check hot_data for roll-forward recovery
	x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
	md/raid5: release/flush io in raid5_do_work()
	nfsd: Fix general protection fault in release_lock_stateid()
	mm: prevent double decrease of nr_reserved_highatomic
	tty: improve tty_insert_flip_char() fast path
	tty: improve tty_insert_flip_char() slow path
	tty: fix __tty_insert_flip_char regression
	Input: i8042 - add Gigabyte P57 to the keyboard reset table
	MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation
	MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero
	MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative
	MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs
	MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs
	MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs
	crypto: AF_ALG - remove SGL terminator indicator when chaining
	ext4: fix incorrect quotaoff if the quota feature is enabled
	ext4: fix quota inconsistency during orphan cleanup for read-only mounts
	powerpc: Fix DAR reporting when alignment handler faults
	block: Relax a check in blk_start_queue()
	md/bitmap: disable bitmap_resize for file-backed bitmaps.
	skd: Avoid that module unloading triggers a use-after-free
	skd: Submit requests to firmware before triggering the doorbell
	scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
	scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path
	scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
	scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
	scsi: zfcp: fix missing trace records for early returns in TMF eh handlers
	scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records
	scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
	scsi: zfcp: trace high part of "new" 64 bit SCSI LUN
	scsi: megaraid_sas: Check valid aen class range to avoid kernel panic
	scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead
	scsi: storvsc: fix memory leak on ring buffer busy
	scsi: sg: remove 'save_scat_len'
	scsi: sg: use standard lists for sg_requests
	scsi: sg: off by one in sg_ioctl()
	scsi: sg: factor out sg_fill_request_table()
	scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE
	scsi: qla2xxx: Fix an integer overflow in sysfs code
	ftrace: Fix selftest goto location on error
	tracing: Apply trace_clock changes to instance max buffer
	ARC: Re-enable MMU upon Machine Check exception
	PCI: shpchp: Enable bridge bus mastering if MSI is enabled
	media: v4l2-compat-ioctl32: Fix timespec conversion
	media: uvcvideo: Prevent heap overflow when accessing mapped controls
	bcache: initialize dirty stripes in flash_dev_run()
	bcache: Fix leak of bdev reference
	bcache: do not subtract sectors_to_gc for bypassed IO
	bcache: correct cache_dirty_target in __update_writeback_rate()
	bcache: Correct return value for sysfs attach errors
	bcache: fix for gc and write-back race
	bcache: fix bch_hprint crash and improve output
	ftrace: Fix memleak when unregistering dynamic ops when tracing disabled
	Linux 4.4.89

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-09-27 11:52:16 +02:00
Minchan Kim
c576160ff3 mm: prevent double decrease of nr_reserved_highatomic
commit 4855e4a7f29d6d10b0b9c84e189c770c9a94e91e upstream.

There is race between page freeing and unreserved highatomic.

 CPU 0				    CPU 1

    free_hot_cold_page
      mt = get_pfnblock_migratetype
      set_pcppage_migratetype(page, mt)
    				    unreserve_highatomic_pageblock
    				    spin_lock_irqsave(&zone->lock)
    				    move_freepages_block
    				    set_pageblock_migratetype(page)
    				    spin_unlock_irqrestore(&zone->lock)
      free_pcppages_bulk
        __free_one_page(mt) <- mt is stale

By above race, a page on CPU 0 could go non-highorderatomic free list
since the pageblock's type is changed.  By that, unreserve logic of
highorderatomic can decrease reserved count on a same pageblock severak
times and then it will make mismatch between nr_reserved_highatomic and
the number of reserved pageblock.

So, this patch verifies whether the pageblock is highatomic or not and
decrease the count only if the pageblock is highatomic.

Link: http://lkml.kernel.org/r/1476259429-18279-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:13 +02:00
Greg Kroah-Hartman
2e8f1517cf This is the 4.4.84 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmfaTcACgkQONu9yGCS
 aT65LhAAlTz6ssLhstl1QhFIUZ5cTIMouONCYK2+fwg67J3nf8I4whaj0eXX9RIT
 4f6L1JNn3YV6lEfXZLs3EHeiMWft2cal95yQULwhsSwI/1rAzjYkFtj59gPxp2Rc
 03ZrU8UnWNZmKpzneiwnd0kkFi+7wBKT5GbERy6Voh1hpAg8HbHQeWdVxFXKCdLD
 eSP1+RNsvknZBjcJibxhsArs8E8r5t+dXDzi0HYpiZvctV23VXD+y2UDE1RMEDx5
 k5fH30DIxd3T1JU1qHGnJUlfK5jKVho76zoSThwEFm9xqoZat/xrby5gW5sMcWeD
 0BMw4F6GYE8BoeViC/+iujR0B8ngU0e+ExH+M6WYoEGPH1BFHyPNqDoKjnyAjyyH
 tQEOD/0aRWuxcBVyk34EafNZeou/AeDd0IReAHciCIomN0+3u104+HlxkGH1oXEn
 u0O5kVXQPaB/YeXd3jRLSfDmzxojaaihTeJZGFi//1iAj+jJEeYagfeI+flqrtaC
 Gcwi55HrNrLbEj9kBFLEnm8RgFyWFsO0oVbfu1bPUGZOmuMi4u1Ptkffi3p/Wrsh
 cx9ErKXj6meOgkcmzCWYl1Ygp3rY3bdlbidixJnzEfOTeZ2FyxnMz3BQAxGeOPTD
 OhUevEK08oMTb1YDt3i7Sh1BGKpU0AEaEw5i8m36m4rC6KdqJfA=
 =JeJw
 -----END PGP SIGNATURE-----

Merge 4.4.84 into android-4.4

Changes in 4.4.84
	netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister
	audit: Fix use after free in audit_remove_watch_rule()
	parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo
	crypto: x86/sha1 - Fix reads beyond the number of blocks passed
	Input: elan_i2c - add ELAN0608 to the ACPI table
	Input: elan_i2c - Add antoher Lenovo ACPI ID for upcoming Lenovo NB
	ALSA: seq: 2nd attempt at fixing race creating a queue
	ALSA: usb-audio: Apply sample rate quirk to Sennheiser headset
	ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices
	mm/mempolicy: fix use after free when calling get_mempolicy
	mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
	xen: fix bio vec merging
	x86/asm/64: Clear AC on NMI entries
	irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup()
	irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup()
	Sanitize 'move_pages()' permission checks
	pids: make task_tgid_nr_ns() safe
	perf/x86: Fix LBR related crashes on Intel Atom
	usb: optimize acpi companion search for usb port devices
	usb: qmi_wwan: add D-Link DWM-222 device ID
	Linux 4.4.84

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-08-29 14:33:50 +02:00
Linus Torvalds
46d51a26ef Sanitize 'move_pages()' permission checks
commit 197e7e521384a23b9e585178f3f11c9fa08274b9 upstream.

The 'move_paghes()' system call was introduced long long ago with the
same permission checks as for sending a signal (except using
CAP_SYS_NICE instead of CAP_SYS_KILL for the overriding capability).

That turns out to not be a great choice - while the system call really
only moves physical page allocations around (and you need other
capabilities to do a lot of it), you can check the return value to map
out some the virtual address choices and defeat ASLR of a binary that
still shares your uid.

So change the access checks to the more common 'ptrace_may_access()'
model instead.

This tightens the access checks for the uid, and also effectively
changes the CAP_SYS_NICE check to CAP_SYS_PTRACE, but it's unlikely that
anybody really _uses_ this legacy system call any more (we hav ebetter
NUMA placement models these days), so I expect nobody to notice.

Famous last words.

Reported-by: Otto Ebeling <otto.ebeling@iki.fi>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-24 17:02:36 -07:00
zhong jiang
cc971fa12b mm/mempolicy: fix use after free when calling get_mempolicy
commit 73223e4e2e3867ebf033a5a8eb2e5df0158ccc99 upstream.

I hit a use after free issue when executing trinity and repoduced it
with KASAN enabled.  The related call trace is as follows.

  BUG: KASan: use after free in SyS_get_mempolicy+0x3c8/0x960 at addr ffff8801f582d766
  Read of size 2 by task syz-executor1/798

  INFO: Allocated in mpol_new.part.2+0x74/0x160 age=3 cpu=1 pid=799
     __slab_alloc+0x768/0x970
     kmem_cache_alloc+0x2e7/0x450
     mpol_new.part.2+0x74/0x160
     mpol_new+0x66/0x80
     SyS_mbind+0x267/0x9f0
     system_call_fastpath+0x16/0x1b
  INFO: Freed in __mpol_put+0x2b/0x40 age=4 cpu=1 pid=799
     __slab_free+0x495/0x8e0
     kmem_cache_free+0x2f3/0x4c0
     __mpol_put+0x2b/0x40
     SyS_mbind+0x383/0x9f0
     system_call_fastpath+0x16/0x1b
  INFO: Slab 0xffffea0009cb8dc0 objects=23 used=8 fp=0xffff8801f582de40 flags=0x200000000004080
  INFO: Object 0xffff8801f582d760 @offset=5984 fp=0xffff8801f582d600

  Bytes b4 ffff8801f582d750: ae 01 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
  Object ffff8801f582d760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  Object ffff8801f582d770: 6b 6b 6b 6b 6b 6b 6b a5                          kkkkkkk.
  Redzone ffff8801f582d778: bb bb bb bb bb bb bb bb                          ........
  Padding ffff8801f582d8b8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
  Memory state around the buggy address:
  ffff8801f582d600: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801f582d680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  >ffff8801f582d700: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fc

!shared memory policy is not protected against parallel removal by other
thread which is normally protected by the mmap_sem.  do_get_mempolicy,
however, drops the lock midway while we can still access it later.

Early premature up_read is a historical artifact from times when
put_user was called in this path see https://lwn.net/Articles/124754/
but that is gone since 8bccd85ffb ("[PATCH] Implement sys_* do_*
layering in the memory policy layer.").  but when we have the the
current mempolicy ref count model.  The issue was introduced
accordingly.

Fix the issue by removing the premature release.

Link: http://lkml.kernel.org/r/1502950924-27521-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-24 17:02:35 -07:00
Greg Kroah-Hartman
f869132f15 This is the 4.4.83 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmUrdAACgkQONu9yGCS
 aT49MBAAqKJXpRzIBnzLR45QLRU5jfkUhaCtCckBAbyLE+rUH0lE4L37JHYcZ9jr
 79gG06QAuWJaTd4Nug7DocmPiqpPWi+PY46yjUQ1j3tllKWdp7b/PJXvYX3zbK+d
 vDgn6T1AyAoCBKa2aLU26SAYmfLCT+jhHzbMaRQ4eAcYE8u8w8jrfngVmnunXVme
 u6CkAZpPMXXm5jUpxgPguEOm2WMubPYEF2BMJIVuvYypeJGM0EYbOHNEUMK5jkPv
 T17+4EzSCeqGaDZElxPW4NuaHMStZW36g9gQOti3o/8/5shNLyJK3vYzWG+06zfH
 6CNElSk7Y3Fl6qALLWfd1dkjImtJvWKDVWTC43woFT/96DtXueGxrYJYRF+px9bq
 dBWAW86g5Tp2JTM+6VhN0N/Z5ANK48Oi2NrzqJXK7DrmZbS5mxMIZw239QJnEOBh
 hSxDbe9pkNJvSmR+yF+qxkz78XOOvBz4zIkGl6M70cRQWnJ0g4tCSyy2hrEooDzZ
 sfaokSdClzt3qRoFwSZIGZLpvRp9vSepXNN/nvUTX3dOLcjproVYMZJWiAUqTUyD
 /0gwrJTpDP3nZGrHdmeWL/erQDWP1aFiXlsJ0E87ymSt7KYNYFGH2ePv7Ujov/AH
 dlmvQFhSW1v7xiuiiQo9gxIo8djHqZ8FLbTCznQcQ8Scm4cMNAM=
 =riD5
 -----END PGP SIGNATURE-----

Merge 4.4.83 into android-4.4

Changes in 4.4.83
	cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
	mm: ratelimit PFNs busy info message
	iscsi-target: fix memory leak in iscsit_setup_text_cmd()
	iscsi-target: Fix iscsi_np reset hung task during parallel delete
	fuse: initialize the flock flag in fuse_file on allocation
	nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays
	USB: serial: option: add D-Link DWM-222 device ID
	USB: serial: cp210x: add support for Qivicon USB ZigBee dongle
	USB: serial: pl2303: add new ATEN device id
	usb: musb: fix tx fifo flush handling again
	USB: hcd: Mark secondary HCD as dead if the primary one died
	staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
	iio: accel: bmc150: Always restore device to normal mode after suspend-resume
	iio: light: tsl2563: use correct event code
	uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069
	USB: Check for dropped connection before switching to full speed
	usb: core: unlink urbs from the tail of the endpoint's urb_list
	usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter
	usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume
	iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits
	pnfs/blocklayout: require 64-bit sector_t
	pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
	pinctrl: samsung: Remove bogus irq_[un]mask from resource management
	Linux 4.4.83

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-08-16 14:05:33 -07:00
Jonathan Toppins
9ea732ebb5 mm: ratelimit PFNs busy info message
commit 75dddef32514f7aa58930bde6a1263253bc3d4ba upstream.

The RDMA subsystem can generate several thousand of these messages per
second eventually leading to a kernel crash.  Ratelimit these messages
to prevent this crash.

Doug said:
 "I've been carrying a version of this for several kernel versions. I
  don't remember when they started, but we have one (and only one) class
  of machines: Dell PE R730xd, that generate these errors. When it
  happens, without a rate limit, we get rcu timeouts and kernel oopses.
  With the rate limit, we just get a lot of annoying kernel messages but
  the machine continues on, recovers, and eventually the memory
  operations all succeed"

And:
 "> Well... why are all these EBUSY's occurring? It sounds inefficient
  > (at least) but if it is expected, normal and unavoidable then
  > perhaps we should just remove that message altogether?

  I don't have an answer to that question. To be honest, I haven't
  looked real hard. We never had this at all, then it started out of the
  blue, but only on our Dell 730xd machines (and it hits all of them),
  but no other classes or brands of machines. And we have our 730xd
  machines loaded up with different brands and models of cards (for
  instance one dedicated to mlx4 hardware, one for qib, one for mlx5, an
  ocrdma/cxgb4 combo, etc), so the fact that it hit all of the machines
  meant it wasn't tied to any particular brand/model of RDMA hardware.
  To me, it always smelled of a hardware oddity specific to maybe the
  CPUs or mainboard chipsets in these machines, so given that I'm not an
  mm expert anyway, I never chased it down.

  A few other relevant details: it showed up somewhere around 4.8/4.9 or
  thereabouts. It never happened before, but the prinkt has been there
  since the 3.18 days, so possibly the test to trigger this message was
  changed, or something else in the allocator changed such that the
  situation started happening on these machines?

  And, like I said, it is specific to our 730xd machines (but they are
  all identical, so that could mean it's something like their specific
  ram configuration is causing the allocator to hit this on these
  machine but not on other machines in the cluster, I don't want to say
  it's necessarily the model of chipset or CPU, there are other bits of
  identicalness between these machines)"

Link: http://lkml.kernel.org/r/499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins@redhat.com
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-16 13:40:28 -07:00
Greg Kroah-Hartman
4b8fc9f2bc This is the 4.4.82 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmPuZcACgkQONu9yGCS
 aT5o0BAAlT21EbhyxoMPC6xrPHAF1Oi8mTVfpu+618AUs3B1M6xge/EKI08B/8DP
 MZgaqSqY5ttaIlDKX5OVhY+HiuMg3SbIaFDzhS+OzjpuIjSA9ljNHazp5/l2HsQu
 9zyPFN02L2zqWYppyDo6FQBfStB5rUHB4eVMgD6zuNU/YQovtibGqAY4LBfWvxf/
 eDO6VfjiS4zzcCoplZxcxim1YVZ+HX09BuwniJzukM4C4/uMDubwMlJmrN9YsQZW
 x5zWnLHce2MATk9yF4BzMI/iRDR+Bm6Vx3m1Vzq9WDu7/kkMTVdYjXHmZn02YQub
 q4q1nDZyzBf8SvE4kf+fMYS8+dUrwiKf0lahBTK31J5Bc33lRfBfvv+dr/aEnp/Q
 FhraSkcBDrulnxuq77WZbvWzj0otF1pCTtURyCSfdc4SOFwVIz2NLQ2ZnnXk4gnN
 h5TqjxSDwr2CwTMzOnaKjBcuWnKPvn3/Pjm+/MJS8wvQYPZv8a4AzMIwxjDEN78Z
 +FvtaWEoUCnlP869hyR7gTfk2541+qjMdDRRUPSQ16PvepKy1AG9iCqVvZThScyQ
 PygaiBYZ9pbcyFuExLQrj2FDY2odinPfN8IsCQQbk5Es5mCdzJZOOkLeO2PO0MxD
 Dya79igFnpNj7ZEu6T7lD6Izg/6fYWu7qKmpDKQ7/xn4hHxj/Ig=
 =D9TG
 -----END PGP SIGNATURE-----

Merge 4.4.82 into android-4.4

Changes in 4.4.82
	tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states
	net: fix keepalive code vs TCP_FASTOPEN_CONNECT
	bpf, s390: fix jit branch offset related to ldimm64
	net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
	tcp: fastopen: tcp_connect() must refresh the route
	net: avoid skb_warn_bad_offload false positives on UFO
	packet: fix tp_reserve race in packet_set_ring
	revert "net: account for current skb length when deciding about UFO"
	revert "ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output"
	udp: consistently apply ufo or fragmentation
	sparc64: Prevent perf from running during super critical sections
	KVM: arm/arm64: Handle hva aging while destroying the vm
	mm/mempool: avoid KASAN marking mempool poison checks as use-after-free
	ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
	net: account for current skb length when deciding about UFO
	Linux 4.4.82

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-08-14 10:17:08 -07:00
Matthew Dawson
d45aabadbc mm/mempool: avoid KASAN marking mempool poison checks as use-after-free
commit 7640131032db9118a78af715ac77ba2debeeb17c upstream.

When removing an element from the mempool, mark it as unpoisoned in KASAN
before verifying its contents for SLUB/SLAB debugging.  Otherwise KASAN
will flag the reads checking the element use-after-free writes as
use-after-free reads.

Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrii Bordunov <aborduno@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-12 19:29:09 -07:00
Greg Kroah-Hartman
dfff30bca9 This is the 4.4.81 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmN2eMACgkQONu9yGCS
 aT7zCQ//eDgCF9YJnE1v8/JJ0yl2uK7XjVrF/tpPvzgTgszu4En4kGfhUO+WvmkU
 0/pqYBMAPZEbfmx+6q8FJx/MHDjFA1oKb+a9pS1RUovzWDLQoRxYwiBtR2osmuOE
 f1fbDMt9ETDUxUGLhRJ/vuzeIjmouhPkz5vZAg863+sKYYjPHlczymcgMs0sRMsE
 3kkgo6mhCKTLt8gvioSUjeVWs4a5y3unvImhSLjEHjcfydlDLwA8RuFdFwBIgNfP
 yPrgW3v5l9HHXI1lWMcOCTpVeDI272sKNOppYg4r2N/I/epBN79j7jGrqGQpG8NP
 mKOkgRDoR7ifyKLSS55R8anLyNoi4jfQAHbOxlSVGymwpd9kRuHoeTE5+IqYs+V5
 qLkqLz63hmbfRQuW6az6L+SGVwgj3DSHakGQFkB0ouB8h5ubU2OqINxOsaNABbHD
 C1Q9giqG8b2MEv5D4O4m7BhK1tDzSJmT2tb9UG+UV8LJn1PhFSnSMkjP4S7trZl+
 +8myxdoNVvDMpd23UqM7o1fuYalbslTKED9el31FimOaNF79+tzyjnNbWA6zqX+X
 U3I+Pp2FafOS2heTLTX59fz09LKRI+iP3pnlCBpp1a+MKAIEbjeW8YB5zTKrSNOv
 RkZ+1qIQtmGyhVp/YDsua5J1lhZVXeLeoEqDXYerELOdGKF30jw=
 =pHqB
 -----END PGP SIGNATURE-----

Merge 4.4.81 into android-4.4

Changes in 4.4.81
	libata: array underflow in ata_find_dev()
	workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
	ALSA: hda - Fix speaker output from VAIO VPCL14M1R
	ASoC: do not close shared backend dailink
	KVM: async_pf: make rcu irq exit if not triggered from idle task
	mm/page_alloc: Remove kernel address exposure in free_reserved_area()
	ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
	ext4: fix overflow caused by missing cast in ext4_resize_fs()
	ARM: dts: armada-38x: Fix irq type for pca955
	media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl
	target: Avoid mappedlun symlink creation during lun shutdown
	iscsi-target: Always wait for kthread_should_stop() before kthread exit
	iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
	iscsi-target: Fix initial login PDU asynchronous socket close OOPs
	iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP
	iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done
	mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
	media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds
	f2fs: sanity check checkpoint segno and blkoff
	drm: rcar-du: fix backport bug
	saa7164: fix double fetch PCIe access condition
	ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()
	net: Zero terminate ifr_name in dev_ifname().
	ipv6: avoid overflow of offset in ip6_find_1stfragopt
	ipv4: initialize fib_trie prior to register_netdev_notifier call.
	rtnetlink: allocate more memory for dev_set_mac_address()
	mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled
	openvswitch: fix potential out of bound access in parse_ct
	packet: fix use-after-free in prb_retire_rx_blk_timer_expired()
	ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()
	net: ethernet: nb8800: Handle all 4 RGMII modes identically
	dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly
	dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly
	dccp: fix a memleak for dccp_feat_init err process
	sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()
	sctp: fix the check for _sctp_walk_params and _sctp_walk_errors
	net/mlx5: Fix command bad flow on command entry allocation failure
	net: phy: Correctly process PHY_HALTED in phy_stop_machine()
	net: phy: Fix PHY unbind crash
	xen-netback: correctly schedule rate-limited queues
	sparc64: Measure receiver forward progress to avoid send mondo timeout
	wext: handle NULL extra data in iwe_stream_add_point better
	sh_eth: R8A7740 supports packet shecksumming
	net: phy: dp83867: fix irq generation
	tg3: Fix race condition in tg3_get_stats64().
	x86/boot: Add missing declaration of string functions
	phy state machine: failsafe leave invalid RUNNING state
	scsi: qla2xxx: Get mutex lock before checking optrom_state
	drm/virtio: fix framebuffer sparse warning
	virtio_blk: fix panic in initialization error path
	ARM: 8632/1: ftrace: fix syscall name matching
	mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER
	lib/Kconfig.debug: fix frv build failure
	signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
	mm: don't dereference struct page fields of invalid pages
	ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
	net: account for current skb length when deciding about UFO
	workqueue: implicit ordered attribute should be overridable
	Linux 4.4.81

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-08-11 13:09:21 -07:00
Ard Biesheuvel
78c04996b5 mm: don't dereference struct page fields of invalid pages
[ Upstream commit f073bdc51771f5a5c7a8d1191bfc3ae371d44de7 ]

The VM_BUG_ON() check in move_freepages() checks whether the node id of
a page matches the node id of its zone.  However, it does this before
having checked whether the struct page pointer refers to a valid struct
page to begin with.  This is guaranteed in most cases, but may not be
the case if CONFIG_HOLES_IN_ZONE=y.

So reorder the VM_BUG_ON() with the pfn_valid_within() check.

Link: http://lkml.kernel.org/r/1481706707-6211-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Robert Richter <rrichter@cavium.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11 09:08:59 -07:00
Mel Gorman
f1181047ff mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
commit 3ea277194daaeaa84ce75180ec7c7a2075027a68 upstream.

Stable note for 4.4: The upstream patch patches madvise(MADV_FREE) but 4.4
	does not have support for that feature. The changelog is left
	as-is but the hunk related to madvise is omitted from the backport.

Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.

He described the race as follows:

        CPU0                            CPU1
        ----                            ----
                                        user accesses memory using RW PTE
                                        [PTE now cached in TLB]
        try_to_unmap_one()
        ==> ptep_get_and_clear()
        ==> set_tlb_ubc_flush_pending()
                                        mprotect(addr, PROT_READ)
                                        ==> change_pte_range()
                                        ==> [ PTE non-present - no flush ]

                                        user writes using cached RW PTE
        ...

        try_to_unmap_flush()

The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.

For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access.  For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB.  However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.

Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption.  In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch.  Other users such as gup as a race with
reclaim looks just at PTEs.  huge page variants should be ok as they
don't race with reclaim.  mincore only looks at PTEs.  userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.

Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected.  His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.

[akpm@linux-foundation.org: tweak comments]
[akpm@linux-foundation.org: fix spello]
Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11 09:08:50 -07:00
Josh Poimboeuf
12f60018f6 mm/page_alloc: Remove kernel address exposure in free_reserved_area()
commit adb1fe9ae2ee6ef6bc10f3d5a588020e7664dfa7 upstream.

Linus suggested we try to remove some of the low-hanging fruit related
to kernel address exposure in dmesg.  The only leaks I see on my local
system are:

  Freeing SMP alternatives memory: 32K (ffffffff9e309000 - ffffffff9e311000)
  Freeing initrd memory: 10588K (ffffa0b736b42000 - ffffa0b737599000)
  Freeing unused kernel memory: 3592K (ffffffff9df87000 - ffffffff9e309000)
  Freeing unused kernel memory: 1352K (ffffa0b7288ae000 - ffffa0b728a00000)
  Freeing unused kernel memory: 632K (ffffa0b728d62000 - ffffa0b728e00000)

Linus says:

  "I suspect we should just remove [the addresses in the 'Freeing'
   messages]. I'm sure they are useful in theory, but I suspect they
   were more useful back when the whole "free init memory" was
   originally done.

   These days, if we have a use-after-free, I suspect the init-mem
   situation is the easiest situation by far. Compared to all the dynamic
   allocations which are much more likely to show it anyway. So having
   debug output for that case is likely not all that productive."

With this patch the freeing messages now look like this:

  Freeing SMP alternatives memory: 32K
  Freeing initrd memory: 10588K
  Freeing unused kernel memory: 3592K
  Freeing unused kernel memory: 1352K
  Freeing unused kernel memory: 632K

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6836ff90c45b71d38e5d4405aec56fa9e5d1d4b2.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-11 09:08:47 -07:00
Greg Kroah-Hartman
59ff2e15be This is the 4.4.78 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllxlOsACgkQONu9yGCS
 aT4naw//VrWIuIf523IP6egcS+iprNM1lt8HZIzTQ0b2x0qv82UFA9FSs3e97dbG
 LJAUEmHW+oBbm6AekAUrNFF62qaMM5HYxVgGYeiniA1dRvLCDG/OxzxPc0hv6Kmm
 VsbZBlPQEj2B5tqUOkpvQNcqbKpIVglBsK4tjeHE/mRziUkIoPXWKS6Tt7VKGtBa
 gIGY7VASyKhPtxGCdbKzHsO/IGi3cmpwAyhqQDRBR/5Dxy3p9xlQ4gTpobeL+x8z
 A7WHqVNbgfdz21LBii5xE8+GUwiRjlYhxeFJTjhM6wo2/XCtw1FJgc7EMmsbVtbZ
 xYu1/tkYaZGoxOQ7sH5jNgVjE8IcyVimDa5eJ7p7fbh3AzsyXzAJIRc8cS7cL40F
 jLDWrYDnm5t7ziITrAf0uoMmRZLHqS2Bv5sqaoCxR3D51r6LVaNdavD6hYN1CLRA
 fjlDvnvwxbLPI4YPr7PZNu4oSJiawKx9jEBOTFSYu9L1XLvdvdcVU6ULAdpShnLn
 80a+YJmYsNg54im7sxFUw6z87AScznzirIpXEJoUO+Hs0SN9Rq/BQx8cKvVeaMEI
 z53c4Hci45Go/ozZVqCTxGbkQ6tKWIKBSo6kl3xchwms4lmijpWuwbAkDUzECNvv
 0RgyQvNx4d2cRJ8hIVcdVFzp+h+kgNqVQQ2nDld7/1QSZHLPhFo=
 =vrgw
 -----END PGP SIGNATURE-----

Merge 4.4.78 into android-4.4

Changes in 4.4.78
	net_sched: fix error recovery at qdisc creation
	net: sched: Fix one possible panic when no destroy callback
	net/phy: micrel: configure intterupts after autoneg workaround
	ipv6: avoid unregistering inet6_dev for loopback
	net: dp83640: Avoid NULL pointer dereference.
	tcp: reset sk_rx_dst in tcp_disconnect()
	net: prevent sign extension in dev_get_stats()
	bpf: prevent leaking pointer via xadd on unpriviledged
	net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
	ipv6: dad: don't remove dynamic addresses if link is down
	net: ipv6: Compare lwstate in detecting duplicate nexthops
	vrf: fix bug_on triggered by rx when destroying a vrf
	rds: tcp: use sock_create_lite() to create the accept socket
	brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
	cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
	cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
	cfg80211: Check if PMKID attribute is of expected size
	irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
	parisc: Report SIGSEGV instead of SIGBUS when running out of stack
	parisc: use compat_sys_keyctl()
	parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
	parisc/mm: Ensure IRQs are off in switch_mm()
	tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth
	kernel/extable.c: mark core_kernel_text notrace
	mm/list_lru.c: fix list_lru_count_node() to be race free
	fs/dcache.c: fix spin lockup issue on nlru->lock
	checkpatch: silence perl 5.26.0 unescaped left brace warnings
	binfmt_elf: use ELF_ET_DYN_BASE only for PIE
	arm: move ELF_ET_DYN_BASE to 4MB
	arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
	powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
	s390: reduce ELF_ET_DYN_BASE
	exec: Limit arg stack to at most 75% of _STK_LIM
	vt: fix unchecked __put_user() in tioclinux ioctls
	mnt: In umount propagation reparent in a separate pass
	mnt: In propgate_umount handle visiting mounts in any order
	mnt: Make propagate_umount less slow for overlapping mount propagation trees
	selftests/capabilities: Fix the test_execve test
	tpm: Get rid of chip->pdev
	tpm: Provide strong locking for device removal
	Add "shutdown" to "struct class".
	tpm: Issue a TPM2_Shutdown for TPM2 devices.
	mm: fix overflow check in expand_upwards()
	crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD
	crypto: atmel - only treat EBUSY as transient if backlog
	crypto: sha1-ssse3 - Disable avx2
	crypto: caam - fix signals handling
	sched/topology: Fix overlapping sched_group_mask
	sched/topology: Optimize build_group_mask()
	PM / wakeirq: Convert to SRCU
	PM / QoS: return -EINVAL for bogus strings
	tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
	KVM: x86: disable MPX if host did not enable MPX XSAVE features
	kvm: vmx: Do not disable intercepts for BNDCFGS
	kvm: x86: Guest BNDCFGS requires guest MPX support
	kvm: vmx: Check value written to IA32_BNDCFGS
	kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
	Linux 4.4.78

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-21 09:14:57 +02:00
Helge Deller
8f93a9aa1d mm: fix overflow check in expand_upwards()
commit 37511fb5c91db93d8bd6e3f52f86e5a7ff7cfcdf upstream.

Jörn Engel noticed that the expand_upwards() function might not return
-ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and
if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE.

Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa
which all define TASK_SIZE as 0xffffffff, but since none of those have
an upwards-growing stack we currently have no actual issue.

Nevertheless let's fix this just in case any of the architectures with
an upward-growing stack (currently parisc, metag and partly ia64) define
TASK_SIZE similar.

Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box
Fixes: bd726c90b6b8 ("Allow stack to grow up to address space limit")
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Jörn Engel <joern@purestorage.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:44:58 +02:00
Sahitya Tummala
2d0db02d2e mm/list_lru.c: fix list_lru_count_node() to be race free
commit 2c80cd57c74339889a8752b20862a16c28929c3a upstream.

list_lru_count_node() iterates over all memcgs to get the total number of
entries on the node but it can race with memcg_drain_all_list_lrus(),
which migrates the entries from a dead cgroup to another.  This can return
incorrect number of entries from list_lru_count_node().

Fix this by keeping track of entries per node and simply return it in
list_lru_count_node().

Link: http://lkml.kernel.org/r/1498707555-30525-1-git-send-email-stummala@codeaurora.org
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Alexander Polakov <apolyakov@beget.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:44:56 +02:00
Greg Kroah-Hartman
cc3d2b7361 This is the 4.4.77 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllp5y8ACgkQONu9yGCS
 aT6g8hAApzYi9TwiaF6wyYXsrp7YvOm4NyMaVBl4t7v/nFql6VsUL+qWaJKB5EL9
 o72ybYPUzbxGTVWCm/wiBO31VWea0ak0pBbyywBiowGgwAcgG/jpqZobale4Y2TE
 15jEpmpA5+3BmXpMkrv/dz4LHZ4jm65/ADhMbkPGRZqUJ3mHmyVoi50l67dpTE5+
 xWQIErycwlVMppJGnXPeFFgeD7Etch7OJ9CishQRNMb3F8H58WiQrMWWe1NfL0DO
 H2g18IBHMsxEYJqnRqxviTOMe8S96Km+lKGX0LOTRYt+2OQpfIF7buU6N+6C96rO
 7V2n2G02m2mOFVUFlDYF1RQ9IBrxHJf9iGkaZBwsaxX7XAK63ZjRxgjnEL7gMPU/
 TMCOWZ53BdZezz2eAmdhySsV+4Xt6MmJJE8rR47AgsM2Le3tgK421zmraunmA0fR
 eoJS99YHcftAHXCD3puGLafEwGVe0G4eQbY4L7mj1Y9VjaAbmmsWq9rlNOQMZRgH
 JTNyYik1C7yGPJX1iKi9hLAKldzBwPuM3GfZMZQIOjA4t2VtSon7in5iKrihRg3N
 BSKXr6+orNw32tsqcC4kpLPbFUFb6zx3EKELwSJwD9ICN7swJEk7gFw7w/F/SOxI
 C1W4Ulm6EcYTWHDePERQ4zHlllHAmyJup61d9HnwA6HhPOLaff4=
 =oeNk
 -----END PGP SIGNATURE-----

Merge 4.4.77 into android-4.4

Changes in 4.4.77
	fs: add a VALID_OPEN_FLAGS
	fs: completely ignore unknown open flags
	driver core: platform: fix race condition with driver_override
	bgmac: reset & enable Ethernet core before using it
	mm: fix classzone_idx underflow in shrink_zones()
	tracing/kprobes: Allow to create probe with a module name starting with a digit
	drm/virtio: don't leak bo on drm_gem_object_init failure
	usb: dwc3: replace %p with %pK
	USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
	Add USB quirk for HVR-950q to avoid intermittent device resets
	usb: usbip: set buffer pointers to NULL after free
	usb: Fix typo in the definition of Endpoint[out]Request
	mac80211_hwsim: Replace bogus hrtimer clockid
	sysctl: don't print negative flag for proc_douintvec
	sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
	pinctrl: sh-pfc: r8a7791: Fix SCIF2 pinmux data
	pinctrl: meson: meson8b: fix the NAND DQS pins
	pinctrl: sunxi: Fix SPDIF function name for A83T
	pinctrl: mxs: atomically switch mux and drive strength config
	pinctrl: sh-pfc: Update info pointer after SoC-specific init
	USB: serial: option: add two Longcheer device ids
	USB: serial: qcserial: new Sierra Wireless EM7305 device ID
	gfs2: Fix glock rhashtable rcu bug
	x86/tools: Fix gcc-7 warning in relocs.c
	x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
	ath10k: override CE5 config for QCA9377
	KEYS: Fix an error code in request_master_key()
	RDMA/uverbs: Check port number supplied by user verbs cmds
	mqueue: fix a use-after-free in sys_mq_notify()
	tools include: Add a __fallthrough statement
	tools string: Use __fallthrough in perf_atoll()
	tools strfilter: Use __fallthrough
	perf top: Use __fallthrough
	perf intel-pt: Use __fallthrough
	perf thread_map: Correctly size buffer used with dirent->dt_name
	perf scripting perl: Fix compile error with some perl5 versions
	perf tests: Avoid possible truncation with dirent->d_name + snprintf
	perf bench numa: Avoid possible truncation when using snprintf()
	perf tools: Use readdir() instead of deprecated readdir_r()
	perf thread_map: Use readdir() instead of deprecated readdir_r()
	perf script: Use readdir() instead of deprecated readdir_r()
	perf tools: Remove duplicate const qualifier
	perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
	perf pmu: Fix misleadingly indented assignment (whitespace)
	perf dwarf: Guard !x86_64 definitions under #ifdef else clause
	perf trace: Do not process PERF_RECORD_LOST twice
	perf tests: Remove wrong semicolon in while loop in CQM test
	perf tools: Use readdir() instead of deprecated readdir_r() again
	md: fix incorrect use of lexx_to_cpu in does_sb_need_changing
	md: fix super_offset endianness in super_1_rdev_size_change
	tcp: fix tcp_mark_head_lost to check skb len before fragmenting
	staging: vt6556: vnt_start Fix missing call to vnt_key_init_table.
	staging: comedi: fix clean-up of comedi_class in comedi_init()
	ext4: check return value of kstrtoull correctly in reserved_clusters_store
	x86/mm/pat: Don't report PAT on CPUs that don't support it
	saa7134: fix warm Medion 7134 EEPROM read
	Linux 4.4.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-15 13:29:08 +02:00
Vlastimil Babka
78f20db864 mm: fix classzone_idx underflow in shrink_zones()
[Not upstream as that would take 34+ patches]

We've got reported a BUG in do_try_to_free_pages():

BUG: unable to handle kernel paging request at ffff8ffffff28990
IP: [<ffffffff8119abe0>] do_try_to_free_pages+0x140/0x490
PGD 0
Oops: 0000 [#1] SMP
megaraid_sas sg scsi_mod efivarfs autofs4
Supported: No, Unsupported modules are loaded
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
task: ffff88ffd0d4c540 ti: ffff88ffd0e48000 task.ti: ffff88ffd0e48000
RIP: 0010:[<ffffffff8119abe0>]  [<ffffffff8119abe0>] do_try_to_free_pages+0x140/0x490
RSP: 0018:ffff88ffd0e4ba60  EFLAGS: 00010206
RAX: 000006fffffff900 RBX: 00000000ffffffff RCX: ffff88fffff29000
RDX: 000000ffffffff00 RSI: 0000000000000003 RDI: 00000000024200c8
RBP: 0000000001320122 R08: 0000000000000000 R09: ffff88ffd0e4bbac
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88ffd0e4bae0
R13: 0000000000000e00 R14: ffff88fffff2a500 R15: ffff88fffff2b300
FS:  0000000000000000(0000) GS:ffff88ffe6440000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8ffffff28990 CR3: 0000000001c0a000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
 00000002db570a80 024200c80000001e ffff88fffff2b300 0000000000000000
 ffff88fffffd5700 ffff88ffd0d4c540 ffff88ffd0d4c540 ffffffff0000000c
 0000000000000000 0000000000000040 00000000024200c8 ffff88ffd0e4bae0
Call Trace:
 [<ffffffff8119afea>] try_to_free_pages+0xba/0x170
 [<ffffffff8118cf2f>] __alloc_pages_nodemask+0x53f/0xb20
 [<ffffffff811d39ff>] alloc_pages_current+0x7f/0x100
 [<ffffffff811e2232>] migrate_pages+0x202/0x710
 [<ffffffff815dadaa>] __offline_pages.constprop.23+0x4ba/0x790
 [<ffffffff81463263>] memory_subsys_offline+0x43/0x70
 [<ffffffff8144cbed>] device_offline+0x7d/0xa0
 [<ffffffff81392fa2>] acpi_bus_offline+0xa5/0xef
 [<ffffffff81394a77>] acpi_device_hotplug+0x21b/0x41f
 [<ffffffff8138dab7>] acpi_hotplug_work_fn+0x1a/0x23
 [<ffffffff81093cee>] process_one_work+0x14e/0x410
 [<ffffffff81094546>] worker_thread+0x116/0x490
 [<ffffffff810999ed>] kthread+0xbd/0xe0
 [<ffffffff815e4e7f>] ret_from_fork+0x3f/0x70

This translates to the loop in shrink_zone():

classzone_idx = requested_highidx;
while (!populated_zone(zone->zone_pgdat->node_zones +
					classzone_idx))
	classzone_idx--;

where no zone is populated, so classzone_idx becomes -1 (in RBX).

Added debugging output reveals that we enter the function with
sc->gfp_mask == GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE
requested_highidx = gfp_zone(sc->gfp_mask) == 2 (ZONE_NORMAL)

Inside the for loop, however:
gfp_zone(sc->gfp_mask) == 3 (ZONE_MOVABLE)

This means we have gone through this branch:

if (buffer_heads_over_limit)
    sc->gfp_mask |= __GFP_HIGHMEM;

This changes the gfp_zone() result, but requested_highidx remains unchanged.
On nodes where the only populated zone is movable, the inner while loop will
check only lower zones, which are not populated, and underflow classzone_idx.

To sum up, the bug occurs in configurations with ZONE_MOVABLE (such as when
booted with the movable_node parameter) and only in situations when
buffer_heads_over_limit is true, and there's an allocation with __GFP_MOVABLE
and without __GFP_HIGHMEM performing direct reclaim.

This patch makes sure that classzone_idx starts with the correct zone.

Mainline has been affected in versions 4.6 and 4.7, but the culprit commit has
been also included in stable trees.
In mainline, this has been fixed accidentally as part of 34-patch series (plus
follow-up fixes) "Move LRU page reclaim from zones to nodes", which makes the
mainline commit unsuitable for stable backport, unfortunately.

Fixes: 7bf52fb891b6 ("mm: vmscan: reclaim highmem zone if buffer_heads is over limit")
Obsoleted-by: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Debugged-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 11:57:44 +02:00
Greg Kroah-Hartman
64a73ff728 This is the 4.4.76 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllc3f0ACgkQONu9yGCS
 aT4fmA/+OHeYbhpaMRKqrUpsxB3NpROr2Z47ow6vaVjYZzd0irrODLlfIfDQ6EEo
 N3v28povu16VeYXk+4h8bsAP2K2j6/BlRaSi2hB6dmnY8GDMaXEfRojPYAlzVz50
 qnK/6152siDDarUx1h5Zc8GcmX/tEl6h3bOOxDcwLR+RvyIcWxenuR+uqRM/AV6o
 BPEiOuMu7P6LjID7KYgBTFNajVBMLrDXt4SCWdzOZmlNt0QXgKB9yw68vTcc+edC
 ZcXqa0M6nEWSDvwobbwBZhFL8H2dJjzweyjeFBgxnxgmOrRh6kvZG2wsz2c8O3/P
 g8TuMxU7siu+I3lFwKy+dgZ/1REz+6Q3oFBqXsuddrcPYu23rV6mz/GxqWy4cerb
 M4eTWz6L9vA2GoYpvBaWi0tKC9tkNM49g48Y24a6CW1O4dJWlz3RrpTiZmequbNF
 mo8EKomSXn4kYAm1xT03DGljQkK/i2JtyI5sk2hLEqqxKvZ/3q9xxLLKOVx8dPvs
 PIbfpapfYMXXMWgR6e+UKueNLgevfWE12X/OU4SgvSY4n/07/mH40XEd3zd82IsZ
 1Mw0qj3JnqCAFDBBMsDYa+OvABaGD1dHARuiv+aeqW8tqoBglFHxWqF+SQVNXLIE
 qTLiKz78vjQpH0zGpkA3HEOh/h4L7a0y3qRMECsk5SUxXsgu1gg=
 =bwNU
 -----END PGP SIGNATURE-----

Merge 4.4.76 into android-4.4

Changes in 4.4.76
	ipv6: release dst on error in ip6_dst_lookup_tail
	net: don't call strlen on non-terminated string in dev_set_alias()
	decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
	net: Zero ifla_vf_info in rtnl_fill_vfinfo()
	af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers
	Fix an intermittent pr_emerg warning about lo becoming free.
	net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx
	igmp: acquire pmc lock for ip_mc_clear_src()
	igmp: add a missing spin_lock_init()
	ipv6: fix calling in6_ifa_hold incorrectly for dad work
	net/mlx5: Wait for FW readiness before initializing command interface
	decnet: always not take dst->__refcnt when inserting dst into hash table
	net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
	sfc: provide dummy definitions of vswitch functions
	ipv6: Do not leak throw route references
	rtnetlink: add IFLA_GROUP to ifla_policy
	netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
	netfilter: synproxy: fix conntrackd interaction
	NFSv4: fix a reference leak caused WARNING messages
	drm/ast: Handle configuration without P2A bridge
	mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
	MIPS: Avoid accidental raw backtrace
	MIPS: pm-cps: Drop manual cache-line alignment of ready_count
	MIPS: Fix IRQ tracing & lockdep when rescheduling
	ALSA: hda - Fix endless loop of codec configure
	ALSA: hda - set input_path bitmap to zero after moving it to new place
	drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
	usb: gadget: f_fs: Fix possibe deadlock
	sysctl: enable strict writes
	block: fix module reference leak on put_disk() call for cgroups throttle
	mm: numa: avoid waiting on freed migrated pages
	KVM: x86: fix fixing of hypercalls
	scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
	scsi: lpfc: Set elsiocb contexts to NULL after freeing it
	qla2xxx: Fix erroneous invalid handle message
	ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags
	net: mvneta: Fix for_each_present_cpu usage
	MIPS: ath79: fix regression in PCI window initialization
	net: korina: Fix NAPI versus resources freeing
	MIPS: ralink: MT7688 pinmux fixes
	MIPS: ralink: fix USB frequency scaling
	MIPS: ralink: Fix invalid assignment of SoC type
	MIPS: ralink: fix MT7628 pinmux typos
	MIPS: ralink: fix MT7628 wled_an pinmux gpio
	mtd: bcm47xxpart: limit scanned flash area on BCM47XX (MIPS) only
	bgmac: fix a missing check for build_skb
	mtd: bcm47xxpart: don't fail because of bit-flips
	bgmac: Fix reversed test of build_skb() return value.
	net: bgmac: Fix SOF bit checking
	net: bgmac: Start transmit queue in bgmac_open
	net: bgmac: Remove superflous netif_carrier_on()
	powerpc/eeh: Enable IO path on permanent error
	gianfar: Do not reuse pages from emergency reserve
	Btrfs: fix truncate down when no_holes feature is enabled
	virtio_console: fix a crash in config_work_handler
	swiotlb-xen: update dev_addr after swapping pages
	xen-netfront: Fix Rx stall during network stress and OOM
	scsi: virtio_scsi: Reject commands when virtqueue is broken
	platform/x86: ideapad-laptop: handle ACPI event 1
	amd-xgbe: Check xgbe_init() return code
	net: dsa: Check return value of phy_connect_direct()
	drm/amdgpu: check ring being ready before using
	vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
	virtio_net: fix PAGE_SIZE > 64k
	vxlan: do not age static remote mac entries
	ibmveth: Add a proper check for the availability of the checksum features
	kernel/panic.c: add missing \n
	HID: i2c-hid: Add sleep between POWER ON and RESET
	scsi: lpfc: avoid double free of resource identifiers
	spi: davinci: use dma_mapping_error()
	mac80211: initialize SMPS field in HT capabilities
	x86/mpx: Use compatible types in comparison to fix sparse error
	coredump: Ensure proper size of sparse core files
	swiotlb: ensure that page-sized mappings are page-aligned
	s390/ctl_reg: make __ctl_load a full memory barrier
	be2net: fix status check in be_cmd_pmac_add()
	perf probe: Fix to show correct locations for events on modules
	net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
	sctp: check af before verify address in sctp_addr_id2transport
	ravb: Fix use-after-free on `ifconfig eth0 down`
	jump label: fix passing kbuild_cflags when checking for asm goto support
	xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
	xfrm: NULL dereference on allocation failure
	xfrm: Oops on error in pfkey_msg2xfrm_state()
	watchdog: bcm281xx: Fix use of uninitialized spinlock.
	sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
	ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
	ARM: 8685/1: ensure memblock-limit is pmd-aligned
	x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
	x86/mm: Fix flush_tlb_page() on Xen
	ocfs2: o2hb: revert hb threshold to keep compatible
	iommu/vt-d: Don't over-free page table directories
	iommu: Handle default domain attach failure
	iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
	cpufreq: s3c2416: double free on driver init error path
	KVM: x86: fix emulation of RSM and IRET instructions
	KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()
	KVM: x86: zero base3 of unusable segments
	KVM: nVMX: Fix exception injection
	Linux 4.4.76

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-05 16:16:58 +02:00
Mark Rutland
cdbf92675f mm: numa: avoid waiting on freed migrated pages
commit 3c226c637b69104f6b9f1c6ec5b08d7b741b3229 upstream.

In do_huge_pmd_numa_page(), we attempt to handle a migrating thp pmd by
waiting until the pmd is unlocked before we return and retry.  However,
we can race with migrate_misplaced_transhuge_page():

    // do_huge_pmd_numa_page                // migrate_misplaced_transhuge_page()
    // Holds 0 refs on page                 // Holds 2 refs on page

    vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
    /* ... */
    if (pmd_trans_migrating(*vmf->pmd)) {
            page = pmd_page(*vmf->pmd);
            spin_unlock(vmf->ptl);
                                            ptl = pmd_lock(mm, pmd);
                                            if (page_count(page) != 2)) {
                                                    /* roll back */
                                            }
                                            /* ... */
                                            mlock_migrate_page(new_page, page);
                                            /* ... */
                                            spin_unlock(ptl);
                                            put_page(page);
                                            put_page(page); // page freed here
            wait_on_page_locked(page);
            goto out;
    }

This can result in the freed page having its waiters flag set
unexpectedly, which trips the PAGE_FLAGS_CHECK_AT_PREP checks in the
page alloc/free functions.  This has been observed on arm64 KVM guests.

We can avoid this by having do_huge_pmd_numa_page() take a reference on
the page before dropping the pmd lock, mirroring what we do in
__migration_entry_wait().

When we hit the race, migrate_misplaced_transhuge_page() will see the
reference and abort the migration, as it may do today in other cases.

Fixes: b8916634b7 ("mm: Prevent parallel splits during THP migration")
Link: http://lkml.kernel.org/r/1497349722-6731-2-git-send-email-will.deacon@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 14:37:16 +02:00
David Rientjes
74de12dbfa mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
commit 460bcec84e11c75122ace5976214abbc596eb91b upstream.

We got need_resched() warnings in swap_cgroup_swapoff() because
swap_cgroup_ctrl[type].length is particularly large.

Reschedule when needed.

Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 14:37:15 +02:00
Greg Kroah-Hartman
77ddb50929 This is the 4.4.74 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllQl/sACgkQONu9yGCS
 aT5zMRAAuDBpWjQ1IFtgmzQnKGyjS3fm5X/EgPmT81PFKXay5/TH6Hc85TvorChk
 mCC7qybadCFPjieBfUeCGhTposiGkbOZdYIzduzLeHPe7Eda88NKJw5ZS3x+RDro
 if6BZNtQPwPk9jQ95zpBu/p6eCuIGFzQObif8XHga9eEVP+TPGDKFn5EdLM8j99t
 ErKYyTLFEiZYa52hpCBbVz/4mX8bJOoAlZaitcbvaFbG0OodA5SL24sKlr7tAPrM
 ajnuqv+ghOUjbXrUlrTGxCjJ7vCJjdBqNzuxVFNj5P1xDucpBW8uuWGob0XWTMbB
 hj/ToAIQXQXrZKFpASWW74B4QZDcjo7dbhDWOurBaAsyLuBzAi26pI+q6TqgCQUO
 k17ilfk9LVEvvFhiQ7xpJPNnkh6tCEk7Jdblru6ZL5fHCAYe+qUDj56TbqjFJCQK
 +bDzPi0QXkEGQNKxo7zDu5iGQ0Gb0zD2Z3MrGD+3pCkM5yG0PXjzZ7lOlboyPzwY
 88dxuuTRmm8yGEEm81BKmDYqAA1l4FCrap8u9FLoNyoZyMnK7B+SHHuPRBRhL3F2
 I3L/v8BbJhXTsDNPXEsXtpZZpn2wxJp4x4gKWmCcOb5MM1nbFrFtwdj0cKobu6Xe
 ygNMEkjlW2uUrZoDXthj1ICda/cEw/R0gMWzBeNNVfErOZEmFxM=
 =zl9i
 -----END PGP SIGNATURE-----

Merge 4.4.74 into android-4.4

Changes in 4.4.74
	configfs: Fix race between create_link and configfs_rmdir
	can: gs_usb: fix memory leak in gs_cmd_reset()
	cpufreq: conservative: Allow down_threshold to take values from 1 to 10
	vb2: Fix an off by one error in 'vb2_plane_vaddr'
	mac80211: don't look at the PM bit of BAR frames
	mac80211/wpa: use constant time memory comparison for MACs
	mac80211: fix CSA in IBSS mode
	mac80211: fix IBSS presp allocation size
	serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
	x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
	mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode
	staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data()
	iio: proximity: as3935: recalibrate RCO after resume
	USB: hub: fix SS max number of ports
	usb: core: fix potential memory leak in error path during hcd creation
	pvrusb2: reduce stack usage pvr2_eeprom_analyze()
	USB: gadget: dummy_hcd: fix hub-descriptor removable fields
	usb: r8a66597-hcd: select a different endpoint on timeout
	usb: r8a66597-hcd: decrease timeout
	drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
	usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
	USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
	mm/memory-failure.c: use compound_head() flags for huge pages
	swap: cond_resched in swap_cgroup_prepare()
	genirq: Release resources in __setup_irq() error path
	alarmtimer: Prevent overflow of relative timers
	usb: dwc3: exynos fix axius clock error path to do cleanup
	MIPS: Fix bnezc/jialc return address calculation
	alarmtimer: Rate limit periodic intervals
	mm: larger stack guard gap, between vmas
	Allow stack to grow up to address space limit
	mm: fix new crash in unmapped_area_topdown()
	Linux 4.4.74

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-27 09:47:59 +02:00
Greg Kroah-Hartman
5672779e72 This is the 4.4.73 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllEst4ACgkQONu9yGCS
 aT5RJQ//UBkwmDInzy8BbfZk7pXY4IXzbXYfZ/nS5QbmPQNcQArKeIsOH+C1ZTKq
 suo4wR4yWUzATr8BBBxbUKyyrAsA85oDq/fl3EyPVLYOHdChnM/sS9H/eFrQhJIN
 7PM9j1YiqyTohkagB2tkWSb8tO1r/wgTSTLmIE+RzsWn6rvMCPMPVY+3OGgC1fuT
 LJrgtXKGBl9zNxmrns3VlSou1MhJvh/y8ESIrhdYHdZKos0zsgQaISXJjhZtyQcV
 OnEzsuz9NMXWE9XGjNpyAm88Nh8T41Ey/vwOjt6mkvWac3r77IgI5NWaLV/QDyqm
 d3jpVWK5BSPcLsmeN4LwQC5aYayvHlh8CfP8ZlBx1xkB5TpclnqXGgQA9BYpXAKw
 XoeFl8n8xLaPrgX8gp3kw/f6C6443OC2JVeRvgnH/0ZM7M0+rZxE6DstRcUHGqf6
 K8PN+AssQpBLIjXSGHnzDVHME/1xWUSmJZfLd5bd6NJ1zSZqZOy1gkf5dx77p0Ka
 UaGVOg6UzOojr3GeUTE62bRO2ZuAno0QO1NQJJkUK1CbNMYmE61vYLM8i0pLKWZJ
 3mDlhcoGK8aJH8chNLU3mgkXECU/9zOVKveWZFoghhMlv8ImgeTuiqZhvztzzT38
 42DxdXPfMzxCwBF02zYu4qn+WDJbNyOqMQrlMEHwwb88wnKiOUg=
 =Ic1J
 -----END PGP SIGNATURE-----

Merge 4.4.73 into android-4.4

Changes in 4.4.73
	s390/vmem: fix identity mapping
	partitions/msdos: FreeBSD UFS2 file systems are not recognized
	ARM: dts: imx6dl: Fix the VDD_ARM_CAP voltage for 396MHz operation
	staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
	Call echo service immediately after socket reconnect
	net: xilinx_emaclite: fix freezes due to unordered I/O
	net: xilinx_emaclite: fix receive buffer overflow
	ipv6: Handle IPv4-mapped src to in6addr_any dst.
	ipv6: Inhibit IPv4-mapped src address on the wire.
	NET: Fix /proc/net/arp for AX.25
	NET: mkiss: Fix panic
	net: hns: Fix the device being used for dma mapping during TX
	sierra_net: Skip validating irrelevant fields for IDLE LSIs
	sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
	i2c: piix4: Fix request_region size
	ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
	PM / runtime: Avoid false-positive warnings from might_sleep_if()
	jump label: pass kbuild_cflags when checking for asm goto support
	kasan: respect /proc/sys/kernel/traceoff_on_warning
	log2: make order_base_2() behave correctly on const input value zero
	ethtool: do not vzalloc(0) on registers dump
	fscache: Fix dead object requeue
	fscache: Clear outstanding writes when disabling a cookie
	FS-Cache: Initialise stores_lock in netfs cookie
	ipv6: fix flow labels when the traffic class is non-0
	drm/nouveau: prevent userspace from deleting client object
	drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers
	net/mlx4_core: Avoid command timeouts during VF driver device shutdown
	gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page
	pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES
	net: adaptec: starfire: add checks for dma mapping errors
	parisc, parport_gsc: Fixes for printk continuation lines
	drm/nouveau: Don't enabling polling twice on runtime resume
	drm/ast: Fixed system hanged if disable P2A
	ravb: unmap descriptors when freeing rings
	nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
	r8152: re-schedule napi for tx
	r8152: fix rtl8152_post_reset function
	r8152: avoid start_xmit to schedule napi when napi is disabled
	sctp: sctp_addr_id2transport should verify the addr before looking up assoc
	romfs: use different way to generate fsid for BLOCK or MTD
	proc: add a schedule point in proc_pid_readdir()
	tipc: ignore requests when the connection state is not CONNECTED
	xtensa: don't use linux IRQ #0
	s390/kvm: do not rely on the ILC on kvm host protection fauls
	sparc64: make string buffers large enough
	Linux 4.4.73

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-27 09:47:36 +02:00
Hugh Dickins
1f2284fac2 mm: fix new crash in unmapped_area_topdown()
commit f4cb767d76cf7ee72f97dd76f6cfa6c76a5edc89 upstream.

Trinity gets kernel BUG at mm/mmap.c:1963! in about 3 minutes of
mmap testing.  That's the VM_BUG_ON(gap_end < gap_start) at the
end of unmapped_area_topdown().  Linus points out how MAP_FIXED
(which does not have to respect our stack guard gap intentions)
could result in gap_end below gap_start there.  Fix that, and
the similar case in its alternative, unmapped_area().

Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Debugged-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26 07:13:11 +02:00
Helge Deller
f41512c6ac Allow stack to grow up to address space limit
commit bd726c90b6b8ce87602208701b208a208e6d5600 upstream.

Fix expand_upwards() on architectures with an upward-growing stack (parisc,
metag and partly IA-64) to allow the stack to reliably grow exactly up to
the address space limit given by TASK_SIZE.

Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26 07:13:11 +02:00
Hugh Dickins
4b35943067 mm: larger stack guard gap, between vmas
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.

Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications.  For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: backport to 4.11: adjust context]
[wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide]
[wt: backport to 4.4: adjust context ; drop ppc hugetlb_radix changes]
Signed-off-by: Willy Tarreau <w@1wt.eu>
[gkh: minor build fixes for 4.4]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26 07:13:11 +02:00
Yu Zhao
6af90091b6 swap: cond_resched in swap_cgroup_prepare()
commit ef70762948dde012146926720b70e79736336764 upstream.

I saw need_resched() warnings when swapping on large swapfile (TBs)
because continuously allocating many pages in swap_cgroup_prepare() took
too long.

We already cond_resched when freeing page in swap_cgroup_swapoff().  Do
the same for the page allocation.

Link: http://lkml.kernel.org/r/20170604200109.17606-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26 07:13:10 +02:00
James Morse
bfbd244c5f mm/memory-failure.c: use compound_head() flags for huge pages
commit 7258ae5c5a2ce2f5969e8b18b881be40ab55433d upstream.

memory_failure() chooses a recovery action function based on the page
flags.  For huge pages it uses the tail page flags which don't have
anything interesting set, resulting in:

> Memory failure: 0x9be3b4: Unknown page state
> Memory failure: 0x9be3b4: recovery action for unknown page: Failed

Instead, save a copy of the head page's flags if this is a huge page,
this means if there are no relevant flags for this tail page, we use the
head pages flags instead.  This results in the me_huge_page() recovery
action being called:

> Memory failure: 0x9b7969: recovery action for huge page: Delayed

For hugepages that have not yet been allocated, this allows the hugepage
to be dequeued.

Fixes: 524fca1e73 ("HWPOISON: fix misjudgement of page_action() for errors on mlocked pages")
Link: http://lkml.kernel.org/r/20170524130204.21845-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26 07:13:10 +02:00
Peter Zijlstra
55d0f89a1a kasan: respect /proc/sys/kernel/traceoff_on_warning
[ Upstream commit 4f40c6e5627ea73b4e7c615c59631f38cc880885 ]

After much waiting I finally reproduced a KASAN issue, only to find my
trace-buffer empty of useful information because it got spooled out :/

Make kasan_report honour the /proc/sys/kernel/traceoff_on_warning
interface.

Link: http://lkml.kernel.org/r/20170125164106.3514-1-aryabinin@virtuozzo.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-17 06:39:36 +02:00