Finding suitable OFF_SLAB candidate is more related to aligned cache
size rather than original size. Same reasoning can be applied to the
debug pagealloc candidate. So, this patch moves up alignment fixup to
proper position. From that point, size is aligned so we can remove some
alignment fixups.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from 832a15d209cd260180407bde1af18965b21623f3)
Change-Id: I8338d647da4a6eb6402c6fe4e2402b7db45ea5a5
Signed-off-by: Paul Lawrence <paullawrence@google.com>
debug_pagealloc debugging is related to SLAB_POISON flag rather than
FORCED_DEBUG option, although FORCED_DEBUG option will enable
SLAB_POISON. Fix it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from 40323278b557a5909bbecfa181c91a3af7afbbe3)
Change-Id: I4a7dbd40b6a2eb777439f271fa600b201e6db1d3
Signed-off-by: Paul Lawrence <paullawrence@google.com>
cache_init_objs() will be changed in following patch and current form
doesn't fit well for that change. So, before doing it, this patch
separates debugging initialization. This would cause two loop iteration
when debugging is enabled, but, this overhead seems too light than debug
feature itself so effect may not be visible. This patch will greatly
simplify changes in cache_init_objs() in following patch.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from 10b2e9e8e808bd30e1f4018a36366d07b0abd12f)
Change-Id: I9904974a674f17fc7d57daf0fe351742db67c006
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Now, we don't use object status buffer in any setup. Remove it.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from 249247b6f8ee362189a2f2bf598a14ff6c95fb4c)
Change-Id: I60b1fb030da5d44aff2627f57dffd9dd7436e511
Signed-off-by: Paul Lawrence <paullawrence@google.com>
DEBUG_SLAB_LEAK is a debug option. It's current implementation requires
status buffer so we need more memory to use it. And, it cause
kmem_cache initialization step more complex.
To remove this extra memory usage and to simplify initialization step,
this patch implement this feature with another way.
When user requests to get slab object owner information, it marks that
getting information is started. And then, all free objects in caches
are flushed to corresponding slab page. Now, we can distinguish all
freed object so we can know all allocated objects, too. After
collecting slab object owner information on allocated objects, mark is
checked that there is no free during the processing. If true, we can be
sure that our information is correct so information is returned to user.
Although this way is rather complex, it has two important benefits
mentioned above. So, I think it is worth changing.
There is one drawback that it takes more time to get slab object owner
information but it is just a debug option so it doesn't matter at all.
To help review, this patch implements new way only. Following patch
will remove useless code.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from d31676dfde257cb2b3e52d4e657d8ad2251e4d49)
Change-Id: I204ea0dd5553577d17c93f32f0d5a797ba0304af
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Currently, open code for checking DEBUG_PAGEALLOC cache is spread to
some sites. It makes code unreadable and hard to change.
This patch cleans up this code. The following patch will change the
criteria for DEBUG_PAGEALLOC cache so this clean-up will help it, too.
[akpm@linux-foundation.org: fix build with CONFIG_DEBUG_PAGEALLOC=n]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from 40b44137971c2e5865a78f9f7de274449983ccb5)
Change-Id: I784df4a54b62f77a22f8fa70990387fdf968219f
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from a307ebd468e0b97c203f5a99a56a6017e4d1991a)
Change-Id: I4407dbf0a5a22ab6f9335219bafc6f28023635a0
Signed-off-by: Paul Lawrence <paullawrence@google.com>
Functions which the compiler has instrumented for ASAN place poison on
the stack shadow upon entry and remove this poison prior to returning.
In some cases (e.g. hotplug and idle), CPUs may exit the kernel a
number of levels deep in C code. If there are any instrumented
functions on this critical path, these will leave portions of the idle
thread stack shadow poisoned.
If a CPU returns to the kernel via a different path (e.g. a cold
entry), then depending on stack frame layout subsequent calls to
instrumented functions may use regions of the stack with stale poison,
resulting in (spurious) KASAN splats to the console.
Contemporary GCCs always add stack shadow poisoning when ASAN is
enabled, even when asked to not instrument a function [1], so we can't
simply annotate functions on the critical path to avoid poisoning.
Instead, this series explicitly removes any stale poison before it can
be hit. In the common hotplug case we clear the entire stack shadow in
common code, before a CPU is brought online.
On architectures which perform a cold return as part of cpu idle may
retain an architecture-specific amount of stack contents. To retain the
poison for this retained context, the arch code must call the core KASAN
code, passing a "watermark" stack pointer value beyond which shadow will
be cleared. Architectures which don't perform a cold return as part of
idle do not need any additional code.
This patch (of 3):
Functions which the compiler has instrumented for KASAN place poison on
the stack shadow upon entry and remove this poision prior to returning.
In some cases (e.g. hotplug and idle), CPUs may exit the kernel a number
of levels deep in C code. If there are any instrumented functions on this
critical path, these will leave portions of the stack shadow poisoned.
If a CPU returns to the kernel via a different path (e.g. a cold entry),
then depending on stack frame layout subsequent calls to instrumented
functions may use regions of the stack with stale poison, resulting in
(spurious) KASAN splats to the console.
To avoid this, we must clear stale poison from the stack prior to
instrumented functions being called. This patch adds functions to the
KASAN core for removing poison from (portions of) a task's stack. These
will be used by subsequent patches to avoid problems with hotplug and
idle.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 64145065
(cherry-picked from e3ae116339f9a0c77523abc95e338fa405946e07)
Change-Id: I9be31b714d5bdaec94a2dad3f0e468c094fe5fa2
Signed-off-by: Paul Lawrence <paullawrence@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlomc30ACgkQONu9yGCS
aT5Hkw/9GJCZ3LyqVfd9mkB97/9OTFduQUjLsqFdgQIRZdByHe6fR8cfJ1ydeiJS
fcnBtZzJNMBXurEWourqrSWptDkFANoG0EGResza+3aXXC2SuwmuXG0JgR4KdnpE
1pkqnKJ42rnK1XWuHIp2S0A6S+SW8fA5HrVFpQDLiXUb4NB9xerG2PS6tq94rQN0
TB2IdJzb5ITWQkCzYvuTphl4jz2XnMJskd53c9osG2izBxVXzA650ygjtW6HrqG6
AWvc0BK7p/Rz+ty3PbGbSNUJyDicnknS2/UARMz/BlI5SiyL8aAHVC7h3k9dh2CU
uPVhfebFCSBcZ+iYl4UUuywjiJwDTMk/QwNZFph5Hw5HntB2LAHSG3rrv/8OIoPe
+bvMRonfWB+rtzDiy0UbCCutswJzf6h9Pj5y/PA+GIFnNmeDQ096EOI7BFGO1MVe
s/d/krw4xqhDRb97ZlWWX8UqRplmPSe4pWP47Gr0K4qNAPhp4B2tW6VJcMG3z2oo
0/iV9cZ2CVxpIA2oN7ymWCKSQ764VADTcNSqC9Wd+rDso2XcBPRYUo9pbNKLqryb
hxQR1Hj0do35qkR1L/e6thdygMV2ZyQ4xBq9fVPjDrwi2d9dO1jfnkQkL+QEa73G
AaufuvFdJefssFP91FwGZnhhMi8kWIuJSTFM3Pg9OwNEjgt3nxo=
=eTmV
-----END PGP SIGNATURE-----
Merge 4.4.104 into android-4.4
Changes in 4.4.104
netlink: add a start callback for starting a netlink dump
ipsec: Fix aborted xfrm policy dump crash
x86/mm/pat: Ensure cpa->pfn only contains page frame numbers
x86/efi: Hoist page table switching code into efi_call_virt()
x86/efi: Build our own page table structures
ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
x86/efi-bgrt: Fix kernel panic when mapping BGRT data
x86/efi-bgrt: Replace early_memremap() with memremap()
mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
mm/madvise.c: fix madvise() infinite loop under special circumstances
btrfs: clear space cache inode generation always
KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk
KVM: x86: Exit to user-mode on #UD intercept when emulator requires
KVM: x86: inject exceptions produced by x86_decode_insn
mmc: core: Do not leave the block driver in a suspended state
eeprom: at24: check at24_read/write arguments
bcache: Fix building error on MIPS
Revert "drm/radeon: dont switch vt on suspend"
drm/radeon: fix atombios on big endian
drm/panel: simple: Add missing panel_simple_unprepare() calls
mtd: nand: Fix writing mtdoops to nand flash.
NFS: revalidate "." etc correctly on "open".
drm/i915: Don't try indexed reads to alternate slave addresses
drm/i915: Prevent zero length "index" write
nfsd: Make init_open_stateid() a bit more whole
nfsd: Fix stateid races between OPEN and CLOSE
nfsd: Fix another OPEN stateid race
Linux 4.4.104
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 6ea8d958a2c95a1d514015d4e29ba21a8c0a1a91 upstream.
MADVISE_WILLNEED has always been a noop for DAX (formerly XIP) mappings.
Unfortunately madvise_willneed() doesn't communicate this information
properly to the generic madvise syscall implementation. The calling
convention is quite subtle there. madvise_vma() is supposed to either
return an error or update &prev otherwise the main loop will never
advance to the next vma and it will keep looping for ever without a way
to get out of the kernel.
It seems this has been broken since introduction. Nobody has noticed
because nobody seems to be using MADVISE_WILLNEED on these DAX mappings.
[mhocko@suse.com: rewrite changelog]
Link: http://lkml.kernel.org/r/20171127115318.911-1-guoxuenan@huawei.com
Fixes: fe77ba6f4f ("[PATCH] xip: madvice/fadvice: execute in place")
Signed-off-by: chenjie <chenjie6@huawei.com>
Signed-off-by: guoxuenan <guoxuenan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: zhangyi (F) <yi.zhang@huawei.com>
Cc: Miao Xie <miaoxie@huawei.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a8f97366452ed491d13cf1e44241bc0b5740b1f0 upstream.
Currently, we unconditionally make page table dirty in touch_pmd().
It may result in false-positive can_follow_write_pmd().
We may avoid the situation, if we would only make the page table entry
dirty if caller asks for write access -- FOLL_WRITE.
The patch also changes touch_pud() in the same way.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[Salvatore Bonaccorso: backport for 3.16:
- Adjust context
- Drop specific part for PUD-sized transparent hugepages. Support
for PUD-sized transparent hugepages was added in v4.11-rc1
]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Backport of the upstream commit f86e4271978b ("mm: check the return
value of lookup_page_ext for all call sites") is wrong for hwpoison
pages. I have accidentally negated the condition for bailout. This
basically disables hwpoison pages tracking while the code still
might crash on unusual configurations when struct pages do not have
page_ext allocated. The fix is trivial to invert the condition.
Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAloXywoACgkQONu9yGCS
aT78dxAAoM0uHsL9r+ivJ4uMj81dwBL3Rd/1Lb/PMV5Yblh/LJ2WOcXriq/JgMLt
+ARoyugEpB8JzAi1Y3bq3Jku2TYcT0o55UmjRgZzQitdX5o8j1g1baNnpRuMz63z
S/g4Msh5aJyoHmwgxWZ+mWKn3SYdNwHy+r0gGwgtvlUO97iXqwM3nqQ/4tHnIv1B
sz0NtJ7cgFvWVaneUkZ4z0ZGTlKfxaQg95enyyCRWM7MJ6Be03+KnhmQZ6GEb8vP
tf9GtXiMEDJdwppDmXjtdjFW5adejBOoCF/grvbQoEdn7XPC47k6/l5Y6A3PYLMj
kqlC8IbMHbiQXvgwezxp6Mv+oc+LuSjSCVikZW2SGMacs5kF92+0MIUvBtfUwvsA
FP7q6jUcT3Or4xiG4xLDQW+RLPetidd+1Ms4jia6jaCajbMjU7ZYaBuAplT4qhIl
koJ9pn1ksna3fUyxnNFJttUN2ulGDzcSBP5EZf3bLWMXkG4daa8Cen7vBkG1VqZE
tspXCbB/mZ/eGv/rH3b7F2BVfP2RY0YqlUZzmfTXIoCwqcmX1zGi/KMfepcZTH3b
LOo8CBmTgSYXYh0/16GAUH3ds3QQt8d0oeaCEtf8BaAZnq5R3M8doZzGzTB6LGjG
Rn1KsUzJPKSqgYis3FTJNU3wmPokvV1ZVXK/ee9zMq5zOtyJyOg=
=XIwd
-----END PGP SIGNATURE-----
Merge 4.4.101 into android-4.4
Changes in 4.4.101
tcp: do not mangle skb->cb[] in tcp_make_synack()
netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
bonding: discard lowest hash bit for 802.3ad layer3+4
vlan: fix a use-after-free in vlan_device_event()
af_netlink: ensure that NLMSG_DONE never fails in dumps
sctp: do not peel off an assoc from one netns to another one
fealnx: Fix building error on MIPS
net/sctp: Always set scope_id in sctp_inet6_skb_msgname
ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
serial: omap: Fix EFR write on RTS deassertion
arm64: fix dump_instr when PAN and UAO are in use
nvme: Fix memory order on async queue deletion
ocfs2: should wait dio before inode lock in ocfs2_setattr()
ipmi: fix unsigned long underflow
mm/page_alloc.c: broken deferred calculation
coda: fix 'kernel memory exposure attempt' in fsync
mm: check the return value of lookup_page_ext for all call sites
mm/page_ext.c: check if page_ext is not prepared
mm/pagewalk.c: report holes in hugetlb ranges
Linux 4.4.101
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 373c4557d2aa362702c4c2d41288fb1e54990b7c upstream.
This matters at least for the mincore syscall, which will otherwise copy
uninitialized memory from the page allocator to userspace. It is
probably also a correctness error for /proc/$pid/pagemap, but I haven't
tested that.
Removing the `walk->hugetlb_entry` condition in walk_hugetlb_range() has
no effect because the caller already checks for that.
This only reports holes in hugetlb ranges to callers who have specified
a hugetlb_entry callback.
This issue was found using an AFL-based fuzzer.
v2:
- don't crash on ->pte_hole==NULL (Andrew Morton)
- add Cc stable (Andrew Morton)
Changed for 4.4/4.9 stable backport:
- fix up conflict in the huge_pte_offset() call
Fixes: 1e25a271c8 ("mincore: apply page table walker on do_mincore()")
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e492080e640c2d1235ddf3441cae634cfffef7e1 upstream.
online_page_ext() and page_ext_init() allocate page_ext for each
section, but they do not allocate if the first PFN is !pfn_present(pfn)
or !pfn_valid(pfn). Then section->page_ext remains as NULL.
lookup_page_ext checks NULL only if CONFIG_DEBUG_VM is enabled. For a
valid PFN, __set_page_owner will try to get page_ext through
lookup_page_ext. Without CONFIG_DEBUG_VM lookup_page_ext will misuse
NULL pointer as value 0. This incurrs invalid address access.
This is the panic example when PFN 0x100000 is not valid but PFN
0x13FC00 is being used for page_ext. section->page_ext is NULL,
get_entry returned invalid page_ext address as 0x1DFA000 for a PFN
0x13FC00.
To avoid this panic, CONFIG_DEBUG_VM should be removed so that page_ext
will be checked at all times.
Unable to handle kernel paging request at virtual address 01dfa014
------------[ cut here ]------------
Kernel BUG at ffffff80082371e0 [verbose debug info unavailable]
Internal error: Oops: 96000045 [#1] PREEMPT SMP
Modules linked in:
PC is at __set_page_owner+0x48/0x78
LR is at __set_page_owner+0x44/0x78
__set_page_owner+0x48/0x78
get_page_from_freelist+0x880/0x8e8
__alloc_pages_nodemask+0x14c/0xc48
__do_page_cache_readahead+0xdc/0x264
filemap_fault+0x2ac/0x550
ext4_filemap_fault+0x3c/0x58
__do_fault+0x80/0x120
handle_mm_fault+0x704/0xbb0
do_page_fault+0x2e8/0x394
do_mem_abort+0x88/0x124
Pre-4.7 kernels also need commit f86e4271978b ("mm: check the return
value of lookup_page_ext for all call sites").
Link: http://lkml.kernel.org/r/20171107094131.14621-1-jaewon31.kim@samsung.com
Fixes: eefa864b70 ("mm/page_ext: resurrect struct page extending code for debugging")
Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f86e4271978bd93db466d6a95dad4b0fdcdb04f6 upstream.
Per the discussion with Joonsoo Kim [1], we need check the return value
of lookup_page_ext() for all call sites since it might return NULL in
some cases, although it is unlikely, i.e. memory hotplug.
Tested with ltp with "page_owner=0".
[1] http://lkml.kernel.org/r/20160519002809.GA10245@js1304-P5Q-DELUXE
[akpm@linux-foundation.org: fix build-breaking typos]
[arnd@arndb.de: fix build problems from lookup_page_ext]
Link: http://lkml.kernel.org/r/6285269.2CksypHdYp@wuerfel
[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/1464023768-31025-1-git-send-email-yang.shi@linaro.org
Signed-off-by: Yang Shi <yang.shi@linaro.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream.
In reset_deferred_meminit() we determine number of pages that must not
be deferred. We initialize pages for at least 2G of memory, but also
pages for reserved memory in this node.
The reserved memory is determined in this function:
memblock_reserved_memory_within(), which operates over physical
addresses, and returns size in bytes. However, reset_deferred_meminit()
assumes that that this function operates with pfns, and returns page
count.
The result is that in the best case machine boots slower than expected
due to initializing more pages than needed in single thread, and in the
worst case panics because fewer than needed pages are initialized early.
Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnrYxcACgkQONu9yGCS
aT6VpQ/9GzBA59FGi6ohxZnrUR35+U5Ehuw0IZo4JTUTrlj28QozeV6dBAdgQHLH
eGcejtzAsD39m7JjmBzkxiBBlCH9nQkq5IaUrJG6q5dYoTCYMLzHwQLgPSbrhbnS
hCSeHdJ0fevw9tKQELtWlIiG1iOULrWATf4MtpOCHcRmpxxSMRi22yQ4vKD2Vz4y
TdKb5c8bYjJoEqbtON4wKIiEK1JfyO80E4eZtNK7FXI+XX1WI65pum9/NBiDqB78
K0sK1t5pSJHvDgMwtOJ7Nxzcwle1cG3xm7NhZhNCfF9OWedCy+ZCc+e48T+TeoF4
UDHIhEvhOOOf/W3dRBQQj8VElj0zt92I+ivsWxKmheY9JzJdOvq2pTQoPAtLsBMD
/mChCvMSNEcHTfLYrm6Bjap0e6D10n1oUHX7jgLtq04EcX9Rh2zgYvL9u9QFLjFx
sAgTp+kmScgj0fi0XgiXJxj8mPc2MpTVmSUjcwAZD+N9Kuafkqbf3ZddZJiGyPfw
v4ZiAdUAtACdOaIRVPRUcG2fyLfKYqg2bFsif4Z67/0RmNf3C3rJpS9yX+Q36zCo
f66xbvysN3pRiME0obenrsxBJ0LvIkSVskxV0+5x0UfP5pOdf7jZqqpkr6IFMtLZ
/o4DYV+Da/qeYZQnmvF0BEvEnnX8GJFIJ+9RSbz9mAWcCWtWxTU=
=gevA
-----END PGP SIGNATURE-----
Merge 4.4.94 into android-4.4
Changes in 4.4.94
percpu: make this_cpu_generic_read() atomic w.r.t. interrupts
drm/dp/mst: save vcpi with payloads
MIPS: Fix minimum alignment requirement of IRQ stack
sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
bpf/verifier: reject BPF_ALU64|BPF_END
udpv6: Fix the checksum computation when HW checksum does not apply
ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header
net: emac: Fix napi poll list corruption
packet: hold bind lock when rebinding to fanout hook
bpf: one perf event close won't free bpf program attached by another perf event
isdn/i4l: fetch the ppp_write buffer in one shot
vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit
l2tp: Avoid schedule while atomic in exit_net
l2tp: fix race condition in l2tp_tunnel_delete
tun: bail out from tun_get_user() if the skb is empty
packet: in packet_do_bind, test fanout with bind_lock held
packet: only test po->has_vnet_hdr once in packet_snd
net: Set sk_prot_creator when cloning sockets to the right proto
tipc: use only positive error codes in messages
Revert "bsg-lib: don't free job in bsg_prepare_job"
locking/lockdep: Add nest_lock integrity test
watchdog: kempld: fix gcc-4.3 build
irqchip/crossbar: Fix incorrect type of local variables
mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length
mac80211: fix power saving clients handling in iwlwifi
net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
iio: adc: xilinx: Fix error handling
Btrfs: send, fix failure to rename top level inode due to name collision
f2fs: do not wait for writeback in write_begin
md/linear: shutup lockdep warnning
sparc64: Migrate hvcons irq to panicked cpu
net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
crypto: xts - Add ECB dependency
ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
slub: do not merge cache if slub_debug contains a never-merge flag
scsi: scsi_dh_emc: return success in clariion_std_inquiry()
net: mvpp2: release reference to txq_cpu[] entry after unmapping
i2c: at91: ensure state is restored after suspending
ceph: clean up unsafe d_parent accesses in build_dentry_path
uapi: fix linux/rds.h userspace compilation errors
uapi: fix linux/mroute6.h userspace compilation errors
target/iscsi: Fix unsolicited data seq_end_offset calculation
nfsd/callback: Cleanup callback cred on shutdown
cpufreq: CPPC: add ACPI_PROCESSOR dependency
Revert "tty: goldfish: Fix a parameter of a call to free_irq"
Linux 4.4.94
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit c6e28895a4372992961888ffaadc9efc643b5bfe ]
In case CONFIG_SLUB_DEBUG_ON=n, find_mergeable() gets debug features from
commandline but never checks if there are features from the
SLAB_NEVER_MERGE set.
As a result selected by slub_debug caches are always mergeable if they
have been created without a custom constructor set or without one of the
SLAB_* debug features on.
This moves the SLAB_NEVER_MERGE check below the flags update from
commandline to make sure it won't merge the slab cache if one of the debug
features is on.
Link: http://lkml.kernel.org/r/20170101124451.GA4740@lp-laptop-d
Signed-off-by: Grygorii Maistrenko <grygoriimkd@gmail.com>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This is cherry-picked from upstrea-f2fs-stable-linux-4.4.y.
Changes include:
commit c7fd9e2b4a ("f2fs: hurry up to issue discard after io interruption")
commit 603dde3965 ("f2fs: fix to show correct discard_granularity in sysfs")
...
commit 565f0225f9 ("f2fs: factor out discard command info into discard_cmd_control")
commit c4cc29d19e ("f2fs: remove batched discard in f2fs_trim_fs")
Change-Id: Icd8a85ac0c19a8aa25cd2591a12b4e9b85bdf1c5
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnLaLoACgkQONu9yGCS
aT7hDw/+Ipx/xnjIUJFV/aqo8lTh3XqP/TjD5whoi+yYC8axLEZBLiOSLZceVjsG
hi2mP22gKn1i7GLXNeWIZ+rMtVzAN+qNg7i8cjWNfFp1fA7cCfFaYvlV0LVrO2tK
WnvvE8r5kQAKyQG8498ebEjianxwxHVERnNiE5/SDpCNj14DnwCJBTEYM0tEnuXZ
/jBIIs4xvndVa0fFfUjuAzh65AefAT1BmgsPll4GnFMUFHh30smYdFla5LL0GNIq
FQGFvIi8Q02disSMg9lFJVOlazc/HUREiFB1qy1DRtGMnS6/Q0HW0kCxeRi/7QEi
+HN2rLxtbpnuD5P7W4lDJ5/cyCHMIv8SJ8OqUd8uxbTWz31P/QxbM7d35d+w3rq8
dv3sQ6CMRnuIXGL5dFHh7zYqlzNS9PKjLmxzAw9grDf+nVsDxE4KUfJy00DSN1I1
Bopi1kCD2nUMOiBrmxkIczN6OOvcGBHh6/TTB2WEKVHn42D0fjLnO66kJVJLMsBm
vDdKJDDSGM/0HiUa5ydr6R0Ae7My3h5AJZRa5gn0kL/myatX/vsa0B2ZLpHlVipM
GhODBsDFkI4k4yceONDZPJmhhVab1lewTMuIW5D2KRMsgpQqLmlOyL5gykfH0rTx
FVnLSoMAHsgm6qVPwRS5BqK/UnXogfqjiB0iXzNNZnkiABWWoUQ=
=Skkr
-----END PGP SIGNATURE-----
Merge 4.4.89 into android-4.4
Changes in 4.4.89
ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()
ipv6: add rcu grace period before freeing fib6_node
ipv6: fix sparse warning on rt6i_node
qlge: avoid memcpy buffer overflow
Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
Revert "net: fix percpu memory leaks"
gianfar: Fix Tx flow control deactivation
ipv6: fix memory leak with multiple tables during netns destruction
ipv6: fix typo in fib6_net_exit()
f2fs: check hot_data for roll-forward recovery
x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
md/raid5: release/flush io in raid5_do_work()
nfsd: Fix general protection fault in release_lock_stateid()
mm: prevent double decrease of nr_reserved_highatomic
tty: improve tty_insert_flip_char() fast path
tty: improve tty_insert_flip_char() slow path
tty: fix __tty_insert_flip_char regression
Input: i8042 - add Gigabyte P57 to the keyboard reset table
MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation
MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero
MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative
MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs
MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs
MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs
crypto: AF_ALG - remove SGL terminator indicator when chaining
ext4: fix incorrect quotaoff if the quota feature is enabled
ext4: fix quota inconsistency during orphan cleanup for read-only mounts
powerpc: Fix DAR reporting when alignment handler faults
block: Relax a check in blk_start_queue()
md/bitmap: disable bitmap_resize for file-backed bitmaps.
skd: Avoid that module unloading triggers a use-after-free
skd: Submit requests to firmware before triggering the doorbell
scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path
scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
scsi: zfcp: fix missing trace records for early returns in TMF eh handlers
scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records
scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
scsi: zfcp: trace high part of "new" 64 bit SCSI LUN
scsi: megaraid_sas: Check valid aen class range to avoid kernel panic
scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead
scsi: storvsc: fix memory leak on ring buffer busy
scsi: sg: remove 'save_scat_len'
scsi: sg: use standard lists for sg_requests
scsi: sg: off by one in sg_ioctl()
scsi: sg: factor out sg_fill_request_table()
scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE
scsi: qla2xxx: Fix an integer overflow in sysfs code
ftrace: Fix selftest goto location on error
tracing: Apply trace_clock changes to instance max buffer
ARC: Re-enable MMU upon Machine Check exception
PCI: shpchp: Enable bridge bus mastering if MSI is enabled
media: v4l2-compat-ioctl32: Fix timespec conversion
media: uvcvideo: Prevent heap overflow when accessing mapped controls
bcache: initialize dirty stripes in flash_dev_run()
bcache: Fix leak of bdev reference
bcache: do not subtract sectors_to_gc for bypassed IO
bcache: correct cache_dirty_target in __update_writeback_rate()
bcache: Correct return value for sysfs attach errors
bcache: fix for gc and write-back race
bcache: fix bch_hprint crash and improve output
ftrace: Fix memleak when unregistering dynamic ops when tracing disabled
Linux 4.4.89
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 4855e4a7f29d6d10b0b9c84e189c770c9a94e91e upstream.
There is race between page freeing and unreserved highatomic.
CPU 0 CPU 1
free_hot_cold_page
mt = get_pfnblock_migratetype
set_pcppage_migratetype(page, mt)
unreserve_highatomic_pageblock
spin_lock_irqsave(&zone->lock)
move_freepages_block
set_pageblock_migratetype(page)
spin_unlock_irqrestore(&zone->lock)
free_pcppages_bulk
__free_one_page(mt) <- mt is stale
By above race, a page on CPU 0 could go non-highorderatomic free list
since the pageblock's type is changed. By that, unreserve logic of
highorderatomic can decrease reserved count on a same pageblock severak
times and then it will make mismatch between nr_reserved_highatomic and
the number of reserved pageblock.
So, this patch verifies whether the pageblock is highatomic or not and
decrease the count only if the pageblock is highatomic.
Link: http://lkml.kernel.org/r/1476259429-18279-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Miles Chen <miles.chen@mediatek.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmfaTcACgkQONu9yGCS
aT65LhAAlTz6ssLhstl1QhFIUZ5cTIMouONCYK2+fwg67J3nf8I4whaj0eXX9RIT
4f6L1JNn3YV6lEfXZLs3EHeiMWft2cal95yQULwhsSwI/1rAzjYkFtj59gPxp2Rc
03ZrU8UnWNZmKpzneiwnd0kkFi+7wBKT5GbERy6Voh1hpAg8HbHQeWdVxFXKCdLD
eSP1+RNsvknZBjcJibxhsArs8E8r5t+dXDzi0HYpiZvctV23VXD+y2UDE1RMEDx5
k5fH30DIxd3T1JU1qHGnJUlfK5jKVho76zoSThwEFm9xqoZat/xrby5gW5sMcWeD
0BMw4F6GYE8BoeViC/+iujR0B8ngU0e+ExH+M6WYoEGPH1BFHyPNqDoKjnyAjyyH
tQEOD/0aRWuxcBVyk34EafNZeou/AeDd0IReAHciCIomN0+3u104+HlxkGH1oXEn
u0O5kVXQPaB/YeXd3jRLSfDmzxojaaihTeJZGFi//1iAj+jJEeYagfeI+flqrtaC
Gcwi55HrNrLbEj9kBFLEnm8RgFyWFsO0oVbfu1bPUGZOmuMi4u1Ptkffi3p/Wrsh
cx9ErKXj6meOgkcmzCWYl1Ygp3rY3bdlbidixJnzEfOTeZ2FyxnMz3BQAxGeOPTD
OhUevEK08oMTb1YDt3i7Sh1BGKpU0AEaEw5i8m36m4rC6KdqJfA=
=JeJw
-----END PGP SIGNATURE-----
Merge 4.4.84 into android-4.4
Changes in 4.4.84
netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister
audit: Fix use after free in audit_remove_watch_rule()
parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo
crypto: x86/sha1 - Fix reads beyond the number of blocks passed
Input: elan_i2c - add ELAN0608 to the ACPI table
Input: elan_i2c - Add antoher Lenovo ACPI ID for upcoming Lenovo NB
ALSA: seq: 2nd attempt at fixing race creating a queue
ALSA: usb-audio: Apply sample rate quirk to Sennheiser headset
ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices
mm/mempolicy: fix use after free when calling get_mempolicy
mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
xen: fix bio vec merging
x86/asm/64: Clear AC on NMI entries
irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup()
irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup()
Sanitize 'move_pages()' permission checks
pids: make task_tgid_nr_ns() safe
perf/x86: Fix LBR related crashes on Intel Atom
usb: optimize acpi companion search for usb port devices
usb: qmi_wwan: add D-Link DWM-222 device ID
Linux 4.4.84
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 197e7e521384a23b9e585178f3f11c9fa08274b9 upstream.
The 'move_paghes()' system call was introduced long long ago with the
same permission checks as for sending a signal (except using
CAP_SYS_NICE instead of CAP_SYS_KILL for the overriding capability).
That turns out to not be a great choice - while the system call really
only moves physical page allocations around (and you need other
capabilities to do a lot of it), you can check the return value to map
out some the virtual address choices and defeat ASLR of a binary that
still shares your uid.
So change the access checks to the more common 'ptrace_may_access()'
model instead.
This tightens the access checks for the uid, and also effectively
changes the CAP_SYS_NICE check to CAP_SYS_PTRACE, but it's unlikely that
anybody really _uses_ this legacy system call any more (we hav ebetter
NUMA placement models these days), so I expect nobody to notice.
Famous last words.
Reported-by: Otto Ebeling <otto.ebeling@iki.fi>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73223e4e2e3867ebf033a5a8eb2e5df0158ccc99 upstream.
I hit a use after free issue when executing trinity and repoduced it
with KASAN enabled. The related call trace is as follows.
BUG: KASan: use after free in SyS_get_mempolicy+0x3c8/0x960 at addr ffff8801f582d766
Read of size 2 by task syz-executor1/798
INFO: Allocated in mpol_new.part.2+0x74/0x160 age=3 cpu=1 pid=799
__slab_alloc+0x768/0x970
kmem_cache_alloc+0x2e7/0x450
mpol_new.part.2+0x74/0x160
mpol_new+0x66/0x80
SyS_mbind+0x267/0x9f0
system_call_fastpath+0x16/0x1b
INFO: Freed in __mpol_put+0x2b/0x40 age=4 cpu=1 pid=799
__slab_free+0x495/0x8e0
kmem_cache_free+0x2f3/0x4c0
__mpol_put+0x2b/0x40
SyS_mbind+0x383/0x9f0
system_call_fastpath+0x16/0x1b
INFO: Slab 0xffffea0009cb8dc0 objects=23 used=8 fp=0xffff8801f582de40 flags=0x200000000004080
INFO: Object 0xffff8801f582d760 @offset=5984 fp=0xffff8801f582d600
Bytes b4 ffff8801f582d750: ae 01 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a ........ZZZZZZZZ
Object ffff8801f582d760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk
Object ffff8801f582d770: 6b 6b 6b 6b 6b 6b 6b a5 kkkkkkk.
Redzone ffff8801f582d778: bb bb bb bb bb bb bb bb ........
Padding ffff8801f582d8b8: 5a 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZ
Memory state around the buggy address:
ffff8801f582d600: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff8801f582d680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff8801f582d700: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fc
!shared memory policy is not protected against parallel removal by other
thread which is normally protected by the mmap_sem. do_get_mempolicy,
however, drops the lock midway while we can still access it later.
Early premature up_read is a historical artifact from times when
put_user was called in this path see https://lwn.net/Articles/124754/
but that is gone since 8bccd85ffb ("[PATCH] Implement sys_* do_*
layering in the memory policy layer."). but when we have the the
current mempolicy ref count model. The issue was introduced
accordingly.
Fix the issue by removing the premature release.
Link: http://lkml.kernel.org/r/1502950924-27521-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmUrdAACgkQONu9yGCS
aT49MBAAqKJXpRzIBnzLR45QLRU5jfkUhaCtCckBAbyLE+rUH0lE4L37JHYcZ9jr
79gG06QAuWJaTd4Nug7DocmPiqpPWi+PY46yjUQ1j3tllKWdp7b/PJXvYX3zbK+d
vDgn6T1AyAoCBKa2aLU26SAYmfLCT+jhHzbMaRQ4eAcYE8u8w8jrfngVmnunXVme
u6CkAZpPMXXm5jUpxgPguEOm2WMubPYEF2BMJIVuvYypeJGM0EYbOHNEUMK5jkPv
T17+4EzSCeqGaDZElxPW4NuaHMStZW36g9gQOti3o/8/5shNLyJK3vYzWG+06zfH
6CNElSk7Y3Fl6qALLWfd1dkjImtJvWKDVWTC43woFT/96DtXueGxrYJYRF+px9bq
dBWAW86g5Tp2JTM+6VhN0N/Z5ANK48Oi2NrzqJXK7DrmZbS5mxMIZw239QJnEOBh
hSxDbe9pkNJvSmR+yF+qxkz78XOOvBz4zIkGl6M70cRQWnJ0g4tCSyy2hrEooDzZ
sfaokSdClzt3qRoFwSZIGZLpvRp9vSepXNN/nvUTX3dOLcjproVYMZJWiAUqTUyD
/0gwrJTpDP3nZGrHdmeWL/erQDWP1aFiXlsJ0E87ymSt7KYNYFGH2ePv7Ujov/AH
dlmvQFhSW1v7xiuiiQo9gxIo8djHqZ8FLbTCznQcQ8Scm4cMNAM=
=riD5
-----END PGP SIGNATURE-----
Merge 4.4.83 into android-4.4
Changes in 4.4.83
cpuset: fix a deadlock due to incomplete patching of cpusets_enabled()
mm: ratelimit PFNs busy info message
iscsi-target: fix memory leak in iscsit_setup_text_cmd()
iscsi-target: Fix iscsi_np reset hung task during parallel delete
fuse: initialize the flock flag in fuse_file on allocation
nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays
USB: serial: option: add D-Link DWM-222 device ID
USB: serial: cp210x: add support for Qivicon USB ZigBee dongle
USB: serial: pl2303: add new ATEN device id
usb: musb: fix tx fifo flush handling again
USB: hcd: Mark secondary HCD as dead if the primary one died
staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read
iio: accel: bmc150: Always restore device to normal mode after suspend-resume
iio: light: tsl2563: use correct event code
uas: Add US_FL_IGNORE_RESIDUE for Initio Corporation INIC-3069
USB: Check for dropped connection before switching to full speed
usb: core: unlink urbs from the tail of the endpoint's urb_list
usb: quirks: Add no-lpm quirk for Moshi USB to Ethernet Adapter
usb:xhci:Add quirk for Certain failing HP keyboard on reset after resume
iio: adc: vf610_adc: Fix VALT selection value for REFSEL bits
pnfs/blocklayout: require 64-bit sector_t
pinctrl: sunxi: add a missing function of A10/A20 pinctrl driver
pinctrl: samsung: Remove bogus irq_[un]mask from resource management
Linux 4.4.83
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 75dddef32514f7aa58930bde6a1263253bc3d4ba upstream.
The RDMA subsystem can generate several thousand of these messages per
second eventually leading to a kernel crash. Ratelimit these messages
to prevent this crash.
Doug said:
"I've been carrying a version of this for several kernel versions. I
don't remember when they started, but we have one (and only one) class
of machines: Dell PE R730xd, that generate these errors. When it
happens, without a rate limit, we get rcu timeouts and kernel oopses.
With the rate limit, we just get a lot of annoying kernel messages but
the machine continues on, recovers, and eventually the memory
operations all succeed"
And:
"> Well... why are all these EBUSY's occurring? It sounds inefficient
> (at least) but if it is expected, normal and unavoidable then
> perhaps we should just remove that message altogether?
I don't have an answer to that question. To be honest, I haven't
looked real hard. We never had this at all, then it started out of the
blue, but only on our Dell 730xd machines (and it hits all of them),
but no other classes or brands of machines. And we have our 730xd
machines loaded up with different brands and models of cards (for
instance one dedicated to mlx4 hardware, one for qib, one for mlx5, an
ocrdma/cxgb4 combo, etc), so the fact that it hit all of the machines
meant it wasn't tied to any particular brand/model of RDMA hardware.
To me, it always smelled of a hardware oddity specific to maybe the
CPUs or mainboard chipsets in these machines, so given that I'm not an
mm expert anyway, I never chased it down.
A few other relevant details: it showed up somewhere around 4.8/4.9 or
thereabouts. It never happened before, but the prinkt has been there
since the 3.18 days, so possibly the test to trigger this message was
changed, or something else in the allocator changed such that the
situation started happening on these machines?
And, like I said, it is specific to our 730xd machines (but they are
all identical, so that could mean it's something like their specific
ram configuration is causing the allocator to hit this on these
machine but not on other machines in the cluster, I don't want to say
it's necessarily the model of chipset or CPU, there are other bits of
identicalness between these machines)"
Link: http://lkml.kernel.org/r/499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins@redhat.com
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmPuZcACgkQONu9yGCS
aT5o0BAAlT21EbhyxoMPC6xrPHAF1Oi8mTVfpu+618AUs3B1M6xge/EKI08B/8DP
MZgaqSqY5ttaIlDKX5OVhY+HiuMg3SbIaFDzhS+OzjpuIjSA9ljNHazp5/l2HsQu
9zyPFN02L2zqWYppyDo6FQBfStB5rUHB4eVMgD6zuNU/YQovtibGqAY4LBfWvxf/
eDO6VfjiS4zzcCoplZxcxim1YVZ+HX09BuwniJzukM4C4/uMDubwMlJmrN9YsQZW
x5zWnLHce2MATk9yF4BzMI/iRDR+Bm6Vx3m1Vzq9WDu7/kkMTVdYjXHmZn02YQub
q4q1nDZyzBf8SvE4kf+fMYS8+dUrwiKf0lahBTK31J5Bc33lRfBfvv+dr/aEnp/Q
FhraSkcBDrulnxuq77WZbvWzj0otF1pCTtURyCSfdc4SOFwVIz2NLQ2ZnnXk4gnN
h5TqjxSDwr2CwTMzOnaKjBcuWnKPvn3/Pjm+/MJS8wvQYPZv8a4AzMIwxjDEN78Z
+FvtaWEoUCnlP869hyR7gTfk2541+qjMdDRRUPSQ16PvepKy1AG9iCqVvZThScyQ
PygaiBYZ9pbcyFuExLQrj2FDY2odinPfN8IsCQQbk5Es5mCdzJZOOkLeO2PO0MxD
Dya79igFnpNj7ZEu6T7lD6Izg/6fYWu7qKmpDKQ7/xn4hHxj/Ig=
=D9TG
-----END PGP SIGNATURE-----
Merge 4.4.82 into android-4.4
Changes in 4.4.82
tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states
net: fix keepalive code vs TCP_FASTOPEN_CONNECT
bpf, s390: fix jit branch offset related to ldimm64
net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
tcp: fastopen: tcp_connect() must refresh the route
net: avoid skb_warn_bad_offload false positives on UFO
packet: fix tp_reserve race in packet_set_ring
revert "net: account for current skb length when deciding about UFO"
revert "ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output"
udp: consistently apply ufo or fragmentation
sparc64: Prevent perf from running during super critical sections
KVM: arm/arm64: Handle hva aging while destroying the vm
mm/mempool: avoid KASAN marking mempool poison checks as use-after-free
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
net: account for current skb length when deciding about UFO
Linux 4.4.82
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 7640131032db9118a78af715ac77ba2debeeb17c upstream.
When removing an element from the mempool, mark it as unpoisoned in KASAN
before verifying its contents for SLUB/SLAB debugging. Otherwise KASAN
will flag the reads checking the element use-after-free writes as
use-after-free reads.
Signed-off-by: Matthew Dawson <matthew@mjdsystems.ca>
Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrii Bordunov <aborduno@cisco.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmN2eMACgkQONu9yGCS
aT7zCQ//eDgCF9YJnE1v8/JJ0yl2uK7XjVrF/tpPvzgTgszu4En4kGfhUO+WvmkU
0/pqYBMAPZEbfmx+6q8FJx/MHDjFA1oKb+a9pS1RUovzWDLQoRxYwiBtR2osmuOE
f1fbDMt9ETDUxUGLhRJ/vuzeIjmouhPkz5vZAg863+sKYYjPHlczymcgMs0sRMsE
3kkgo6mhCKTLt8gvioSUjeVWs4a5y3unvImhSLjEHjcfydlDLwA8RuFdFwBIgNfP
yPrgW3v5l9HHXI1lWMcOCTpVeDI272sKNOppYg4r2N/I/epBN79j7jGrqGQpG8NP
mKOkgRDoR7ifyKLSS55R8anLyNoi4jfQAHbOxlSVGymwpd9kRuHoeTE5+IqYs+V5
qLkqLz63hmbfRQuW6az6L+SGVwgj3DSHakGQFkB0ouB8h5ubU2OqINxOsaNABbHD
C1Q9giqG8b2MEv5D4O4m7BhK1tDzSJmT2tb9UG+UV8LJn1PhFSnSMkjP4S7trZl+
+8myxdoNVvDMpd23UqM7o1fuYalbslTKED9el31FimOaNF79+tzyjnNbWA6zqX+X
U3I+Pp2FafOS2heTLTX59fz09LKRI+iP3pnlCBpp1a+MKAIEbjeW8YB5zTKrSNOv
RkZ+1qIQtmGyhVp/YDsua5J1lhZVXeLeoEqDXYerELOdGKF30jw=
=pHqB
-----END PGP SIGNATURE-----
Merge 4.4.81 into android-4.4
Changes in 4.4.81
libata: array underflow in ata_find_dev()
workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
ALSA: hda - Fix speaker output from VAIO VPCL14M1R
ASoC: do not close shared backend dailink
KVM: async_pf: make rcu irq exit if not triggered from idle task
mm/page_alloc: Remove kernel address exposure in free_reserved_area()
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
ext4: fix overflow caused by missing cast in ext4_resize_fs()
ARM: dts: armada-38x: Fix irq type for pca955
media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl
target: Avoid mappedlun symlink creation during lun shutdown
iscsi-target: Always wait for kthread_should_stop() before kthread exit
iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
iscsi-target: Fix initial login PDU asynchronous socket close OOPs
iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP
iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done
mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds
f2fs: sanity check checkpoint segno and blkoff
drm: rcar-du: fix backport bug
saa7164: fix double fetch PCIe access condition
ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()
net: Zero terminate ifr_name in dev_ifname().
ipv6: avoid overflow of offset in ip6_find_1stfragopt
ipv4: initialize fib_trie prior to register_netdev_notifier call.
rtnetlink: allocate more memory for dev_set_mac_address()
mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled
openvswitch: fix potential out of bound access in parse_ct
packet: fix use-after-free in prb_retire_rx_blk_timer_expired()
ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()
net: ethernet: nb8800: Handle all 4 RGMII modes identically
dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly
dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly
dccp: fix a memleak for dccp_feat_init err process
sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()
sctp: fix the check for _sctp_walk_params and _sctp_walk_errors
net/mlx5: Fix command bad flow on command entry allocation failure
net: phy: Correctly process PHY_HALTED in phy_stop_machine()
net: phy: Fix PHY unbind crash
xen-netback: correctly schedule rate-limited queues
sparc64: Measure receiver forward progress to avoid send mondo timeout
wext: handle NULL extra data in iwe_stream_add_point better
sh_eth: R8A7740 supports packet shecksumming
net: phy: dp83867: fix irq generation
tg3: Fix race condition in tg3_get_stats64().
x86/boot: Add missing declaration of string functions
phy state machine: failsafe leave invalid RUNNING state
scsi: qla2xxx: Get mutex lock before checking optrom_state
drm/virtio: fix framebuffer sparse warning
virtio_blk: fix panic in initialization error path
ARM: 8632/1: ftrace: fix syscall name matching
mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER
lib/Kconfig.debug: fix frv build failure
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
mm: don't dereference struct page fields of invalid pages
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
net: account for current skb length when deciding about UFO
workqueue: implicit ordered attribute should be overridable
Linux 4.4.81
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit f073bdc51771f5a5c7a8d1191bfc3ae371d44de7 ]
The VM_BUG_ON() check in move_freepages() checks whether the node id of
a page matches the node id of its zone. However, it does this before
having checked whether the struct page pointer refers to a valid struct
page to begin with. This is guaranteed in most cases, but may not be
the case if CONFIG_HOLES_IN_ZONE=y.
So reorder the VM_BUG_ON() with the pfn_valid_within() check.
Link: http://lkml.kernel.org/r/1481706707-6211-2-git-send-email-ard.biesheuvel@linaro.org
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Hanjun Guo <hanjun.guo@linaro.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: Robert Richter <rrichter@cavium.com>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3ea277194daaeaa84ce75180ec7c7a2075027a68 upstream.
Stable note for 4.4: The upstream patch patches madvise(MADV_FREE) but 4.4
does not have support for that feature. The changelog is left
as-is but the hunk related to madvise is omitted from the backport.
Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.
He described the race as follows:
CPU0 CPU1
---- ----
user accesses memory using RW PTE
[PTE now cached in TLB]
try_to_unmap_one()
==> ptep_get_and_clear()
==> set_tlb_ubc_flush_pending()
mprotect(addr, PROT_READ)
==> change_pte_range()
==> [ PTE non-present - no flush ]
user writes using cached RW PTE
...
try_to_unmap_flush()
The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.
For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access. For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB. However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.
Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption. In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch. Other users such as gup as a race with
reclaim looks just at PTEs. huge page variants should be ok as they
don't race with reclaim. mincore only looks at PTEs. userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.
Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected. His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.
[akpm@linux-foundation.org: tweak comments]
[akpm@linux-foundation.org: fix spello]
Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit adb1fe9ae2ee6ef6bc10f3d5a588020e7664dfa7 upstream.
Linus suggested we try to remove some of the low-hanging fruit related
to kernel address exposure in dmesg. The only leaks I see on my local
system are:
Freeing SMP alternatives memory: 32K (ffffffff9e309000 - ffffffff9e311000)
Freeing initrd memory: 10588K (ffffa0b736b42000 - ffffa0b737599000)
Freeing unused kernel memory: 3592K (ffffffff9df87000 - ffffffff9e309000)
Freeing unused kernel memory: 1352K (ffffa0b7288ae000 - ffffa0b728a00000)
Freeing unused kernel memory: 632K (ffffa0b728d62000 - ffffa0b728e00000)
Linus says:
"I suspect we should just remove [the addresses in the 'Freeing'
messages]. I'm sure they are useful in theory, but I suspect they
were more useful back when the whole "free init memory" was
originally done.
These days, if we have a use-after-free, I suspect the init-mem
situation is the easiest situation by far. Compared to all the dynamic
allocations which are much more likely to show it anyway. So having
debug output for that case is likely not all that productive."
With this patch the freeing messages now look like this:
Freeing SMP alternatives memory: 32K
Freeing initrd memory: 10588K
Freeing unused kernel memory: 3592K
Freeing unused kernel memory: 1352K
Freeing unused kernel memory: 632K
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6836ff90c45b71d38e5d4405aec56fa9e5d1d4b2.1477405374.git.jpoimboe@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllxlOsACgkQONu9yGCS
aT4naw//VrWIuIf523IP6egcS+iprNM1lt8HZIzTQ0b2x0qv82UFA9FSs3e97dbG
LJAUEmHW+oBbm6AekAUrNFF62qaMM5HYxVgGYeiniA1dRvLCDG/OxzxPc0hv6Kmm
VsbZBlPQEj2B5tqUOkpvQNcqbKpIVglBsK4tjeHE/mRziUkIoPXWKS6Tt7VKGtBa
gIGY7VASyKhPtxGCdbKzHsO/IGi3cmpwAyhqQDRBR/5Dxy3p9xlQ4gTpobeL+x8z
A7WHqVNbgfdz21LBii5xE8+GUwiRjlYhxeFJTjhM6wo2/XCtw1FJgc7EMmsbVtbZ
xYu1/tkYaZGoxOQ7sH5jNgVjE8IcyVimDa5eJ7p7fbh3AzsyXzAJIRc8cS7cL40F
jLDWrYDnm5t7ziITrAf0uoMmRZLHqS2Bv5sqaoCxR3D51r6LVaNdavD6hYN1CLRA
fjlDvnvwxbLPI4YPr7PZNu4oSJiawKx9jEBOTFSYu9L1XLvdvdcVU6ULAdpShnLn
80a+YJmYsNg54im7sxFUw6z87AScznzirIpXEJoUO+Hs0SN9Rq/BQx8cKvVeaMEI
z53c4Hci45Go/ozZVqCTxGbkQ6tKWIKBSo6kl3xchwms4lmijpWuwbAkDUzECNvv
0RgyQvNx4d2cRJ8hIVcdVFzp+h+kgNqVQQ2nDld7/1QSZHLPhFo=
=vrgw
-----END PGP SIGNATURE-----
Merge 4.4.78 into android-4.4
Changes in 4.4.78
net_sched: fix error recovery at qdisc creation
net: sched: Fix one possible panic when no destroy callback
net/phy: micrel: configure intterupts after autoneg workaround
ipv6: avoid unregistering inet6_dev for loopback
net: dp83640: Avoid NULL pointer dereference.
tcp: reset sk_rx_dst in tcp_disconnect()
net: prevent sign extension in dev_get_stats()
bpf: prevent leaking pointer via xadd on unpriviledged
net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
ipv6: dad: don't remove dynamic addresses if link is down
net: ipv6: Compare lwstate in detecting duplicate nexthops
vrf: fix bug_on triggered by rx when destroying a vrf
rds: tcp: use sock_create_lite() to create the accept socket
brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
cfg80211: Check if PMKID attribute is of expected size
irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
parisc: Report SIGSEGV instead of SIGBUS when running out of stack
parisc: use compat_sys_keyctl()
parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
parisc/mm: Ensure IRQs are off in switch_mm()
tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth
kernel/extable.c: mark core_kernel_text notrace
mm/list_lru.c: fix list_lru_count_node() to be race free
fs/dcache.c: fix spin lockup issue on nlru->lock
checkpatch: silence perl 5.26.0 unescaped left brace warnings
binfmt_elf: use ELF_ET_DYN_BASE only for PIE
arm: move ELF_ET_DYN_BASE to 4MB
arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
s390: reduce ELF_ET_DYN_BASE
exec: Limit arg stack to at most 75% of _STK_LIM
vt: fix unchecked __put_user() in tioclinux ioctls
mnt: In umount propagation reparent in a separate pass
mnt: In propgate_umount handle visiting mounts in any order
mnt: Make propagate_umount less slow for overlapping mount propagation trees
selftests/capabilities: Fix the test_execve test
tpm: Get rid of chip->pdev
tpm: Provide strong locking for device removal
Add "shutdown" to "struct class".
tpm: Issue a TPM2_Shutdown for TPM2 devices.
mm: fix overflow check in expand_upwards()
crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD
crypto: atmel - only treat EBUSY as transient if backlog
crypto: sha1-ssse3 - Disable avx2
crypto: caam - fix signals handling
sched/topology: Fix overlapping sched_group_mask
sched/topology: Optimize build_group_mask()
PM / wakeirq: Convert to SRCU
PM / QoS: return -EINVAL for bogus strings
tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
KVM: x86: disable MPX if host did not enable MPX XSAVE features
kvm: vmx: Do not disable intercepts for BNDCFGS
kvm: x86: Guest BNDCFGS requires guest MPX support
kvm: vmx: Check value written to IA32_BNDCFGS
kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
Linux 4.4.78
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 37511fb5c91db93d8bd6e3f52f86e5a7ff7cfcdf upstream.
Jörn Engel noticed that the expand_upwards() function might not return
-ENOMEM in case the requested address is (unsigned long)-PAGE_SIZE and
if the architecture didn't defined TASK_SIZE as multiple of PAGE_SIZE.
Affected architectures are arm, frv, m68k, blackfin, h8300 and xtensa
which all define TASK_SIZE as 0xffffffff, but since none of those have
an upwards-growing stack we currently have no actual issue.
Nevertheless let's fix this just in case any of the architectures with
an upward-growing stack (currently parisc, metag and partly ia64) define
TASK_SIZE similar.
Link: http://lkml.kernel.org/r/20170702192452.GA11868@p100.box
Fixes: bd726c90b6b8 ("Allow stack to grow up to address space limit")
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Jörn Engel <joern@purestorage.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2c80cd57c74339889a8752b20862a16c28929c3a upstream.
list_lru_count_node() iterates over all memcgs to get the total number of
entries on the node but it can race with memcg_drain_all_list_lrus(),
which migrates the entries from a dead cgroup to another. This can return
incorrect number of entries from list_lru_count_node().
Fix this by keeping track of entries per node and simply return it in
list_lru_count_node().
Link: http://lkml.kernel.org/r/1498707555-30525-1-git-send-email-stummala@codeaurora.org
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Alexander Polakov <apolyakov@beget.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllp5y8ACgkQONu9yGCS
aT6g8hAApzYi9TwiaF6wyYXsrp7YvOm4NyMaVBl4t7v/nFql6VsUL+qWaJKB5EL9
o72ybYPUzbxGTVWCm/wiBO31VWea0ak0pBbyywBiowGgwAcgG/jpqZobale4Y2TE
15jEpmpA5+3BmXpMkrv/dz4LHZ4jm65/ADhMbkPGRZqUJ3mHmyVoi50l67dpTE5+
xWQIErycwlVMppJGnXPeFFgeD7Etch7OJ9CishQRNMb3F8H58WiQrMWWe1NfL0DO
H2g18IBHMsxEYJqnRqxviTOMe8S96Km+lKGX0LOTRYt+2OQpfIF7buU6N+6C96rO
7V2n2G02m2mOFVUFlDYF1RQ9IBrxHJf9iGkaZBwsaxX7XAK63ZjRxgjnEL7gMPU/
TMCOWZ53BdZezz2eAmdhySsV+4Xt6MmJJE8rR47AgsM2Le3tgK421zmraunmA0fR
eoJS99YHcftAHXCD3puGLafEwGVe0G4eQbY4L7mj1Y9VjaAbmmsWq9rlNOQMZRgH
JTNyYik1C7yGPJX1iKi9hLAKldzBwPuM3GfZMZQIOjA4t2VtSon7in5iKrihRg3N
BSKXr6+orNw32tsqcC4kpLPbFUFb6zx3EKELwSJwD9ICN7swJEk7gFw7w/F/SOxI
C1W4Ulm6EcYTWHDePERQ4zHlllHAmyJup61d9HnwA6HhPOLaff4=
=oeNk
-----END PGP SIGNATURE-----
Merge 4.4.77 into android-4.4
Changes in 4.4.77
fs: add a VALID_OPEN_FLAGS
fs: completely ignore unknown open flags
driver core: platform: fix race condition with driver_override
bgmac: reset & enable Ethernet core before using it
mm: fix classzone_idx underflow in shrink_zones()
tracing/kprobes: Allow to create probe with a module name starting with a digit
drm/virtio: don't leak bo on drm_gem_object_init failure
usb: dwc3: replace %p with %pK
USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
Add USB quirk for HVR-950q to avoid intermittent device resets
usb: usbip: set buffer pointers to NULL after free
usb: Fix typo in the definition of Endpoint[out]Request
mac80211_hwsim: Replace bogus hrtimer clockid
sysctl: don't print negative flag for proc_douintvec
sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
pinctrl: sh-pfc: r8a7791: Fix SCIF2 pinmux data
pinctrl: meson: meson8b: fix the NAND DQS pins
pinctrl: sunxi: Fix SPDIF function name for A83T
pinctrl: mxs: atomically switch mux and drive strength config
pinctrl: sh-pfc: Update info pointer after SoC-specific init
USB: serial: option: add two Longcheer device ids
USB: serial: qcserial: new Sierra Wireless EM7305 device ID
gfs2: Fix glock rhashtable rcu bug
x86/tools: Fix gcc-7 warning in relocs.c
x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
ath10k: override CE5 config for QCA9377
KEYS: Fix an error code in request_master_key()
RDMA/uverbs: Check port number supplied by user verbs cmds
mqueue: fix a use-after-free in sys_mq_notify()
tools include: Add a __fallthrough statement
tools string: Use __fallthrough in perf_atoll()
tools strfilter: Use __fallthrough
perf top: Use __fallthrough
perf intel-pt: Use __fallthrough
perf thread_map: Correctly size buffer used with dirent->dt_name
perf scripting perl: Fix compile error with some perl5 versions
perf tests: Avoid possible truncation with dirent->d_name + snprintf
perf bench numa: Avoid possible truncation when using snprintf()
perf tools: Use readdir() instead of deprecated readdir_r()
perf thread_map: Use readdir() instead of deprecated readdir_r()
perf script: Use readdir() instead of deprecated readdir_r()
perf tools: Remove duplicate const qualifier
perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
perf pmu: Fix misleadingly indented assignment (whitespace)
perf dwarf: Guard !x86_64 definitions under #ifdef else clause
perf trace: Do not process PERF_RECORD_LOST twice
perf tests: Remove wrong semicolon in while loop in CQM test
perf tools: Use readdir() instead of deprecated readdir_r() again
md: fix incorrect use of lexx_to_cpu in does_sb_need_changing
md: fix super_offset endianness in super_1_rdev_size_change
tcp: fix tcp_mark_head_lost to check skb len before fragmenting
staging: vt6556: vnt_start Fix missing call to vnt_key_init_table.
staging: comedi: fix clean-up of comedi_class in comedi_init()
ext4: check return value of kstrtoull correctly in reserved_clusters_store
x86/mm/pat: Don't report PAT on CPUs that don't support it
saa7134: fix warm Medion 7134 EEPROM read
Linux 4.4.77
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[Not upstream as that would take 34+ patches]
We've got reported a BUG in do_try_to_free_pages():
BUG: unable to handle kernel paging request at ffff8ffffff28990
IP: [<ffffffff8119abe0>] do_try_to_free_pages+0x140/0x490
PGD 0
Oops: 0000 [#1] SMP
megaraid_sas sg scsi_mod efivarfs autofs4
Supported: No, Unsupported modules are loaded
Workqueue: kacpi_hotplug acpi_hotplug_work_fn
task: ffff88ffd0d4c540 ti: ffff88ffd0e48000 task.ti: ffff88ffd0e48000
RIP: 0010:[<ffffffff8119abe0>] [<ffffffff8119abe0>] do_try_to_free_pages+0x140/0x490
RSP: 0018:ffff88ffd0e4ba60 EFLAGS: 00010206
RAX: 000006fffffff900 RBX: 00000000ffffffff RCX: ffff88fffff29000
RDX: 000000ffffffff00 RSI: 0000000000000003 RDI: 00000000024200c8
RBP: 0000000001320122 R08: 0000000000000000 R09: ffff88ffd0e4bbac
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88ffd0e4bae0
R13: 0000000000000e00 R14: ffff88fffff2a500 R15: ffff88fffff2b300
FS: 0000000000000000(0000) GS:ffff88ffe6440000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff8ffffff28990 CR3: 0000000001c0a000 CR4: 00000000003406e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
00000002db570a80 024200c80000001e ffff88fffff2b300 0000000000000000
ffff88fffffd5700 ffff88ffd0d4c540 ffff88ffd0d4c540 ffffffff0000000c
0000000000000000 0000000000000040 00000000024200c8 ffff88ffd0e4bae0
Call Trace:
[<ffffffff8119afea>] try_to_free_pages+0xba/0x170
[<ffffffff8118cf2f>] __alloc_pages_nodemask+0x53f/0xb20
[<ffffffff811d39ff>] alloc_pages_current+0x7f/0x100
[<ffffffff811e2232>] migrate_pages+0x202/0x710
[<ffffffff815dadaa>] __offline_pages.constprop.23+0x4ba/0x790
[<ffffffff81463263>] memory_subsys_offline+0x43/0x70
[<ffffffff8144cbed>] device_offline+0x7d/0xa0
[<ffffffff81392fa2>] acpi_bus_offline+0xa5/0xef
[<ffffffff81394a77>] acpi_device_hotplug+0x21b/0x41f
[<ffffffff8138dab7>] acpi_hotplug_work_fn+0x1a/0x23
[<ffffffff81093cee>] process_one_work+0x14e/0x410
[<ffffffff81094546>] worker_thread+0x116/0x490
[<ffffffff810999ed>] kthread+0xbd/0xe0
[<ffffffff815e4e7f>] ret_from_fork+0x3f/0x70
This translates to the loop in shrink_zone():
classzone_idx = requested_highidx;
while (!populated_zone(zone->zone_pgdat->node_zones +
classzone_idx))
classzone_idx--;
where no zone is populated, so classzone_idx becomes -1 (in RBX).
Added debugging output reveals that we enter the function with
sc->gfp_mask == GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE
requested_highidx = gfp_zone(sc->gfp_mask) == 2 (ZONE_NORMAL)
Inside the for loop, however:
gfp_zone(sc->gfp_mask) == 3 (ZONE_MOVABLE)
This means we have gone through this branch:
if (buffer_heads_over_limit)
sc->gfp_mask |= __GFP_HIGHMEM;
This changes the gfp_zone() result, but requested_highidx remains unchanged.
On nodes where the only populated zone is movable, the inner while loop will
check only lower zones, which are not populated, and underflow classzone_idx.
To sum up, the bug occurs in configurations with ZONE_MOVABLE (such as when
booted with the movable_node parameter) and only in situations when
buffer_heads_over_limit is true, and there's an allocation with __GFP_MOVABLE
and without __GFP_HIGHMEM performing direct reclaim.
This patch makes sure that classzone_idx starts with the correct zone.
Mainline has been affected in versions 4.6 and 4.7, but the culprit commit has
been also included in stable trees.
In mainline, this has been fixed accidentally as part of 34-patch series (plus
follow-up fixes) "Move LRU page reclaim from zones to nodes", which makes the
mainline commit unsuitable for stable backport, unfortunately.
Fixes: 7bf52fb891b6 ("mm: vmscan: reclaim highmem zone if buffer_heads is over limit")
Obsoleted-by: b2e18757f2c9 ("mm, vmscan: begin reclaiming pages on a per-node basis")
Debugged-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllc3f0ACgkQONu9yGCS
aT4fmA/+OHeYbhpaMRKqrUpsxB3NpROr2Z47ow6vaVjYZzd0irrODLlfIfDQ6EEo
N3v28povu16VeYXk+4h8bsAP2K2j6/BlRaSi2hB6dmnY8GDMaXEfRojPYAlzVz50
qnK/6152siDDarUx1h5Zc8GcmX/tEl6h3bOOxDcwLR+RvyIcWxenuR+uqRM/AV6o
BPEiOuMu7P6LjID7KYgBTFNajVBMLrDXt4SCWdzOZmlNt0QXgKB9yw68vTcc+edC
ZcXqa0M6nEWSDvwobbwBZhFL8H2dJjzweyjeFBgxnxgmOrRh6kvZG2wsz2c8O3/P
g8TuMxU7siu+I3lFwKy+dgZ/1REz+6Q3oFBqXsuddrcPYu23rV6mz/GxqWy4cerb
M4eTWz6L9vA2GoYpvBaWi0tKC9tkNM49g48Y24a6CW1O4dJWlz3RrpTiZmequbNF
mo8EKomSXn4kYAm1xT03DGljQkK/i2JtyI5sk2hLEqqxKvZ/3q9xxLLKOVx8dPvs
PIbfpapfYMXXMWgR6e+UKueNLgevfWE12X/OU4SgvSY4n/07/mH40XEd3zd82IsZ
1Mw0qj3JnqCAFDBBMsDYa+OvABaGD1dHARuiv+aeqW8tqoBglFHxWqF+SQVNXLIE
qTLiKz78vjQpH0zGpkA3HEOh/h4L7a0y3qRMECsk5SUxXsgu1gg=
=bwNU
-----END PGP SIGNATURE-----
Merge 4.4.76 into android-4.4
Changes in 4.4.76
ipv6: release dst on error in ip6_dst_lookup_tail
net: don't call strlen on non-terminated string in dev_set_alias()
decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
net: Zero ifla_vf_info in rtnl_fill_vfinfo()
af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers
Fix an intermittent pr_emerg warning about lo becoming free.
net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx
igmp: acquire pmc lock for ip_mc_clear_src()
igmp: add a missing spin_lock_init()
ipv6: fix calling in6_ifa_hold incorrectly for dad work
net/mlx5: Wait for FW readiness before initializing command interface
decnet: always not take dst->__refcnt when inserting dst into hash table
net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
sfc: provide dummy definitions of vswitch functions
ipv6: Do not leak throw route references
rtnetlink: add IFLA_GROUP to ifla_policy
netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
netfilter: synproxy: fix conntrackd interaction
NFSv4: fix a reference leak caused WARNING messages
drm/ast: Handle configuration without P2A bridge
mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
MIPS: Avoid accidental raw backtrace
MIPS: pm-cps: Drop manual cache-line alignment of ready_count
MIPS: Fix IRQ tracing & lockdep when rescheduling
ALSA: hda - Fix endless loop of codec configure
ALSA: hda - set input_path bitmap to zero after moving it to new place
drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
usb: gadget: f_fs: Fix possibe deadlock
sysctl: enable strict writes
block: fix module reference leak on put_disk() call for cgroups throttle
mm: numa: avoid waiting on freed migrated pages
KVM: x86: fix fixing of hypercalls
scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
scsi: lpfc: Set elsiocb contexts to NULL after freeing it
qla2xxx: Fix erroneous invalid handle message
ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags
net: mvneta: Fix for_each_present_cpu usage
MIPS: ath79: fix regression in PCI window initialization
net: korina: Fix NAPI versus resources freeing
MIPS: ralink: MT7688 pinmux fixes
MIPS: ralink: fix USB frequency scaling
MIPS: ralink: Fix invalid assignment of SoC type
MIPS: ralink: fix MT7628 pinmux typos
MIPS: ralink: fix MT7628 wled_an pinmux gpio
mtd: bcm47xxpart: limit scanned flash area on BCM47XX (MIPS) only
bgmac: fix a missing check for build_skb
mtd: bcm47xxpart: don't fail because of bit-flips
bgmac: Fix reversed test of build_skb() return value.
net: bgmac: Fix SOF bit checking
net: bgmac: Start transmit queue in bgmac_open
net: bgmac: Remove superflous netif_carrier_on()
powerpc/eeh: Enable IO path on permanent error
gianfar: Do not reuse pages from emergency reserve
Btrfs: fix truncate down when no_holes feature is enabled
virtio_console: fix a crash in config_work_handler
swiotlb-xen: update dev_addr after swapping pages
xen-netfront: Fix Rx stall during network stress and OOM
scsi: virtio_scsi: Reject commands when virtqueue is broken
platform/x86: ideapad-laptop: handle ACPI event 1
amd-xgbe: Check xgbe_init() return code
net: dsa: Check return value of phy_connect_direct()
drm/amdgpu: check ring being ready before using
vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
virtio_net: fix PAGE_SIZE > 64k
vxlan: do not age static remote mac entries
ibmveth: Add a proper check for the availability of the checksum features
kernel/panic.c: add missing \n
HID: i2c-hid: Add sleep between POWER ON and RESET
scsi: lpfc: avoid double free of resource identifiers
spi: davinci: use dma_mapping_error()
mac80211: initialize SMPS field in HT capabilities
x86/mpx: Use compatible types in comparison to fix sparse error
coredump: Ensure proper size of sparse core files
swiotlb: ensure that page-sized mappings are page-aligned
s390/ctl_reg: make __ctl_load a full memory barrier
be2net: fix status check in be_cmd_pmac_add()
perf probe: Fix to show correct locations for events on modules
net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
sctp: check af before verify address in sctp_addr_id2transport
ravb: Fix use-after-free on `ifconfig eth0 down`
jump label: fix passing kbuild_cflags when checking for asm goto support
xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
xfrm: NULL dereference on allocation failure
xfrm: Oops on error in pfkey_msg2xfrm_state()
watchdog: bcm281xx: Fix use of uninitialized spinlock.
sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
ARM: 8685/1: ensure memblock-limit is pmd-aligned
x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
x86/mm: Fix flush_tlb_page() on Xen
ocfs2: o2hb: revert hb threshold to keep compatible
iommu/vt-d: Don't over-free page table directories
iommu: Handle default domain attach failure
iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
cpufreq: s3c2416: double free on driver init error path
KVM: x86: fix emulation of RSM and IRET instructions
KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()
KVM: x86: zero base3 of unusable segments
KVM: nVMX: Fix exception injection
Linux 4.4.76
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 3c226c637b69104f6b9f1c6ec5b08d7b741b3229 upstream.
In do_huge_pmd_numa_page(), we attempt to handle a migrating thp pmd by
waiting until the pmd is unlocked before we return and retry. However,
we can race with migrate_misplaced_transhuge_page():
// do_huge_pmd_numa_page // migrate_misplaced_transhuge_page()
// Holds 0 refs on page // Holds 2 refs on page
vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
/* ... */
if (pmd_trans_migrating(*vmf->pmd)) {
page = pmd_page(*vmf->pmd);
spin_unlock(vmf->ptl);
ptl = pmd_lock(mm, pmd);
if (page_count(page) != 2)) {
/* roll back */
}
/* ... */
mlock_migrate_page(new_page, page);
/* ... */
spin_unlock(ptl);
put_page(page);
put_page(page); // page freed here
wait_on_page_locked(page);
goto out;
}
This can result in the freed page having its waiters flag set
unexpectedly, which trips the PAGE_FLAGS_CHECK_AT_PREP checks in the
page alloc/free functions. This has been observed on arm64 KVM guests.
We can avoid this by having do_huge_pmd_numa_page() take a reference on
the page before dropping the pmd lock, mirroring what we do in
__migration_entry_wait().
When we hit the race, migrate_misplaced_transhuge_page() will see the
reference and abort the migration, as it may do today in other cases.
Fixes: b8916634b7 ("mm: Prevent parallel splits during THP migration")
Link: http://lkml.kernel.org/r/1497349722-6731-2-git-send-email-will.deacon@arm.com
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Steve Capper <steve.capper@arm.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 460bcec84e11c75122ace5976214abbc596eb91b upstream.
We got need_resched() warnings in swap_cgroup_swapoff() because
swap_cgroup_ctrl[type].length is particularly large.
Reschedule when needed.
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1704061315270.80559@chino.kir.corp.google.com
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllQl/sACgkQONu9yGCS
aT5zMRAAuDBpWjQ1IFtgmzQnKGyjS3fm5X/EgPmT81PFKXay5/TH6Hc85TvorChk
mCC7qybadCFPjieBfUeCGhTposiGkbOZdYIzduzLeHPe7Eda88NKJw5ZS3x+RDro
if6BZNtQPwPk9jQ95zpBu/p6eCuIGFzQObif8XHga9eEVP+TPGDKFn5EdLM8j99t
ErKYyTLFEiZYa52hpCBbVz/4mX8bJOoAlZaitcbvaFbG0OodA5SL24sKlr7tAPrM
ajnuqv+ghOUjbXrUlrTGxCjJ7vCJjdBqNzuxVFNj5P1xDucpBW8uuWGob0XWTMbB
hj/ToAIQXQXrZKFpASWW74B4QZDcjo7dbhDWOurBaAsyLuBzAi26pI+q6TqgCQUO
k17ilfk9LVEvvFhiQ7xpJPNnkh6tCEk7Jdblru6ZL5fHCAYe+qUDj56TbqjFJCQK
+bDzPi0QXkEGQNKxo7zDu5iGQ0Gb0zD2Z3MrGD+3pCkM5yG0PXjzZ7lOlboyPzwY
88dxuuTRmm8yGEEm81BKmDYqAA1l4FCrap8u9FLoNyoZyMnK7B+SHHuPRBRhL3F2
I3L/v8BbJhXTsDNPXEsXtpZZpn2wxJp4x4gKWmCcOb5MM1nbFrFtwdj0cKobu6Xe
ygNMEkjlW2uUrZoDXthj1ICda/cEw/R0gMWzBeNNVfErOZEmFxM=
=zl9i
-----END PGP SIGNATURE-----
Merge 4.4.74 into android-4.4
Changes in 4.4.74
configfs: Fix race between create_link and configfs_rmdir
can: gs_usb: fix memory leak in gs_cmd_reset()
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
vb2: Fix an off by one error in 'vb2_plane_vaddr'
mac80211: don't look at the PM bit of BAR frames
mac80211/wpa: use constant time memory comparison for MACs
mac80211: fix CSA in IBSS mode
mac80211: fix IBSS presp allocation size
serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode
staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data()
iio: proximity: as3935: recalibrate RCO after resume
USB: hub: fix SS max number of ports
usb: core: fix potential memory leak in error path during hcd creation
pvrusb2: reduce stack usage pvr2_eeprom_analyze()
USB: gadget: dummy_hcd: fix hub-descriptor removable fields
usb: r8a66597-hcd: select a different endpoint on timeout
usb: r8a66597-hcd: decrease timeout
drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
mm/memory-failure.c: use compound_head() flags for huge pages
swap: cond_resched in swap_cgroup_prepare()
genirq: Release resources in __setup_irq() error path
alarmtimer: Prevent overflow of relative timers
usb: dwc3: exynos fix axius clock error path to do cleanup
MIPS: Fix bnezc/jialc return address calculation
alarmtimer: Rate limit periodic intervals
mm: larger stack guard gap, between vmas
Allow stack to grow up to address space limit
mm: fix new crash in unmapped_area_topdown()
Linux 4.4.74
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllEst4ACgkQONu9yGCS
aT5RJQ//UBkwmDInzy8BbfZk7pXY4IXzbXYfZ/nS5QbmPQNcQArKeIsOH+C1ZTKq
suo4wR4yWUzATr8BBBxbUKyyrAsA85oDq/fl3EyPVLYOHdChnM/sS9H/eFrQhJIN
7PM9j1YiqyTohkagB2tkWSb8tO1r/wgTSTLmIE+RzsWn6rvMCPMPVY+3OGgC1fuT
LJrgtXKGBl9zNxmrns3VlSou1MhJvh/y8ESIrhdYHdZKos0zsgQaISXJjhZtyQcV
OnEzsuz9NMXWE9XGjNpyAm88Nh8T41Ey/vwOjt6mkvWac3r77IgI5NWaLV/QDyqm
d3jpVWK5BSPcLsmeN4LwQC5aYayvHlh8CfP8ZlBx1xkB5TpclnqXGgQA9BYpXAKw
XoeFl8n8xLaPrgX8gp3kw/f6C6443OC2JVeRvgnH/0ZM7M0+rZxE6DstRcUHGqf6
K8PN+AssQpBLIjXSGHnzDVHME/1xWUSmJZfLd5bd6NJ1zSZqZOy1gkf5dx77p0Ka
UaGVOg6UzOojr3GeUTE62bRO2ZuAno0QO1NQJJkUK1CbNMYmE61vYLM8i0pLKWZJ
3mDlhcoGK8aJH8chNLU3mgkXECU/9zOVKveWZFoghhMlv8ImgeTuiqZhvztzzT38
42DxdXPfMzxCwBF02zYu4qn+WDJbNyOqMQrlMEHwwb88wnKiOUg=
=Ic1J
-----END PGP SIGNATURE-----
Merge 4.4.73 into android-4.4
Changes in 4.4.73
s390/vmem: fix identity mapping
partitions/msdos: FreeBSD UFS2 file systems are not recognized
ARM: dts: imx6dl: Fix the VDD_ARM_CAP voltage for 396MHz operation
staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
Call echo service immediately after socket reconnect
net: xilinx_emaclite: fix freezes due to unordered I/O
net: xilinx_emaclite: fix receive buffer overflow
ipv6: Handle IPv4-mapped src to in6addr_any dst.
ipv6: Inhibit IPv4-mapped src address on the wire.
NET: Fix /proc/net/arp for AX.25
NET: mkiss: Fix panic
net: hns: Fix the device being used for dma mapping during TX
sierra_net: Skip validating irrelevant fields for IDLE LSIs
sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
i2c: piix4: Fix request_region size
ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
PM / runtime: Avoid false-positive warnings from might_sleep_if()
jump label: pass kbuild_cflags when checking for asm goto support
kasan: respect /proc/sys/kernel/traceoff_on_warning
log2: make order_base_2() behave correctly on const input value zero
ethtool: do not vzalloc(0) on registers dump
fscache: Fix dead object requeue
fscache: Clear outstanding writes when disabling a cookie
FS-Cache: Initialise stores_lock in netfs cookie
ipv6: fix flow labels when the traffic class is non-0
drm/nouveau: prevent userspace from deleting client object
drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers
net/mlx4_core: Avoid command timeouts during VF driver device shutdown
gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page
pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES
net: adaptec: starfire: add checks for dma mapping errors
parisc, parport_gsc: Fixes for printk continuation lines
drm/nouveau: Don't enabling polling twice on runtime resume
drm/ast: Fixed system hanged if disable P2A
ravb: unmap descriptors when freeing rings
nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
r8152: re-schedule napi for tx
r8152: fix rtl8152_post_reset function
r8152: avoid start_xmit to schedule napi when napi is disabled
sctp: sctp_addr_id2transport should verify the addr before looking up assoc
romfs: use different way to generate fsid for BLOCK or MTD
proc: add a schedule point in proc_pid_readdir()
tipc: ignore requests when the connection state is not CONNECTED
xtensa: don't use linux IRQ #0
s390/kvm: do not rely on the ILC on kvm host protection fauls
sparc64: make string buffers large enough
Linux 4.4.73
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit f4cb767d76cf7ee72f97dd76f6cfa6c76a5edc89 upstream.
Trinity gets kernel BUG at mm/mmap.c:1963! in about 3 minutes of
mmap testing. That's the VM_BUG_ON(gap_end < gap_start) at the
end of unmapped_area_topdown(). Linus points out how MAP_FIXED
(which does not have to respect our stack guard gap intentions)
could result in gap_end below gap_start there. Fix that, and
the similar case in its alternative, unmapped_area().
Fixes: 1be7107fbe18 ("mm: larger stack guard gap, between vmas")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Debugged-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bd726c90b6b8ce87602208701b208a208e6d5600 upstream.
Fix expand_upwards() on architectures with an upward-growing stack (parisc,
metag and partly IA-64) to allow the stack to reliably grow exactly up to
the address space limit given by TASK_SIZE.
Signed-off-by: Helge Deller <deller@gmx.de>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: backport to 4.11: adjust context]
[wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide]
[wt: backport to 4.4: adjust context ; drop ppc hugetlb_radix changes]
Signed-off-by: Willy Tarreau <w@1wt.eu>
[gkh: minor build fixes for 4.4]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef70762948dde012146926720b70e79736336764 upstream.
I saw need_resched() warnings when swapping on large swapfile (TBs)
because continuously allocating many pages in swap_cgroup_prepare() took
too long.
We already cond_resched when freeing page in swap_cgroup_swapoff(). Do
the same for the page allocation.
Link: http://lkml.kernel.org/r/20170604200109.17606-1-yuzhao@google.com
Signed-off-by: Yu Zhao <yuzhao@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7258ae5c5a2ce2f5969e8b18b881be40ab55433d upstream.
memory_failure() chooses a recovery action function based on the page
flags. For huge pages it uses the tail page flags which don't have
anything interesting set, resulting in:
> Memory failure: 0x9be3b4: Unknown page state
> Memory failure: 0x9be3b4: recovery action for unknown page: Failed
Instead, save a copy of the head page's flags if this is a huge page,
this means if there are no relevant flags for this tail page, we use the
head pages flags instead. This results in the me_huge_page() recovery
action being called:
> Memory failure: 0x9b7969: recovery action for huge page: Delayed
For hugepages that have not yet been allocated, this allows the hugepage
to be dequeued.
Fixes: 524fca1e73 ("HWPOISON: fix misjudgement of page_action() for errors on mlocked pages")
Link: http://lkml.kernel.org/r/20170524130204.21845-1-james.morse@arm.com
Signed-off-by: James Morse <james.morse@arm.com>
Tested-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4f40c6e5627ea73b4e7c615c59631f38cc880885 ]
After much waiting I finally reproduced a KASAN issue, only to find my
trace-buffer empty of useful information because it got spooled out :/
Make kasan_report honour the /proc/sys/kernel/traceoff_on_warning
interface.
Link: http://lkml.kernel.org/r/20170125164106.3514-1-aryabinin@virtuozzo.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Acked-by: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>