Patch series "indirectly reclaimable memory", v2.
This patchset introduces the concept of indirectly reclaimable memory
and applies it to fix the issue of when a big number of dentries with
external names can significantly affect the MemAvailable value.
This patch (of 3):
Introduce a concept of indirectly reclaimable memory and adds the
corresponding memory counter and /proc/vmstat item.
Indirectly reclaimable memory is any sort of memory, used by the kernel
(except of reclaimable slabs), which is actually reclaimable, i.e. will
be released under memory pressure.
The counter is in bytes, as it's not always possible to count such
objects in pages. The name contains BYTES by analogy to
NR_KERNEL_STACK_KB.
Link: http://lkml.kernel.org/r/20180305133743.12746-2-guro@fb.com
Signed-off-by: Roman Gushchin <guro@fb.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-Commit: eb59254608bc1d42c4c6afdcdce9c0d3ce02b318
Git-Repo: git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Change-Id: Ie15abc33dcb13091e3acfa04dd55c664e1a24e70
Signed-off-by: Vijayanand Jitta <vjitta@codeaurora.org>
* refs/heads/tmp-5cc8c2e
Linux 4.4.110
kaiser: Set _PAGE_NX only if supported
x86/kasan: Clear kasan_zero_page after TLB flush
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
KPTI: Report when enabled
KPTI: Rename to PAGE_TABLE_ISOLATION
x86/kaiser: Move feature detection up
kaiser: disabled on Xen PV
x86/kaiser: Reenable PARAVIRT
x86/paravirt: Dont patch flush_tlb_single
kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
kaiser: asm/tlbflush.h handle noPGE at lower level
kaiser: drop is_atomic arg to kaiser_pagetable_walk()
kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
x86/kaiser: Check boottime cmdline params
x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
kaiser: add "nokaiser" boot option, using ALTERNATIVE
kaiser: fix unlikely error in alloc_ldt_struct()
kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
kaiser: paranoid_entry pass cr3 need to paranoid_exit
kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
kaiser: PCID 0 for kernel and 128 for user
kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
kaiser: enhanced by kernel and user PCIDs
kaiser: vmstat show NR_KAISERTABLE as nr_overhead
kaiser: delete KAISER_REAL_SWITCH option
kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
kaiser: cleanups while trying for gold link
kaiser: kaiser_remove_mapping() move along the pgd
kaiser: tidied up kaiser_add/remove_mapping slightly
kaiser: tidied up asm/kaiser.h somewhat
kaiser: ENOMEM if kaiser_pagetable_walk() NULL
kaiser: fix perf crashes
kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
kaiser: KAISER depends on SMP
kaiser: fix build and FIXME in alloc_ldt_struct()
kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
kaiser: do not set _PAGE_NX on pgd_none
kaiser: merged update
KAISER: Kernel Address Isolation
x86/boot: Add early cmdline parsing for options with arguments
ANDROID: sdcardfs: Add default_normal option
ANDROID: sdcardfs: notify lower file of opens
Conflicts:
kernel/fork.c
Change-Id: I9c8c12e63321d79dc2c89fb470ca8de587366911
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpPj0wACgkQONu9yGCS
aT5QOhAAu3PoT3472I7zuWDUG0KQo5r0wdUO+YPW31VIHrxQ2H3sxR44rSHc5jW/
tTg2TIYNBkNoj4jJDJ9J7f6PSnN1vGFglFW4GzxE3cr2+W7u5M5ex8yCYMcBIY9U
56hbyqX5lf5KjGWJiQThwYsMBokrBJW2igAFN3cW39nNABhl0W39kiysGA9vbNrV
+QMA4+ZADA2EeIRcdJmj8uc/cez/7sGAfrSktvATkI+HFamnTs0mrx9cl0eQKvjm
y5PCxYUCbi4kqD4WM+UCYO3zpUD+r4iMDXwXBwLWkFvbumY4mVTItP+gq5M4Fb1g
MSauGUGH7BDsT9gspricCMcAmjcTn6hth7/7/ZhlNq3NZv89pOquhpE0JOSAmYbA
P4WaIRRWwpVrRt+THU7vZpAQWpFSwGmtE7tBfPMt2J7zqY3lMYmO3DoA+gejw3CV
igbvmV0UY2uYSFnjawUUJ+k+ggYfGyRkUl2DfcllPhZFqE1XEi3NyjI0wi8vtXTd
UlrU55TqsldCw1bjXH3lWrpoNybWvqUD2a249ZVs/h06Q5NKwNL8mTye+2BBQtCP
QzAqHYbkBKv/f8M6Kg+HtTzgqUbWxVCeQTWFXHMAPVo4bCwGvVGrXbGJIj15lBuQ
GWqc3dt69zxpn1tlcRHKH0P3KnkC67dARtY+8F8+D+HAHVY71Bg=
=Kpwd
-----END PGP SIGNATURE-----
Merge 4.4.110 into android-4.4
Changes in 4.4.110
x86/boot: Add early cmdline parsing for options with arguments
KAISER: Kernel Address Isolation
kaiser: merged update
kaiser: do not set _PAGE_NX on pgd_none
kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
kaiser: fix build and FIXME in alloc_ldt_struct()
kaiser: KAISER depends on SMP
kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
kaiser: fix perf crashes
kaiser: ENOMEM if kaiser_pagetable_walk() NULL
kaiser: tidied up asm/kaiser.h somewhat
kaiser: tidied up kaiser_add/remove_mapping slightly
kaiser: kaiser_remove_mapping() move along the pgd
kaiser: cleanups while trying for gold link
kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
kaiser: delete KAISER_REAL_SWITCH option
kaiser: vmstat show NR_KAISERTABLE as nr_overhead
kaiser: enhanced by kernel and user PCIDs
kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
kaiser: PCID 0 for kernel and 128 for user
kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
kaiser: paranoid_entry pass cr3 need to paranoid_exit
kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
kaiser: fix unlikely error in alloc_ldt_struct()
kaiser: add "nokaiser" boot option, using ALTERNATIVE
x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
x86/kaiser: Check boottime cmdline params
kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
kaiser: drop is_atomic arg to kaiser_pagetable_walk()
kaiser: asm/tlbflush.h handle noPGE at lower level
kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
x86/paravirt: Dont patch flush_tlb_single
x86/kaiser: Reenable PARAVIRT
kaiser: disabled on Xen PV
x86/kaiser: Move feature detection up
KPTI: Rename to PAGE_TABLE_ISOLATION
KPTI: Report when enabled
x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
x86/kasan: Clear kasan_zero_page after TLB flush
kaiser: Set _PAGE_NX only if supported
Linux 4.4.110
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
The kaiser update made an interesting choice, never to free any shadow
page tables. Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing. Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).
But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion
(but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).
Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).
In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too). But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-f0b9d2d
Linux 4.4.101
mm/pagewalk.c: report holes in hugetlb ranges
mm/page_ext.c: check if page_ext is not prepared
mm: check the return value of lookup_page_ext for all call sites
coda: fix 'kernel memory exposure attempt' in fsync
mm/page_alloc.c: broken deferred calculation
ipmi: fix unsigned long underflow
ocfs2: should wait dio before inode lock in ocfs2_setattr()
nvme: Fix memory order on async queue deletion
arm64: fix dump_instr when PAN and UAO are in use
serial: omap: Fix EFR write on RTS deassertion
ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
net/sctp: Always set scope_id in sctp_inet6_skb_msgname
fealnx: Fix building error on MIPS
sctp: do not peel off an assoc from one netns to another one
af_netlink: ensure that NLMSG_DONE never fails in dumps
vlan: fix a use-after-free in vlan_device_event()
bonding: discard lowest hash bit for 802.3ad layer3+4
netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
tcp: do not mangle skb->cb[] in tcp_make_synack()
Conflicts:
mm/debug-pagealloc.c
mm/page_ext.c
mm/page_owner.c
Change-Id: I551aff1b4c8a0d72f64a234abb8ac88990fbc9e5
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAloXywoACgkQONu9yGCS
aT78dxAAoM0uHsL9r+ivJ4uMj81dwBL3Rd/1Lb/PMV5Yblh/LJ2WOcXriq/JgMLt
+ARoyugEpB8JzAi1Y3bq3Jku2TYcT0o55UmjRgZzQitdX5o8j1g1baNnpRuMz63z
S/g4Msh5aJyoHmwgxWZ+mWKn3SYdNwHy+r0gGwgtvlUO97iXqwM3nqQ/4tHnIv1B
sz0NtJ7cgFvWVaneUkZ4z0ZGTlKfxaQg95enyyCRWM7MJ6Be03+KnhmQZ6GEb8vP
tf9GtXiMEDJdwppDmXjtdjFW5adejBOoCF/grvbQoEdn7XPC47k6/l5Y6A3PYLMj
kqlC8IbMHbiQXvgwezxp6Mv+oc+LuSjSCVikZW2SGMacs5kF92+0MIUvBtfUwvsA
FP7q6jUcT3Or4xiG4xLDQW+RLPetidd+1Ms4jia6jaCajbMjU7ZYaBuAplT4qhIl
koJ9pn1ksna3fUyxnNFJttUN2ulGDzcSBP5EZf3bLWMXkG4daa8Cen7vBkG1VqZE
tspXCbB/mZ/eGv/rH3b7F2BVfP2RY0YqlUZzmfTXIoCwqcmX1zGi/KMfepcZTH3b
LOo8CBmTgSYXYh0/16GAUH3ds3QQt8d0oeaCEtf8BaAZnq5R3M8doZzGzTB6LGjG
Rn1KsUzJPKSqgYis3FTJNU3wmPokvV1ZVXK/ee9zMq5zOtyJyOg=
=XIwd
-----END PGP SIGNATURE-----
Merge 4.4.101 into android-4.4
Changes in 4.4.101
tcp: do not mangle skb->cb[] in tcp_make_synack()
netfilter/ipvs: clear ipvs_property flag when SKB net namespace changed
bonding: discard lowest hash bit for 802.3ad layer3+4
vlan: fix a use-after-free in vlan_device_event()
af_netlink: ensure that NLMSG_DONE never fails in dumps
sctp: do not peel off an assoc from one netns to another one
fealnx: Fix building error on MIPS
net/sctp: Always set scope_id in sctp_inet6_skb_msgname
ima: do not update security.ima if appraisal status is not INTEGRITY_PASS
serial: omap: Fix EFR write on RTS deassertion
arm64: fix dump_instr when PAN and UAO are in use
nvme: Fix memory order on async queue deletion
ocfs2: should wait dio before inode lock in ocfs2_setattr()
ipmi: fix unsigned long underflow
mm/page_alloc.c: broken deferred calculation
coda: fix 'kernel memory exposure attempt' in fsync
mm: check the return value of lookup_page_ext for all call sites
mm/page_ext.c: check if page_ext is not prepared
mm/pagewalk.c: report holes in hugetlb ranges
Linux 4.4.101
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit d135e5750205a21a212a19dbb05aeb339e2cbea7 upstream.
In reset_deferred_meminit() we determine number of pages that must not
be deferred. We initialize pages for at least 2G of memory, but also
pages for reserved memory in this node.
The reserved memory is determined in this function:
memblock_reserved_memory_within(), which operates over physical
addresses, and returns size in bytes. However, reset_deferred_meminit()
assumes that that this function operates with pfns, and returns page
count.
The result is that in the best case machine boots slower than expected
due to initializing more pages than needed in single thread, and in the
worst case panics because fewer than needed pages are initialized early.
Link: http://lkml.kernel.org/r/20171021011707.15191-1-pasha.tatashin@oracle.com
Fixes: 864b9a393dcb ("mm: consider memblock reservations for deferred memory initialization sizing")
Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The information in /sys/kernel/debug/page_owner includes the migratetype
of the pageblock the page belongs to. This is also checked against the
page's migratetype (as declared by gfp_flags during its allocation), and
the page is reported as Fallback if its migratetype differs from the
pageblock's one. t This is somewhat misleading because in fact fallback
allocation is not the only reason why these two can differ. It also
doesn't direcly provide the page's migratetype, although it's possible
to derive that from the gfp_flags.
It's arguably better to print both page and pageblock's migratetype and
leave the interpretation to the consumer than to suggest fallback
allocation as the only possible reason. While at it, we can print the
migratetypes as string the same way as /proc/pagetypeinfo does, as some
of the numeric values depend on kernel configuration. For that, this
patch moves the migratetype_names array from #ifdef CONFIG_PROC_FS part
of mm/vmstat.c to mm/page_alloc.c and exports it.
With the new format strings for flags, we can now also provide symbolic
page and gfp flags in the /sys/kernel/debug/page_owner file. This
replaces the positional printing of page flags as single letters, which
might have looked nicer, but was limited to a subset of flags, and
required the user to remember the letters.
Example page_owner entry after the patch:
Page allocated via order 0, mask 0x24213ca(GFP_HIGHUSER_MOVABLE|__GFP_COLD|__GFP_NOWARN|__GFP_NORETRY)
PFN 520 type Movable Block 1 type Movable Flags 0xfffff8001006c(referenced|uptodate|lru|active|mappedtodisk)
[<ffffffff811682c4>] __alloc_pages_nodemask+0x134/0x230
[<ffffffff811b4058>] alloc_pages_current+0x88/0x120
[<ffffffff8115e386>] __page_cache_alloc+0xe6/0x120
[<ffffffff8116ba6c>] __do_page_cache_readahead+0xdc/0x240
[<ffffffff8116bd05>] ondemand_readahead+0x135/0x260
[<ffffffff8116bfb1>] page_cache_sync_readahead+0x31/0x50
[<ffffffff81160523>] generic_file_read_iter+0x453/0x760
[<ffffffff811e0d57>] __vfs_read+0xa7/0xd0
Change-Id: I08f3412dbda9075d5534eee81444843a7679e54e
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 60f30350fd69a3e4d5f0f45937d3274c22565134
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[guptap@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
* refs/heads/tmp-e76c0fa
Linux 4.4.72
arm64: ensure extension of smp_store_release value
arm64: armv8_deprecated: ensure extension of addr
usercopy: Adjust tests to deal with SMAP/PAN
RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
arm64: entry: improve data abort handling of tagged pointers
arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
Make __xfs_xattr_put_listen preperly report errors.
NFSv4: Don't perform cached access checks before we've OPENed the file
NFS: Ensure we revalidate attributes before using execute_ok()
mm: consider memblock reservations for deferred memory initialization sizing
net: better skb->sender_cpu and skb->napi_id cohabitation
serial: sh-sci: Fix panic when serial console and DMA are enabled
tty: Drop krefs for interrupted tty lock
drivers: char: mem: Fix wraparound check to allow mappings up to the end
ASoC: Fix use-after-free at card unregistration
ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT
ALSA: timer: Fix race between read and ioctl
drm/nouveau/tmr: fully separate alarm execution/pending lists
drm/vmwgfx: Make sure backup_handle is always valid
drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()
drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
perf/core: Drop kernel samples even though :u is specified
powerpc/hotplug-mem: Fix missing endian conversion of aa_index
powerpc/numa: Fix percpu allocations to be NUMA aware
powerpc/eeh: Avoid use after free in eeh_handle_special_event()
scsi: qla2xxx: don't disable a not previously enabled PCI device
KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
btrfs: fix memory leak in update_space_info failure path
btrfs: use correct types for page indices in btrfs_page_exists_in_range
cxl: Fix error path on bad ioctl
ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
ufs: set correct ->s_maxsize
ufs: restore maintaining ->i_blocks
fix ufs_isblockset()
ufs: restore proper tail allocation
fs: add i_blocksize()
cpuset: consider dying css as offline
Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled
drm/msm: Expose our reservation object when exporting a dmabuf.
target: Re-add check to reject control WRITEs with overflow data
cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
random: properly align get_random_int_hash
drivers: char: random: add get_random_long()
iio: proximity: as3935: fix AS3935_INT mask
iio: light: ltr501 Fix interchanged als/ps register field
staging/lustre/lov: remove set_fs() call from lov_getstripe()
usb: chipidea: debug: check before accessing ci_role
usb: chipidea: udc: fix NULL pointer dereference if udc_start failed
usb: gadget: f_mass_storage: Serialize wake and sleep execution
ext4: fix fdatasync(2) after extent manipulation operations
ext4: keep existing extra fields when inode expands
ext4: fix SEEK_HOLE
xen-netfront: cast grant table reference first to type int
xen-netfront: do not cast grant table reference to signed short
xen/privcmd: Support correctly 64KB page granularity when mapping memory
dmaengine: ep93xx: Always start from BASE0
dmaengine: usb-dmac: Fix DMAOR AE bit definition
KVM: async_pf: avoid async pf injection when in guest mode
arm: KVM: Allow unaligned accesses at HYP
KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
kvm: async_pf: fix rcu_irq_enter() with irqs enabled
nfsd: Fix up the "supattr_exclcreat" attributes
nfsd4: fix null dereference on replay
drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
crypto: gcm - wait for crypto op not signal safe
KEYS: fix freeing uninitialized memory in key_update()
KEYS: fix dereferencing NULL payload with nonzero length
ptrace: Properly initialize ptracer_cred on fork
serial: ifx6x60: fix use-after-free on module unload
arch/sparc: support NR_CPUS = 4096
sparc64: delete old wrap code
sparc64: new context wrap
sparc64: add per-cpu mm of secondary contexts
sparc64: redefine first version
sparc64: combine activate_mm and switch_mm
sparc64: reset mm cpumask after wrap
sparc: Machine description indices can vary
sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
net: bridge: start hello timer only if device is up
net: ethoc: enable NAPI before poll may be scheduled
net: ping: do not abuse udp_poll()
ipv6: Fix leak in ipv6_gso_segment().
vxlan: fix use-after-free on deletion
tcp: disallow cwnd undo when switching congestion control
cxgb4: avoid enabling napi twice to the same queue
ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
bnx2x: Fix Multi-Cos
ANDROID: uid_sys_stats: check previous uid_entry before call find_or_register_uid
ANDROID: sdcardfs: d_splice_alias can return error values
Change-Id: I829ebf1a9271dcf0462c537e7bfcbcfde322f336
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllBIXAACgkQONu9yGCS
aT6T+w//VjXDZ+MddWJ4UeQDyIANYeFpa4tJNoqR3JsnT6yg1HODRZDR7aP5QJmN
GIoRWU/2Q2nmYbAO0c8RPxs07w2xtIZzTUn+H+i6sG7bRs5RbLM5AMg4W/A/X88L
V5c34kCvCf1HRfrdd4rXIZiibFnSZGqUv6o1YyQqCIvx15pyB6elMM714zt8uubk
iL4/WJ2M4SrmamHWA349ldEtPjQKpwpwdBcCn+M4awbimdc0pm8oZqNkAfwJ+vLO
HsuClO57I699ESU2Zt5bfEdVsW/gc7WiJOAr1Mrl2suToryrWfs2YT+sC/IQhkfC
gUsi9Cm/6YMu+tiP4o6aqYvTFoFplFErpEbC3mqAEvHGGHKhrgEDotYJ+FnvI3q7
Jaxix0B/Q/NIqsJPnqe5ONOCKFmW7rGR2e2j5+45GuiofioNVNF12HWfQkoItPOL
YeR2JB8K9aywzYM4gaJuy8ScJ1shN8TY1FKgZa5gBT2ym4pDDcQmxz7Jr7agREHe
F2sJ23zMU+o9guGA4Is2yqWCQ5yM+3kpPPISz+Pcgh8Q95o+ftCSyOeB2F5roW8I
EO22AlJPlQH0LWDQhOJ5ZuAVe+qB8EdrQqqdLbP4/oHp7MtlR5ge+idRuZc+AUsa
UoASccPsEwHyBErQmHoWNI4nPRciFrKliOqERmPLcuzewUwSatw=
=wXRR
-----END PGP SIGNATURE-----
Merge 4.4.72 into android-4.4
Changes in 4.4.72
bnx2x: Fix Multi-Cos
ipv6: xfrm: Handle errors reported by xfrm6_find_1stfragopt()
cxgb4: avoid enabling napi twice to the same queue
tcp: disallow cwnd undo when switching congestion control
vxlan: fix use-after-free on deletion
ipv6: Fix leak in ipv6_gso_segment().
net: ping: do not abuse udp_poll()
net: ethoc: enable NAPI before poll may be scheduled
net: bridge: start hello timer only if device is up
sparc64: mm: fix copy_tsb to correctly copy huge page TSBs
sparc: Machine description indices can vary
sparc64: reset mm cpumask after wrap
sparc64: combine activate_mm and switch_mm
sparc64: redefine first version
sparc64: add per-cpu mm of secondary contexts
sparc64: new context wrap
sparc64: delete old wrap code
arch/sparc: support NR_CPUS = 4096
serial: ifx6x60: fix use-after-free on module unload
ptrace: Properly initialize ptracer_cred on fork
KEYS: fix dereferencing NULL payload with nonzero length
KEYS: fix freeing uninitialized memory in key_update()
crypto: gcm - wait for crypto op not signal safe
drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
nfsd4: fix null dereference on replay
nfsd: Fix up the "supattr_exclcreat" attributes
kvm: async_pf: fix rcu_irq_enter() with irqs enabled
KVM: cpuid: Fix read/write out-of-bounds vulnerability in cpuid emulation
arm: KVM: Allow unaligned accesses at HYP
KVM: async_pf: avoid async pf injection when in guest mode
dmaengine: usb-dmac: Fix DMAOR AE bit definition
dmaengine: ep93xx: Always start from BASE0
xen/privcmd: Support correctly 64KB page granularity when mapping memory
xen-netfront: do not cast grant table reference to signed short
xen-netfront: cast grant table reference first to type int
ext4: fix SEEK_HOLE
ext4: keep existing extra fields when inode expands
ext4: fix fdatasync(2) after extent manipulation operations
usb: gadget: f_mass_storage: Serialize wake and sleep execution
usb: chipidea: udc: fix NULL pointer dereference if udc_start failed
usb: chipidea: debug: check before accessing ci_role
staging/lustre/lov: remove set_fs() call from lov_getstripe()
iio: light: ltr501 Fix interchanged als/ps register field
iio: proximity: as3935: fix AS3935_INT mask
drivers: char: random: add get_random_long()
random: properly align get_random_int_hash
stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms
cpufreq: cpufreq_register_driver() should return -ENODEV if init fails
target: Re-add check to reject control WRITEs with overflow data
drm/msm: Expose our reservation object when exporting a dmabuf.
Input: elantech - add Fujitsu Lifebook E546/E557 to force crc_enabled
cpuset: consider dying css as offline
fs: add i_blocksize()
ufs: restore proper tail allocation
fix ufs_isblockset()
ufs: restore maintaining ->i_blocks
ufs: set correct ->s_maxsize
ufs_extend_tail(): fix the braino in calling conventions of ufs_new_fragments()
ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path
cxl: Fix error path on bad ioctl
btrfs: use correct types for page indices in btrfs_page_exists_in_range
btrfs: fix memory leak in update_space_info failure path
KVM: arm/arm64: Handle possible NULL stage2 pud when ageing pages
scsi: qla2xxx: don't disable a not previously enabled PCI device
powerpc/eeh: Avoid use after free in eeh_handle_special_event()
powerpc/numa: Fix percpu allocations to be NUMA aware
powerpc/hotplug-mem: Fix missing endian conversion of aa_index
perf/core: Drop kernel samples even though :u is specified
drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve()
drm/vmwgfx: limit the number of mip levels in vmw_gb_surface_define_ioctl()
drm/vmwgfx: Make sure backup_handle is always valid
drm/nouveau/tmr: fully separate alarm execution/pending lists
ALSA: timer: Fix race between read and ioctl
ALSA: timer: Fix missing queue indices reset at SNDRV_TIMER_IOCTL_SELECT
ASoC: Fix use-after-free at card unregistration
drivers: char: mem: Fix wraparound check to allow mappings up to the end
tty: Drop krefs for interrupted tty lock
serial: sh-sci: Fix panic when serial console and DMA are enabled
net: better skb->sender_cpu and skb->napi_id cohabitation
mm: consider memblock reservations for deferred memory initialization sizing
NFS: Ensure we revalidate attributes before using execute_ok()
NFSv4: Don't perform cached access checks before we've OPENed the file
Make __xfs_xattr_put_listen preperly report errors.
arm64: hw_breakpoint: fix watchpoint matching for tagged pointers
arm64: entry: improve data abort handling of tagged pointers
RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
usercopy: Adjust tests to deal with SMAP/PAN
arm64: armv8_deprecated: ensure extension of addr
arm64: ensure extension of smp_store_release value
Linux 4.4.72
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 864b9a393dcb5aed09b8fd31b9bbda0fdda99374 upstream.
We have seen an early OOM killer invocation on ppc64 systems with
crashkernel=4096M:
kthreadd invoked oom-killer: gfp_mask=0x16040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), nodemask=7, order=0, oom_score_adj=0
kthreadd cpuset=/ mems_allowed=7
CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.4.68-1.gd7fe927-default #1
Call Trace:
dump_stack+0xb0/0xf0 (unreliable)
dump_header+0xb0/0x258
out_of_memory+0x5f0/0x640
__alloc_pages_nodemask+0xa8c/0xc80
kmem_getpages+0x84/0x1a0
fallback_alloc+0x2a4/0x320
kmem_cache_alloc_node+0xc0/0x2e0
copy_process.isra.25+0x260/0x1b30
_do_fork+0x94/0x470
kernel_thread+0x48/0x60
kthreadd+0x264/0x330
ret_from_kernel_thread+0x5c/0xa4
Mem-Info:
active_anon:0 inactive_anon:0 isolated_anon:0
active_file:0 inactive_file:0 isolated_file:0
unevictable:0 dirty:0 writeback:0 unstable:0
slab_reclaimable:5 slab_unreclaimable:73
mapped:0 shmem:0 pagetables:0 bounce:0
free:0 free_pcp:0 free_cma:0
Node 7 DMA free:0kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:52428800kB managed:110016kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:320kB slab_unreclaimable:4672kB kernel_stack:1152kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0 0
Node 7 DMA: 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 0kB
0 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap = 0kB
Total swap = 0kB
819200 pages RAM
0 pages HighMem/MovableOnly
817481 pages reserved
0 pages cma reserved
0 pages hwpoisoned
the reason is that the managed memory is too low (only 110MB) while the
rest of the the 50GB is still waiting for the deferred intialization to
be done. update_defer_init estimates the initial memoty to initialize
to 2GB at least but it doesn't consider any memory allocated in that
range. In this particular case we've had
Reserving 4096MB of memory at 128MB for crashkernel (System RAM: 51200MB)
so the low 2GB is mostly depleted.
Fix this by considering memblock allocations in the initial static
initialization estimation. Move the max_initialise to
reset_deferred_meminit and implement a simple memblock_reserved_memory
helper which iterates all reserved blocks and sums the size of all that
start below the given address. The cumulative size is than added on top
of the initial estimation. This is still not ideal because
reset_deferred_meminit doesn't consider holes and so reservation might
be above the initial estimation whihch we ignore but let's make the
logic simpler until we really need to handle more complicated cases.
Fixes: 3a80a7fa79 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set")
Link: http://lkml.kernel.org/r/20170531104010.GI27783@dhcp22.suse.cz
Signed-off-by: Michal Hocko <mhocko@suse.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* remotes/origin/tmp-2f0de51:
Linux 4.4.38
esp6: Fix integrity verification when ESN are used
esp4: Fix integrity verification when ESN are used
ipv4: Set skb->protocol properly for local output
ipv6: Set skb->protocol properly for local output
Don't feed anything but regular iovec's to blk_rq_map_user_iov
constify iov_iter_count() and iter_is_iovec()
sparc64: fix compile warning section mismatch in find_node()
sparc64: Fix find_node warning if numa node cannot be found
sparc32: Fix inverted invalid_frame_pointer checks on sigreturns
net: ping: check minimum size on ICMP header length
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
geneve: avoid use-after-free of skb->data
sh_eth: remove unchecked interrupts for RZ/A1
net: bcmgenet: Utilize correct struct device for all DMA operations
packet: fix race condition in packet_set_ring
net/dccp: fix use-after-free in dccp_invalid_packet
netlink: Do not schedule work from sk_destruct
netlink: Call cb->done from a worker thread
net/sched: pedit: make sure that offset is valid
net, sched: respect rcu grace period on cls destruction
net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()
rtnetlink: fix FDB size computation
af_unix: conditionally use freezable blocking calls in read
net: sky2: Fix shutdown crash
ip6_tunnel: disable caching when the traffic class is inherited
net: check dead netns for peernet2id_alloc()
virtio-net: add a missing synchronize_net()
Linux 4.4.37
arm64: suspend: Reconfigure PSTATE after resume from idle
arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call
arm64: cpufeature: Schedule enable() calls instead of calling them via IPI
pwm: Fix device reference leak
mwifiex: printk() overflow with 32-byte SSIDs
PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)
PCI: Export pcie_find_root_port
rcu: Fix soft lockup for rcu_nocb_kthread
ALSA: pcm : Call kill_fasync() in stream lock
x86/traps: Ignore high word of regs->cs in early_fixup_exception()
kasan: update kasan_global for gcc 7
zram: fix unbalanced idr management at hot removal
ARC: Don't use "+l" inline asm constraint
Linux 4.4.36
scsi: mpt3sas: Unblock device after controller reset
flow_dissect: call init_default_flow_dissectors() earlier
mei: fix return value on disconnection
mei: me: fix place for kaby point device ids.
mei: me: disable driver on SPT SPS firmware
drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on
mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
parisc: Also flush data TLB in flush_icache_page_asm
parisc: Fix race in pci-dma.c
parisc: Fix races in parisc_setup_cache_timing()
NFSv4.x: hide array-bounds warning
apparmor: fix change_hat not finding hat after policy replacement
cfg80211: limit scan results cache size
tile: avoid using clocksource_cyc2ns with absolute cycle count
scsi: mpt3sas: Fix secure erase premature termination
Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y
USB: serial: ftdi_sio: add support for TI CC3200 LaunchPad
USB: serial: cp210x: add ID for the Zone DPMX
usb: chipidea: move the lock initialization to core file
KVM: x86: check for pic and ioapic presence before use
KVM: x86: drop error recovery in em_jmp_far and em_ret_far
iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions
iommu/vt-d: Fix PASID table allocation
sched: tune: Fix lacking spinlock initialization
UPSTREAM: trace: Update documentation for mono, mono_raw and boot clock
UPSTREAM: trace: Add an option for boot clock as trace clock
UPSTREAM: timekeeping: Add a fast and NMI safe boot clock
ANDROID: goldfish_pipe: fix allmodconfig build
ANDROID: goldfish: goldfish_pipe: fix locking errors
ANDROID: video: goldfishfb: fix platform_no_drv_owner.cocci warnings
ANDROID: goldfish_pipe: fix call_kern.cocci warnings
arm64: rename ranchu defconfig to ranchu64
ANDROID: arch: x86: disable pic for Android toolchain
ANDROID: goldfish_pipe: An implementation of more parallel pipe
ANDROID: goldfish_pipe: bugfixes and performance improvements.
ANDROID: goldfish: Add goldfish sync driver
ANDROID: goldfish: add ranchu defconfigs
ANDROID: goldfish_audio: Clear audio read buffer status after each read
ANDROID: goldfish_events: no extra EV_SYN; register goldfish
ANDROID: goldfish_fb: Set pixclock = 0
ANDROID: goldfish: Enable ACPI-based enumeration for goldfish audio
ANDROID: goldfish: Enable ACPI-based enumeration for goldfish framebuffer
ANDROID: video: goldfishfb: add devicetree bindings
BACKPORT: staging: goldfish: audio: fix compiliation on arm
BACKPORT: Input: goldfish_events - enable ACPI-based enumeration for goldfish events
BACKPORT: goldfish: Enable ACPI-based enumeration for goldfish battery
BACKPORT: drivers: tty: goldfish: Add device tree bindings
BACKPORT: tty: goldfish: support platform_device with id -1
BACKPORT: Input: goldfish_events - add devicetree bindings
BACKPORT: power: goldfish_battery: add devicetree bindings
BACKPORT: staging: goldfish: audio: add devicetree bindings
ANDROID: usb: gadget: function: cleanup: Add blank line after declaration
cpufreq: sched: Fix kernel crash on accessing sysfs file
usb: gadget: f_mtp: simplify ptp NULL pointer check
cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation
cgroup: rename Documentation/cgroups/ to Documentation/cgroup-legacy/
cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type
writeback: initialize inode members that track writeback history
mm: page_alloc: generalize the dirty balance reserve
block: fix module reference leak on put_disk() call for cgroups throttle
Linux 4.4.35
netfilter: nft_dynset: fix element timeout for HZ != 1000
IB/cm: Mark stale CM id's whenever the mad agent was unregistered
IB/uverbs: Fix leak of XRC target QPs
IB/core: Avoid unsigned int overflow in sg_alloc_table
IB/mlx5: Fix fatal error dispatching
IB/mlx5: Use cache line size to select CQE stride
IB/mlx4: Fix create CQ error flow
IB/mlx4: Check gid_index return value
PM / sleep: don't suspend parent when async child suspend_{noirq, late} fails
PM / sleep: fix device reference leak in test_suspend
uwb: fix device reference leaks
mfd: core: Fix device reference leak in mfd_clone_cell
iwlwifi: pcie: fix SPLC structure parsing
rtc: omap: Fix selecting external osc
clk: mmp: mmp2: fix return value check in mmp2_clk_init()
clk: mmp: pxa168: fix return value check in pxa168_clk_init()
clk: mmp: pxa910: fix return value check in pxa910_clk_init()
drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)
crypto: caam - do not register AES-XTS mode on LP units
ext4: sanity check the block and cluster size at mount time
kbuild: Steal gcc's pie from the very beginning
x86/kexec: add -fno-PIE
scripts/has-stack-protector: add -fno-PIE
kbuild: add -fno-PIE
i2c: mux: fix up dependencies
can: bcm: fix warning in bcm_connect/proc_register
mfd: intel-lpss: Do not put device in reset state on suspend
fuse: fix fuse_write_end() if zero bytes were copied
KVM: Disable irq while unregistering user notifier
KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
Linux 4.4.34
sparc64: Delete now unused user copy fixup functions.
sparc64: Delete now unused user copy assembler helpers.
sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.
sparc64: Convert NG2copy_{from,to}_user to accurate exception reporting.
sparc64: Convert NGcopy_{from,to}_user to accurate exception reporting.
sparc64: Convert NG4copy_{from,to}_user to accurate exception reporting.
sparc64: Convert U1copy_{from,to}_user to accurate exception reporting.
sparc64: Convert GENcopy_{from,to}_user to accurate exception reporting.
sparc64: Convert copy_in_user to accurate exception reporting.
sparc64: Prepare to move to more saner user copy exception handling.
sparc64: Delete __ret_efault.
sparc64: Handle extremely large kernel TLB range flushes more gracefully.
sparc64: Fix illegal relative branches in hypervisor patched TLB cross-call code.
sparc64: Fix instruction count in comment for __hypervisor_flush_tlb_pending.
sparc64: Fix illegal relative branches in hypervisor patched TLB code.
sparc64: Handle extremely large kernel TSB range flushes sanely.
sparc: Handle negative offsets in arch_jump_label_transform
sparc64 mm: Fix base TSB sizing when hugetlb pages are used
sparc: serial: sunhv: fix a double lock bug
sparc: Don't leak context bits into thread->fault_address
tty: Prevent ldisc drivers from re-using stale tty fields
tcp: take care of truncations done by sk_filter()
ipv4: use new_gw for redirect neigh lookup
net: __skb_flow_dissect() must cap its return value
sock: fix sendmmsg for partial sendmsg
fib_trie: Correct /proc/net/route off by one error
sctp: assign assoc_id earlier in __sctp_connect
ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped
ipv6: dccp: fix out of bound access in dccp_v6_err()
dccp: fix out of bound access in dccp_v4_err()
dccp: do not send reset to already closed sockets
tcp: fix potential memory corruption
ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()
bgmac: stop clearing DMA receive control register right after it is set
net: mangle zero checksum in skb_checksum_help()
net: clear sk_err_soft in sk_clone_lock()
dctcp: avoid bogus doubling of cwnd after loss
ARM: 8485/1: cpuidle: remove cpu parameter from the cpuidle_ops suspend hook
Linux 4.4.33
netfilter: fix namespace handling in nf_log_proc_dostring
btrfs: qgroup: Prevent qgroup->reserved from going subzero
mmc: mxs: Initialize the spinlock prior to using it
ASoC: sun4i-codec: return error code instead of NULL when create_card fails
ACPI / APEI: Fix incorrect return value of ghes_proc()
i40e: fix call of ndo_dflt_bridge_getlink()
hwrng: core - Don't use a stack buffer in add_early_randomness()
lib/genalloc.c: start search from start of chunk
mei: bus: fix received data size check in NFC fixup
iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path
iommu/amd: Free domain id when free a domain of struct dma_ops_domain
tty/serial: at91: fix hardware handshake on Atmel platforms
dmaengine: at_xdmac: fix spurious flag status for mem2mem transfers
drm/i915: Respect alternate_ddc_pin for all DDI ports
KVM: MIPS: Precalculate MMIO load resume PC
scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk
scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init
iio: orientation: hid-sensor-rotation: Add PM function (fix non working driver)
iio: hid-sensors: Increase the precision of scale to fix wrong reading interpretation.
clk: qoriq: Don't allow CPU clocks higher than starting value
toshiba-wmi: Fix loading the driver on non Toshiba laptops
drbd: Fix kernel_sendmsg() usage - potential NULL deref
usb: gadget: u_ether: remove interrupt throttling
USB: cdc-acm: fix TIOCMIWAIT
staging: nvec: remove managed resource from PS2 driver
Revert "staging: nvec: ps2: change serio type to passthrough"
drivers: staging: nvec: remove bogus reset command for PS/2 interface
staging: iio: ad5933: avoid uninitialized variable in error case
pinctrl: cherryview: Prevent possible interrupt storm on resume
pinctrl: cherryview: Serialize register access in suspend/resume
ARC: timer: rtc: implement read loop in "C" vs. inline asm
s390/hypfs: Use get_free_page() instead of kmalloc to ensure page alignment
coredump: fix unfreezable coredumping task
swapfile: fix memory corruption via malformed swapfile
dib0700: fix nec repeat handling
ASoC: cs4270: fix DAPM stream name mismatch
ALSA: info: Limit the proc text input size
ALSA: info: Return error for invalid read/write
arm64: Enable KPROBES/HIBERNATION/CORESIGHT in defconfig
arm64: kvm: allows kvm cpu hotplug
arm64: KVM: Register CPU notifiers when the kernel runs at HYP
arm64: KVM: Skip HYP setup when already running in HYP
arm64: hyp/kvm: Make hyp-stub reject kvm_call_hyp()
arm64: hyp/kvm: Make hyp-stub extensible
arm64: kvm: Move lr save/restore from do_el2_call into EL1
arm64: kvm: deal with kernel symbols outside of linear mapping
arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
ANDROID: video: adf: Avoid directly referencing user pointers
ANDROID: usb: gadget: audio_source: fix comparison of distinct pointer types
android: binder: support for file-descriptor arrays.
android: binder: support for scatter-gather.
android: binder: add extra size to allocator.
android: binder: refactor binder_transact()
android: binder: support multiple /dev instances.
android: binder: deal with contexts in debugfs.
android: binder: support multiple context managers.
android: binder: split flat_binder_object.
disable aio support in recommended configuration
Linux 4.4.32
scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
drm/radeon: fix DP mode validation
drm/radeon/dp: add back special handling for NUTMEG
drm/amdgpu: fix DP mode validation
drm/amdgpu/dp: add back special handling for NUTMEG
KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
Revert KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
of: silence warnings due to max() usage
packet: on direct_xmit, limit tso and csum to supported devices
sctp: validate chunk len before actually using it
net sched filters: fix notification of filter delete with proper handle
udp: fix IP_CHECKSUM handling
net: sctp, forbid negative length
ipv4: use the right lock for ping_group_range
ipv4: disable BH in set_ping_group_range()
net: add recursion limit to GRO
rtnetlink: Add rtnexthop offload flag to compare mask
bridge: multicast: restore perm router ports on multicast enable
net: pktgen: remove rcu locking in pktgen_change_name()
ipv6: correctly add local routes when lo goes up
ip6_tunnel: fix ip6_tnl_lookup
ipv6: tcp: restore IP6CB for pktoptions skbs
netlink: do not enter direct reclaim from netlink_dump()
packet: call fanout_release, while UNREGISTERING a netdev
net: Add netdev all_adj_list refcnt propagation to fix panic
net/sched: act_vlan: Push skb->data to mac_header prior calling skb_vlan_*() functions
net: pktgen: fix pkt_size
net: fec: set mac address unconditionally
tg3: Avoid NULL pointer dereference in tg3_io_error_detected()
ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()
tcp: fix a compile error in DBGUNDO()
tcp: fix wrong checksum calculation on MTU probing
net: avoid sk_forward_alloc overflows
tcp: fix overflow in __tcp_retransmit_skb()
arm64/kvm: fix build issue on kvm debug
arm64: ptdump: Indicate whether memory should be faulting
arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
arm64: Drop alloc function from create_mapping
arm64: allow vmalloc regions to be set with set_memory_*
arm64: kernel: implement ACPI parking protocol
arm64: mm: create new fine-grained mappings at boot
arm64: ensure _stext and _etext are page-aligned
arm64: mm: allow passing a pgdir to alloc_init_*
arm64: mm: allocate pagetables anywhere
arm64: mm: use fixmap when creating page tables
arm64: mm: add functions to walk tables in fixmap
arm64: mm: add __{pud,pgd}_populate
arm64: mm: avoid redundant __pa(__va(x))
Linux 4.4.31
HID: usbhid: add ATEN CS962 to list of quirky devices
ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()
kvm: x86: Check memopp before dereference (CVE-2016-8630)
tty: vt, fix bogus division in csi_J
usb: dwc3: Fix size used in dma_free_coherent()
pwm: Unexport children before chip removal
UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header
Disable "frame-address" warning
smc91x: avoid self-comparison warning
cgroup: avoid false positive gcc-6 warning
drm/exynos: fix error handling in exynos_drm_subdrv_open
mm/cma: silence warnings due to max() usage
ARM: 8584/1: floppy: avoid gcc-6 warning
powerpc/ptrace: Fix out of bounds array access warning
x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
perf build: Fix traceevent plugins build race
drm/dp/mst: Check peer device type before attempting EDID read
drm/radeon: drop register readback in cayman_cp_int_cntl_setup
drm/radeon/si_dpm: workaround for SI kickers
drm/radeon/si_dpm: Limit clocks on HD86xx part
Revert "drm/radeon: fix DP link training issue with second 4K monitor"
mmc: dw_mmc-pltfm: fix the potential NULL pointer dereference
scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware
scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded
scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices
mac80211: discard multicast and 4-addr A-MSDUs
firewire: net: fix fragmented datagram_size off-by-one
firewire: net: guard against rx buffer overflows
Input: i8042 - add XMG C504 to keyboard reset table
dm mirror: fix read error on recovery after default leg failure
virtio: console: Unlock vqs while freeing buffers
virtio_ring: Make interrupt suppression spec compliant
parisc: Ensure consistent state when switching to kernel stack at syscall entry
ovl: fsync after copy-up
KVM: MIPS: Make ERET handle ERL before EXL
KVM: x86: fix wbinvd_dirty_mask use-after-free
dm: free io_barrier after blk_cleanup_queue call
USB: serial: cp210x: fix tiocmget error handling
tty: limit terminal size to 4M chars
xhci: add restart quirk for Intel Wildcatpoint PCH
hv: do not lose pending heartbeat vmbus packets
vt: clear selection before resizing
Fix potential infoleak in older kernels
GenWQE: Fix bad page access during abort of resource allocation
usb: increase ohci watchdog delay to 275 msec
xhci: use default USB_RESUME_TIMEOUT when resuming ports.
USB: serial: ftdi_sio: add support for Infineon TriBoard TC2X7
USB: serial: fix potential NULL-dereference at probe
usb: gadget: function: u_ether: don't starve tx request queue
mei: txe: don't clean an unprocessed interrupt cause.
ubifs: Fix regression in ubifs_readdir()
ubifs: Abort readdir upon error
btrfs: fix races on root_log_ctx lists
ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct
ANDROID: binder: Add strong ref checks
ALSA: hda - Fix headset mic detection problem for two Dell laptops
ALSA: hda - Adding a new group of pin cfg into ALC295 pin quirk table
ALSA: hda - allow 40 bit DMA mask for NVidia devices
ALSA: hda - Raise AZX_DCAPS_RIRB_DELAY handling into top drivers
ALSA: hda - Merge RIRB_PRE_DELAY into CTX_WORKAROUND caps
ALSA: usb-audio: Add quirk for Syntek STK1160
KEYS: Fix short sprintf buffer in /proc/keys show function
mm: memcontrol: do not recurse in direct reclaim
mm/list_lru.c: avoid error-path NULL pointer deref
libxfs: clean up _calc_dquots_per_chunk
h8300: fix syscall restarting
drm/dp/mst: Clear port->pdt when tearing down the i2c adapter
i2c: core: fix NULL pointer dereference under race condition
i2c: xgene: Avoid dma_buffer overrun
arm64:cpufeature ARM64_NCAPS is the indicator of last feature
arm64: hibernate: Refuse to hibernate if the boot cpu is offline
PM / sleep: Add support for read-only sysfs attributes
arm64: kernel: Add support for hibernate/suspend-to-disk
arm64: mm: add functions to walk page tables by PA
arm64: mm: move pte_* macros
PM / Hibernate: Call flush_icache_range() on pages restored in-place
arm64: Add new asm macro copy_page
arm64: Promote KERNEL_START/KERNEL_END definitions to a header file
arm64: kernel: Include _AC definition in page.h
arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va
arm64: kernel: Rework finisher callback out of __cpu_suspend_enter()
arm64: Cleanup SCTLR flags
arm64: Fold proc-macros.S into assembler.h
arm/arm64: KVM: Add hook for C-based stage2 init
arm/arm64: KVM: Detect vGIC presence at runtime
arm64: KVM: Add support for 16-bit VMID
arm: KVM: Make kvm_arm.h friendly to assembly code
arm/arm64: KVM: Remove unreferenced S2_PGD_ORDER
arm64: KVM: debug: Remove spurious inline attributes
ARM: KVM: Cleanup exception injection
arm64: KVM: Remove weak attributes
arm64: KVM: Cleanup asm-offset.c
arm64: KVM: Turn system register numbers to an enum
arm64: KVM: VHE: Patch out use of HVC
arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature
arm/arm64: Add new is_kernel_in_hyp_mode predicate
arm64: KVM: Move away from the assembly version of the world switch
arm64: KVM: Map the kernel RO section into HYP
arm64: KVM: Add compatibility aliases
arm64: KVM: Implement vgic-v3 save/restore
arm64: KVM: Add panic handling
arm64: KVM: HYP mode entry points
arm64: KVM: Implement TLB handling
arm64: KVM: Implement fpsimd save/restore
arm64: KVM: Implement the core world switch
arm64: KVM: Add patchable function selector
arm64: KVM: Implement guest entry
arm64: KVM: Implement debug save/restore
arm64: KVM: Implement 32bit system register save/restore
arm64: KVM: Implement system register save/restore
arm64: KVM: Implement timer save/restore
arm64: KVM: Implement vgic-v2 save/restore
arm64: KVM: Add a HYP-specific header file
KVM: arm/arm64: vgic-v3: Make the LR indexing macro public
arm64: Add macros to read/write system registers
Linux 4.4.30
Revert "fix minor infoleak in get_user_ex()"
Revert "x86/mm: Expand the exception table logic to allow new handling options"
Linux 4.4.29
ARM: pxa: pxa_cplds: fix interrupt handling
powerpc/nvram: Fix an incorrect partition merge
mpt3sas: Don't spam logs if logging level is 0
perf symbols: Fixup symbol sizes before picking best ones
perf symbols: Check symbol_conf.allow_aliases for kallsyms loading too
perf hists browser: Fix event group display
clk: divider: Fix clk_divider_round_rate() to use clk_readl()
clk: qoriq: fix a register offset error
s390/con3270: fix insufficient space padding
s390/con3270: fix use of uninitialised data
s390/cio: fix accidental interrupt enabling during resume
x86/mm: Expand the exception table logic to allow new handling options
dmaengine: ipu: remove bogus NO_IRQ reference
power: bq24257: Fix use of uninitialized pointer bq->charger
staging: r8188eu: Fix scheduling while atomic splat
ASoC: dapm: Fix kcontrol creation for output driver widget
ASoC: dapm: Fix value setting for _ENUM_DOUBLE MUX's second channel
ASoC: dapm: Fix possible uninitialized variable in snd_soc_dapm_get_volsw()
ASoC: topology: Fix error return code in soc_tplg_dapm_widget_create()
hwrng: omap - Only fail if pm_runtime_get_sync returns < 0
crypto: arm/ghash-ce - add missing async import/export
crypto: gcm - Fix IV buffer size in crypto_gcm_setkey
mwifiex: correct aid value during tdls setup
spi: spi-fsl-dspi: Drop extra spi_master_put in device remove function
ARM: clk-imx35: fix name for ckil clk
uio: fix dmem_region_start computation
genirq/generic_chip: Add irq_unmap callback
perf stat: Fix interval output values
powerpc/eeh: Null check uses of eeh_pe_bus_get
tunnels: Remove encapsulation offloads on decap.
tunnels: Don't apply GRO to multiple layers of encapsulation.
ipip: Properly mark ipip GRO packets as encapsulated.
posix_acl: Clear SGID bit when setting file permissions
brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap()
mm/hugetlb: fix memory offline with hugepage size > memory block size
drm/i915: Unalias obj->phys_handle and obj->userptr
drm/i915: Account for TSEG size when determining 865G stolen base
Revert "drm/i915: Check live status before reading edid"
drm/i915/gen9: fix the WaWmMemoryReadLatency implementation
xenbus: don't look up transaction IDs for ordinary writes
drm/vmwgfx: Limit the user-space command buffer size
drm/radeon: change vblank_time's calculation method to reduce computational error.
drm/radeon/si/dpm: fix phase shedding setup
drm/radeon: narrow asic_init for virtualization
drm/amdgpu: change vblank_time's calculation method to reduce computational error.
drm/amdgpu/dce11: add missing drm_mode_config_cleanup call
drm/amdgpu/dce11: disable hpd on local panels
drm/amdgpu/dce8: disable hpd on local panels
drm/amdgpu/dce10: disable hpd on local panels
drm/amdgpu: fix IB alignment for UVD
drm/prime: Pass the right module owner through to dma_buf_export()
Linux 4.4.28
target: Don't override EXTENDED_COPY xcopy_pt_cmd SCSI status code
target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE
target: Re-add missing SCF_ACK_KREF assignment in v4.1.y
ubifs: Fix xattr_names length in exit paths
jbd2: fix incorrect unlock on j_list_lock
ext4: do not advertise encryption support when disabled
mmc: rtsx_usb_sdmmc: Handle runtime PM while changing the led
mmc: rtsx_usb_sdmmc: Avoid keeping the device runtime resumed when unused
mmc: core: Annotate cmd_hdr as __le32
powerpc/mm: Prevent unlikely crash in copro_calculate_slb()
ceph: fix error handling in ceph_read_iter
arm64: kernel: Init MDCR_EL2 even in the absence of a PMU
arm64: percpu: rewrite ll/sc loops in assembly
memstick: rtsx_usb_ms: Manage runtime PM when accessing the device
memstick: rtsx_usb_ms: Runtime resume the device when polling for cards
isofs: Do not return EACCES for unknown filesystems
irqchip/gic-v3-its: Fix entry size mask for GITS_BASER
s390/mm: fix gmap tlb flush issues
Using BUG_ON() as an assert() is _never_ acceptable
mm: filemap: fix mapping->nrpages double accounting in fuse
mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
acpi, nfit: check for the correct event code in notifications
net/mlx4_core: Allow resetting VF admin mac to zero
bnx2x: Prevent false warning for lack of FC NPIV
PKCS#7: Don't require SpcSpOpusInfo in Authenticode pkcs7 signatures
hpsa: correct skipping masked peripherals
sd: Fix rw_max for devices that report an optimal xfer size
irqchip/gicv3: Handle loop timeout proper
kvm: x86: memset whole irq_eoi
x86/e820: Don't merge consecutive E820_PRAM ranges
blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
Fix regression which breaks DFS mounting
Cleanup missing frees on some ioctls
Do not send SMB3 SET_INFO request if nothing is changing
SMB3: GUIDs should be constructed as random but valid uuids
Set previous session id correctly on SMB3 reconnect
Display number of credits available
Clarify locking of cifs file and tcon structures and make more granular
fs/cifs: keep guid when assigning fid to fileinfo
cifs: Limit the overall credit acquired
fs/super.c: fix race between freeze_super() and thaw_super()
arc: don't leak bits of kernel stack into coredump
lightnvm: ensure that nvm_dev_ops can be used without CONFIG_NVM
ipc/sem.c: fix complex_count vs. simple op race
mm: filemap: don't plant shadow entries without radix tree node
metag: Only define atomic_dec_if_positive conditionally
scsi: Fix use-after-free
NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
NFSv4: Open state recovery must account for file permission changes
NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
sunrpc: fix write space race causing stalls
Input: elantech - add Fujitsu Lifebook E556 to force crc_enabled
Input: elantech - force needed quirks on Fujitsu H760
Input: i8042 - skip selftest on ASUS laptops
lib: add "on"/"off" support to kstrtobool
lib: update single-char callers of strtobool()
lib: move strtobool() to kstrtobool()
MIPS: ptrace: Fix regs_return_value for kernel context
MIPS: Fix -mabi=64 build of vdso.lds
ALSA: hda - Fix a failure of micmute led when having multi adcs
cx231xx: fix GPIOs for Pixelview SBTVD hybrid
cx231xx: don't return error on success
mb86a20s: fix demod settings
mb86a20s: fix the locking logic
ovl: copy_up_xattr(): use strnlen
ovl: Fix info leak in ovl_lookup_temp()
fbdev/efifb: Fix 16 color palette entry calculation
scsi: zfcp: spin_lock_irqsave() is not nestable
zfcp: trace full payload of all SAN records (req,resp,iels)
zfcp: fix payload trace length for SAN request&response
zfcp: fix D_ID field with actual value on tracing SAN responses
zfcp: restore tracing of handle for port and LUN with HBA records
zfcp: trace on request for open and close of WKA port
zfcp: restore: Dont use 0 to indicate invalid LUN in rec trace
zfcp: retain trace level for SCSI and HBA FSF response records
zfcp: close window with unblocked rport during rport gone
zfcp: fix ELS/GS request&response length for hardware data router
zfcp: fix fc_host port_type with NPIV
ubi: Deal with interrupted erasures in WL
powerpc/pseries: Fix stack corruption in htpe code
powerpc/64: Fix incorrect return value from __copy_tofrom_user
powerpc/powernv: Use CPU-endian PEST in pnv_pci_dump_p7ioc_diag_data()
powerpc/powernv: Use CPU-endian hub diag-data type in pnv_eeh_get_and_dump_hub_diag()
powerpc/powernv: Pass CPU-endian PE number to opal_pci_eeh_freeze_clear()
powerpc/vdso64: Use double word compare on pointers
dm crypt: fix crash on exit
dm mpath: check if path's request_queue is dying in activate_path()
dm: return correct error code in dm_resume()'s retry loop
dm: mark request_queue dead before destroying the DM device
perf intel-pt: Fix MTC timestamp calculation for large MTC periods
perf intel-pt: Fix estimated timestamps for cycle-accurate mode
perf intel-pt: Fix snapshot overlap detection decoder errors
pstore/ram: Use memcpy_fromio() to save old buffer
pstore/ram: Use memcpy_toio instead of memcpy
pstore/core: drop cmpxchg based updates
pstore/ramoops: fixup driver removal
parisc: Increase initial kernel mapping size
parisc: Fix kernel memory layout regarding position of __gp
parisc: Increase KERNEL_INITIAL_SIZE for 32-bit SMP kernels
cpufreq: intel_pstate: Fix unsafe HWP MSR access
platform: don't return 0 from platform_get_irq[_byname]() on error
PCI: Mark Atheros AR9580 to avoid bus reset
mmc: sdhci: cast unsigned int to unsigned long long to avoid unexpeted error
mmc: block: don't use CMD23 with very old MMC cards
rtlwifi: Fix missing country code for Great Britain
PM / devfreq: event: remove duplicate devfreq_event_get_drvdata()
clk: imx6: initialize GPU clocks
regulator: tps65910: Work around silicon erratum SWCZ010
mei: me: add kaby point device ids
gpio: mpc8xxx: Correct irq handler function
cgroup: Change from CAP_SYS_NICE to CAP_SYS_RESOURCE for cgroup migration permissions
UPSTREAM: cpu/hotplug: Handle unbalanced hotplug enable/disable
UPSTREAM: arm64: kaslr: fix breakage with CONFIG_MODVERSIONS=y
UPSTREAM: arm64: kaslr: keep modules close to the kernel when DYNAMIC_FTRACE=y
cgroup: Remove leftover instances of allow_attach
BACKPORT: lib: harden strncpy_from_user
CHROMIUM: cgroups: relax permissions on moving tasks between cgroups
CHROMIUM: remove Android's cgroup generic permissions checks
Linux 4.4.27
cfq: fix starvation of asynchronous writes
vfs: move permission checking into notify_change() for utimes(NULL)
dlm: free workqueues after the connections
crypto: vmx - Fix memory corruption caused by p8_ghash
crypto: ghash-generic - move common definitions to a new header file
ext4: release bh in make_indexed_dir
ext4: allow DAX writeback for hole punch
ext4: fix memory leak in ext4_insert_range()
ext4: reinforce check of i_dtime when clearing high fields of uid and gid
ext4: enforce online defrag restriction for encrypted files
scsi: ibmvfc: Fix I/O hang when port is not mapped
scsi: arcmsr: Simplify user_len checking
scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer()
async_pq_val: fix DMA memory leak
reiserfs: switch to generic_{get,set,remove}xattr()
reiserfs: Unlock superblock before calling reiserfs_quota_on_mount()
ASoC: Intel: Atom: add a missing star in a memcpy call
brcmfmac: fix memory leak in brcmf_fill_bss_param
i40e: avoid NULL pointer dereference and recursive errors on early PCI error
fuse: fix killing s[ug]id in setattr
fuse: invalidate dir dentry after chmod
fuse: listxattr: verify xattr list
drivers: base: dma-mapping: page align the size when unmap_kernel_range
btrfs: assign error values to the correct bio structs
serial: 8250_dw: Check the data->pclk when get apb_pclk
arm64: Use PoU cache instr for I/D coherency
arm64: mm: add code to safely replace TTBR1_EL1
arm64: mm: place __cpu_setup in .text
arm64: add function to install the idmap
arm64: unmap idmap earlier
arm64: unify idmap removal
arm64: mm: place empty_zero_page in bss
arm64: head.S: use memset to clear BSS
arm64: mm: specialise pagetable allocators
arm64: mm: remove pointless PAGE_MASKing
asm-generic: Fix local variable shadow in __set_fixmap_offset
arm64: mm: fold alternatives into .init
ARM: 8511/1: ARM64: kernel: PSCI: move PSCI idle management code to drivers/firmware
ARM: 8481/2: drivers: psci: replace psci firmware calls
ARM: 8480/2: arm64: add implementation for arm-smccc
ARM: 8479/2: add implementation for arm-smccc
ARM: 8478/2: arm/arm64: add arm-smccc
ARM: 8510/1: rework ARM_CPU_SUSPEND dependencies
ARM: 8458/1: bL_switcher: add GIC dependency
Linux 4.4.26
mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
x86/build: Build compressed x86 kernels as PIE
arm64: Remove stack duplicating code from jprobes
arm64: kprobes: Add KASAN instrumentation around stack accesses
arm64: kprobes: Cleanup jprobe_return
arm64: kprobes: Fix overflow when saving stack
arm64: kprobes: WARN if attempting to step with PSTATE.D=1
kprobes: Add arm64 case in kprobe example module
arm64: Add kernel return probes support (kretprobes)
arm64: Add trampoline code for kretprobes
arm64: kprobes instruction simulation support
arm64: Treat all entry code as non-kprobe-able
arm64: Blacklist non-kprobe-able symbol
arm64: Kprobes with single stepping support
arm64: add conditional instruction simulation support
arm64: Add more test functions to insn.c
arm64: Add HAVE_REGS_AND_STACK_ACCESS_API feature
Linux 4.4.25
tpm_crb: fix crb_req_canceled behavior
tpm: fix a race condition in tpm2_unseal_trusted()
ima: use file_dentry()
ARM: cpuidle: Fix error return code
ARM: dts: MSM8064 remove flags from SPMI/MPP IRQs
ARM: dts: mvebu: armada-390: add missing compatibility string and bracket
x86/dumpstack: Fix x86_32 kernel_stack_pointer() previous stack access
x86/irq: Prevent force migration of irqs which are not in the vector domain
x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation
KVM: PPC: BookE: Fix a sanity check
KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register
mfd: wm8350-i2c: Make sure the i2c regmap functions are compiled
mfd: 88pm80x: Double shifting bug in suspend/resume
mfd: atmel-hlcdc: Do not sleep in atomic context
mfd: rtsx_usb: Avoid setting ucr->current_sg.status
ALSA: usb-line6: use the same declaration as definition in header for MIDI manufacturer ID
ALSA: usb-audio: Extend DragonFly dB scale quirk to cover other variants
ALSA: ali5451: Fix out-of-bound position reporting
timekeeping: Fix __ktime_get_fast_ns() regression
time: Add cycles to nanoseconds translation
mm: Fix build for hardened usercopy
ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct
ANDROID: binder: Add strong ref checks
UPSTREAM: staging/android/ion : fix a race condition in the ion driver
ANDROID: android-base: CONFIG_HARDENED_USERCOPY=y
UPSTREAM: fs/proc/kcore.c: Add bounce buffer for ktext data
UPSTREAM: fs/proc/kcore.c: Make bounce buffer global for read
BACKPORT: arm64: Correctly bounds check virt_addr_valid
Fix a build breakage in IO latency hist code.
UPSTREAM: efi: include asm/early_ioremap.h not asm/efi.h to get early_memremap
UPSTREAM: ia64: split off early_ioremap() declarations into asm/early_ioremap.h
FROMLIST: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN
FROMLIST: arm64: xen: Enable user access before a privcmd hvc call
FROMLIST: arm64: Handle faults caused by inadvertent user access with PAN enabled
FROMLIST: arm64: Disable TTBR0_EL1 during normal kernel execution
FROMLIST: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1
FROMLIST: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro
FROMLIST: arm64: Factor out PAN enabling/disabling into separate uaccess_* macros
UPSTREAM: arm64: Handle el1 synchronous instruction aborts cleanly
UPSTREAM: arm64: include alternative handling in dcache_by_line_op
UPSTREAM: arm64: fix "dc cvau" cache operation on errata-affected core
UPSTREAM: Revert "arm64: alternatives: add enable parameter to conditional asm macros"
UPSTREAM: arm64: Add new asm macro copy_page
UPSTREAM: arm64: kill ESR_LNX_EXEC
UPSTREAM: arm64: add macro to extract ESR_ELx.EC
UPSTREAM: arm64: mm: mark fault_info table const
UPSTREAM: arm64: fix dump_instr when PAN and UAO are in use
BACKPORT: arm64: Fold proc-macros.S into assembler.h
UPSTREAM: arm64: choose memstart_addr based on minimum sparsemem section alignment
UPSTREAM: arm64/mm: ensure memstart_addr remains sufficiently aligned
UPSTREAM: arm64/kernel: fix incorrect EL0 check in inv_entry macro
UPSTREAM: arm64: Add macros to read/write system registers
UPSTREAM: arm64/efi: refactor EFI init and runtime code for reuse by 32-bit ARM
UPSTREAM: arm64/efi: split off EFI init and runtime code for reuse by 32-bit ARM
UPSTREAM: arm64/efi: mark UEFI reserved regions as MEMBLOCK_NOMAP
BACKPORT: arm64: only consider memblocks with NOMAP cleared for linear mapping
UPSTREAM: mm/memblock: add MEMBLOCK_NOMAP attribute to memblock memory table
ANDROID: dm: android-verity: Remove fec_header location constraint
BACKPORT: audit: consistently record PIDs with task_tgid_nr()
android-base.cfg: Enable kernel ASLR
UPSTREAM: vmlinux.lds.h: allow arch specific handling of ro_after_init data section
UPSTREAM: arm64: spinlock: fix spin_unlock_wait for LSE atomics
UPSTREAM: arm64: avoid TLB conflict with CONFIG_RANDOMIZE_BASE
UPSTREAM: arm64: Only select ARM64_MODULE_PLTS if MODULES=y
sched: Add Kconfig option DEFAULT_USE_ENERGY_AWARE to set ENERGY_AWARE feature flag
sched/fair: remove printk while schedule is in progress
ANDROID: fs: FS tracepoints to track IO.
sched/walt: Drop arch-specific timer access
ANDROID: fiq_debugger: Pass task parameter to unwind_frame()
eas/sched/fair: Fixing comments in find_best_target.
input: keyreset: switch to orderly_reboot
UPSTREAM: tun: fix transmit timestamp support
UPSTREAM: arch/arm/include/asm/pgtable-3level.h: add pmd_mkclean for THP
net: inet: diag: expose the socket mark to privileged processes.
net: diag: make udp_diag_destroy work for mapped addresses.
net: diag: support SOCK_DESTROY for UDP sockets
net: diag: allow socket bytecode filters to match socket marks
net: diag: slightly refactor the inet_diag_bc_audit error checks.
net: diag: Add support to filter on device index
UPSTREAM: brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap()
Linux 4.4.24
ALSA: hda - Add the top speaker pin config for HP Spectre x360
ALSA: hda - Fix headset mic detection problem for several Dell laptops
ACPICA: acpi_get_sleep_type_data: Reduce warnings
ALSA: hda - Adding one more ALC255 pin definition for headset problem
Revert "usbtmc: convert to devm_kzalloc"
USB: serial: cp210x: Add ID for a Juniper console
Staging: fbtft: Fix bug in fbtft-core
usb: misc: legousbtower: Fix NULL pointer deference
USB: serial: cp210x: fix hardware flow-control disable
dm log writes: fix bug with too large bios
clk: xgene: Add missing parenthesis when clearing divider value
aio: mark AIO pseudo-fs noexec
batman-adv: remove unused callback from batadv_algo_ops struct
IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV
IB/mlx4: Fix code indentation in QP1 MAD flow
IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV
IB/ipoib: Don't allow MC joins during light MC flush
IB/core: Fix use after free in send_leave function
IB/ipoib: Fix memory corruption in ipoib cm mode connect flow
KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
dmaengine: at_xdmac: fix to pass correct device identity to free_irq()
kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
ASoC: omap-mcpdm: Fix irq resource handling
sysctl: handle error writing UINT_MAX to u32 fields
powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support
brcmsmac: Initialize power in brcms_c_stf_ss_algo_channel_get()
brcmsmac: Free packet if dma_mapping_error() fails in dma_rxfill
brcmfmac: Fix glob_skb leak in brcmf_sdiod_recv_chain
ASoC: Intel: Skylake: Fix error return code in skl_probe()
pNFS/flexfiles: Fix layoutcommit after a commit to DS
pNFS/files: Fix layoutcommit after a commit to DS
NFS: Don't drop CB requests with invalid principals
svc: Avoid garbage replies when pc_func() returns rpc_drop_reply
dmaengine: at_xdmac: fix debug string
fnic: pci_dma_mapping_error() doesn't return an error code
avr32: off by one in at32_init_pio()
ath9k: Fix programming of minCCA power threshold
gspca: avoid unused variable warnings
em28xx-i2c: rt_mutex_trylock() returns zero on failure
NFC: fdp: Detect errors from fdp_nci_create_conn()
iwlmvm: mvm: set correct state in smart-fifo configuration
tile: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
pstore: drop file opened reference count
blk-mq: actually hook up defer list when running requests
hwrng: omap - Fix assumption that runtime_get_sync will always succeed
ARM: sa1111: fix pcmcia suspend/resume
ARM: shmobile: fix regulator quirk for Gen2
ARM: sa1100: clear reset status prior to reboot
ARM: sa1100: fix 3.6864MHz clock
ARM: sa1100: register clocks early
ARM: sun5i: Fix typo in trip point temperature
regulator: qcom_smd: Fix voltage ranges for pm8x41
regulator: qcom_spmi: Update mvs1/mvs2 switches on pm8941
regulator: qcom_spmi: Add support for get_mode/set_mode on switches
regulator: qcom_spmi: Add support for S4 supply on pm8941
tpm: fix byte-order for the value read by tpm2_get_tpm_pt
printk: fix parsing of "brl=" option
MIPS: uprobes: fix use of uninitialised variable
MIPS: Malta: Fix IOCU disable switch read for MIPS64
MIPS: fix uretprobe implementation
MIPS: uprobes: remove incorrect set_orig_insn
arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP
ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7
irqchip/gicv3: Silence noisy DEBUG_PER_CPU_MAPS warning
gpio: sa1100: fix irq probing for ucb1x00
usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame()
ceph: fix race during filling readdir cache
iwlwifi: mvm: don't use ret when not initialised
iwlwifi: pcie: fix access to scratch buffer
spi: sh-msiof: Avoid invalid clock generator parameters
hwmon: (adt7411) set bit 3 in CFG1 register
nvmem: Declare nvmem_cell_read() consistently
ipvs: fix bind to link-local mcast IPv6 address in backup
tools/vm/slabinfo: fix an unintentional printf
mmc: pxamci: fix potential oops
drivers/perf: arm_pmu: Fix leak in error path
pinctrl: Flag strict is a field in struct pinmux_ops
pinctrl: uniphier: fix .pin_dbg_show() callback
i40e: avoid null pointer dereference
perf/core: Fix pmu::filter_match for SW-led groups
iwlwifi: mvm: fix a few firmware capability checks
usb: musb: fix DMA for host mode
usb: musb: Fix DMA desired mode for Mentor DMA engine
ARM: 8617/1: dma: fix dma_max_pfn()
ARM: 8616/1: dt: Respect property size when parsing CPUs
drm/radeon/si/dpm: add workaround for for Jet parts
drm/nouveau/fifo/nv04: avoid ramht race against cookie insertion
x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID
x86/init: Fix cr4_init_shadow() on CR4-less machines
can: dev: fix deadlock reported after bus-off
mm,ksm: fix endless looping in allocating memory when ksm enable
mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl
cpuset: handle race between CPU hotplug and cpuset_hotplug_work
usercopy: fold builtin_const check into inline function
Linux 4.4.23
hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()
qxl: check for kmap failures
power: supply: max17042_battery: fix model download bug.
power_supply: tps65217-charger: fix missing platform_set_drvdata()
PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
PM / hibernate: Restore processor state before using per-CPU variables
MIPS: paravirt: Fix undefined reference to smp_bootstrap
MIPS: Add a missing ".set pop" in an early commit
MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...)
MIPS: Remove compact branch policy Kconfig entries
MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs
MIPS: SMP: Fix possibility of deadlock when bringing CPUs online
MIPS: Fix pre-r6 emulation FPU initialisation
i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
i2c-eg20t: fix race between i2c init and interrupt enable
btrfs: ensure that file descriptor used with subvol ioctls is a dir
nl80211: validate number of probe response CSA counters
can: flexcan: fix resume function
mm: delete unnecessary and unsafe init_tlb_ubc()
tracing: Move mutex to protect against resetting of seq data
fix memory leaks in tracing_buffers_splice_read()
power: reset: hisi-reboot: Unmap region obtained by of_iomap
mtd: pmcmsp-flash: Allocating too much in init_msp_flash()
mtd: maps: sa1100-flash: potential NULL dereference
fix fault_in_multipages_...() on architectures with no-op access_ok()
fanotify: fix list corruption in fanotify_get_response()
fsnotify: add a way to stop queueing events on group shutdown
xfs: prevent dropping ioend completions during buftarg wait
autofs: use dentry flags to block walks during expire
autofs races
pwm: Mark all devices as "might sleep"
bridge: re-introduce 'fix parsing of MLDv2 reports'
net: smc91x: fix SMC accesses
Revert "phy: IRQ cannot be shared"
net: dsa: bcm_sf2: Fix race condition while unmasking interrupts
net/mlx5: Added missing check of msg length in verifying its signature
tipc: fix NULL pointer dereference in shutdown()
net/irda: handle iriap_register_lsap() allocation failure
vti: flush x-netns xfrm cache when vti interface is removed
af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
Revert "af_unix: Fix splice-bind deadlock"
bonding: Fix bonding crash
megaraid: fix null pointer check in megasas_detach_one().
nouveau: fix nv40_perfctr_next() cleanup regression
Staging: iio: adc: fix indent on break statement
iwlegacy: avoid warning about missing braces
ath9k: fix misleading indentation
am437x-vfpe: fix typo in vpfe_get_app_input_index
Add braces to avoid "ambiguous ‘else’" compiler warnings
net: caif: fix misleading indentation
Makefile: Mute warning for __builtin_return_address(>0) for tracing only
Disable "frame-address" warning
Disable "maybe-uninitialized" warning globally
gcov: disable -Wmaybe-uninitialized warning
Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
kbuild: forbid kernel directory to contain spaces and colons
tools: Support relative directory path for 'O='
Makefile: revert "Makefile: Document ability to make file.lst and file.S" partially
kbuild: Do not run modules_install and install in paralel
ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
ocfs2/dlm: fix race between convert and migration
crypto: echainiv - Replace chaining with multiplication
crypto: skcipher - Fix blkcipher walk OOM crash
crypto: arm/aes-ctr - fix NULL dereference in tail processing
crypto: arm64/aes-ctr - fix NULL dereference in tail processing
tcp: properly scale window in tcp_v[46]_reqsk_send_ack()
tcp: fix use after free in tcp_xmit_retransmit_queue()
tcp: cwnd does not increase in TCP YeAH
ipv6: release dst in ping_v6_sendmsg
ipv4: panic in leaf_walk_rcu due to stale node pointer
reiserfs: fix "new_insert_key may be used uninitialized ..."
Fix build warning in kernel/cpuset.c
include/linux/kernel.h: change abs() macro so it uses consistent return type
Linux 4.4.22
openrisc: fix the fix of copy_from_user()
avr32: fix 'undefined reference to `___copy_from_user'
ia64: copy_from_user() should zero the destination on access_ok() failure
genirq/msi: Fix broken debug output
ppc32: fix copy_from_user()
sparc32: fix copy_from_user()
mn10300: copy_from_user() should zero on access_ok() failure...
nios2: copy_from_user() should zero the tail of destination
openrisc: fix copy_from_user()
parisc: fix copy_from_user()
metag: copy_from_user() should zero the destination on access_ok() failure
alpha: fix copy_from_user()
asm-generic: make copy_from_user() zero the destination properly
mips: copy_from_user() must zero the destination on access_ok() failure
hexagon: fix strncpy_from_user() error return
sh: fix copy_from_user()
score: fix copy_from_user() and friends
blackfin: fix copy_from_user()
cris: buggered copy_from_user/copy_to_user/clear_user
frv: fix clear_user()
asm-generic: make get_user() clear the destination on errors
ARC: uaccess: get_user to zero out dest in cause of fault
s390: get_user() should zero on failure
score: fix __get_user/get_user
nios2: fix __get_user()
sh64: failing __get_user() should zero
m32r: fix __get_user()
mn10300: failing __get_user() and get_user() should zero
fix minor infoleak in get_user_ex()
microblaze: fix copy_from_user()
avr32: fix copy_from_user()
microblaze: fix __get_user()
fix iov_iter_fault_in_readable()
irqchip/atmel-aic: Fix potential deadlock in ->xlate()
genirq: Provide irq_gc_{lock_irqsave,unlock_irqrestore}() helpers
drm: Only use compat ioctl for addfb2 on X86/IA64
drm: atmel-hlcdc: Fix vertical scaling
net: simplify napi_synchronize() to avoid warnings
kconfig: tinyconfig: provide whole choice blocks to avoid warnings
soc: qcom/spm: shut up uninitialized variable warning
pinctrl: at91-pio4: use %pr format string for resource
mmc: dw_mmc: use resource_size_t to store physical address
drm/i915: Avoid pointer arithmetic in calculating plane surface offset
mpssd: fix buffer overflow warning
gma500: remove annoying deprecation warning
ipv6: addrconf: fix dev refcont leak when DAD failed
sched/core: Fix a race between try_to_wake_up() and a woken up task
Revert "wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel"
ath9k: fix using sta->drv_priv before initializing it
md-cluster: make md-cluster also can work when compiled into kernel
xhci: fix null pointer dereference in stop command timeout function
fuse: direct-io: don't dirty ITER_BVEC pages
Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns
crypto: cryptd - initialize child shash_desc on import
arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb()
pinctrl: sunxi: fix uart1 CTS/RTS pins at PG on A23/A33
pinctrl: pistachio: fix mfio pll_lock pinmux
dm crypt: fix error with too large bios
dm log writes: move IO accounting earlier to fix error path
dm log writes: fix check of kthread_run() return value
bus: arm-ccn: Fix XP watchpoint settings bitmask
bus: arm-ccn: Do not attempt to configure XPs for cycle counter
bus: arm-ccn: Fix PMU handling of MN
ARM: dts: STiH407-family: Provide interconnect clock for consumption in ST SDHCI
ARM: dts: overo: fix gpmc nand on boards with ethernet
ARM: dts: overo: fix gpmc nand cs0 range
ARM: dts: imx6qdl: Fix SPDIF regression
ARM: OMAP3: hwmod data: Add sysc information for DSI
ARM: kirkwood: ib62x0: fix size of u-boot environment partition
ARM: imx6: add missing BM_CLPCR_BYPASS_PMIC_READY setting for imx6sx
ARM: imx6: add missing BM_CLPCR_BYP_MMDC_CH0_LPM_HS setting for imx6ul
ARM: AM43XX: hwmod: Fix RSTST register offset for pruss
cpuset: make sure new tasks conform to the current config of the cpuset
net: thunderx: Fix OOPs with ethtool --register-dump
USB: change bInterval default to 10 ms
ARM: dts: STiH410: Handle interconnect clock required by EHCI/OHCI (USB)
usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
USB: serial: simple: add support for another Infineon flashloader
serial: 8250: added acces i/o products quad and octal serial cards
serial: 8250_mid: fix divide error bug if baud rate is 0
iio: ensure ret is initialized to zero before entering do loop
iio:core: fix IIO_VAL_FRACTIONAL sign handling
iio: accel: kxsd9: Fix scaling bug
iio: fix pressure data output unit in hid-sensor-attributes
iio: accel: bmc150: reset chip at init time
iio: adc: at91: unbreak channel adc channel 3
iio: ad799x: Fix buffered capture for ad7991/ad7995/ad7999
iio: adc: ti_am335x_adc: Increase timeout value waiting for ADC sample
iio: adc: ti_am335x_adc: Protect FIFO1 from concurrent access
iio: adc: rockchip_saradc: reset saradc controller before programming it
iio: proximity: as3935: set up buffer timestamps for non-zero values
iio: accel: kxsd9: Fix raw read return
kvm-arm: Unmap shadow pagetables properly
x86/AMD: Apply erratum 665 on machines without a BIOS fix
x86/paravirt: Do not trace _paravirt_ident_*() functions
ARC: mm: fix build breakage with STRICT_MM_TYPECHECKS
IB/uverbs: Fix race between uverbs_close and remove_one
dm flakey: fix reads to be issued if drop_writes configured
audit: fix exe_file access in audit_exe_compare
mm: introduce get_task_exe_file
kexec: fix double-free when failing to relocate the purgatory
NFSv4.1: Fix the CREATE_SESSION slot number accounting
pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised
nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
NFSv4.x: Fix a refcount leak in nfs_callback_up_net
pNFS: The client must not do I/O to the DS if it's lease has expired
kernfs: don't depend on d_find_any_alias() when generating notifications
powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
powerpc/powernv : Drop reference added by kset_find_obj()
powerpc/tm: do not use r13 for tabort_syscall
tipc: move linearization of buffers to generic code
lightnvm: put bio before return
fscrypto: require write access to mount to set encryption policy
Revert "KVM: x86: fix missed hardware breakpoints"
MIPS: KVM: Check for pfn noslot case
clocksource/drivers/sun4i: Clear interrupts after stopping timer in probe function
fscrypto: add authorization check for setting encryption policy
ext4: use __GFP_NOFAIL in ext4_free_blocks()
Conflicts:
arch/arm/kernel/devtree.c
arch/arm64/Kconfig
arch/arm64/kernel/arm64ksyms.c
arch/arm64/kernel/psci.c
arch/arm64/mm/fault.c
drivers/android/binder.c
drivers/usb/host/xhci-hub.c
fs/ext4/readpage.c
include/linux/mmc/core.h
include/linux/mmzone.h
mm/memcontrol.c
net/core/filter.c
net/netlink/af_netlink.c
net/netlink/af_netlink.h
Change-Id: I99fe7a0914e83e284b11b33185b71448a8999d1f
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
Memory compaction can be currently performed in several contexts:
- kswapd balancing a zone after a high-order allocation failure
- direct compaction to satisfy a high-order allocation, including THP
page fault attemps
- khugepaged trying to collapse a hugepage
- manually from /proc
The purpose of compaction is two-fold. The obvious purpose is to
satisfy a (pending or future) high-order allocation, and is easy to
evaluate. The other purpose is to keep overal memory fragmentation low
and help the anti-fragmentation mechanism. The success wrt the latter
purpose is more
The current situation wrt the purposes has a few drawbacks:
- compaction is invoked only when a high-order page or hugepage is not
available (or manually). This might be too late for the purposes of
keeping memory fragmentation low.
- direct compaction increases latency of allocations. Again, it would
be better if compaction was performed asynchronously to keep
fragmentation low, before the allocation itself comes.
- (a special case of the previous) the cost of compaction during THP
page faults can easily offset the benefits of THP.
- kswapd compaction appears to be complex, fragile and not working in
some scenarios. It could also end up compacting for a high-order
allocation request when it should be reclaiming memory for a later
order-0 request.
To improve the situation, we should be able to benefit from an
equivalent of kswapd, but for compaction - i.e. a background thread
which responds to fragmentation and the need for high-order allocations
(including hugepages) somewhat proactively.
One possibility is to extend the responsibilities of kswapd, which could
however complicate its design too much. It should be better to let
kswapd handle reclaim, as order-0 allocations are often more critical
than high-order ones.
Another possibility is to extend khugepaged, but this kthread is a
single instance and tied to THP configs.
This patch goes with the option of a new set of per-node kthreads called
kcompactd, and lays the foundations, without introducing any new
tunables. The lifecycle mimics kswapd kthreads, including the memory
hotplug hooks.
For compaction, kcompactd uses the standard compaction_suitable() and
ompact_finished() criteria and the deferred compaction functionality.
Unlike direct compaction, it uses only sync compaction, as there's no
allocation latency to minimize.
This patch doesn't yet add a call to wakeup_kcompactd. The kswapd
compact/reclaim loop for high-order pages will be replaced by waking up
kcompactd in the next patch with the description of what's wrong with
the old approach.
Waking up of the kcompactd threads is also tied to kswapd activity and
follows these rules:
- we don't want to affect any fastpaths, so wake up kcompactd only from
the slowpath, as it's done for kswapd
- if kswapd is doing reclaim, it's more important than compaction, so
don't invoke kcompactd until kswapd goes to sleep
- the target order used for kswapd is passed to kcompactd
Future possible future uses for kcompactd include the ability to wake up
kcompactd on demand in special situations, such as when hugepages are
not available (currently not done due to __GFP_NO_KSWAPD) or when a
fragmentation event (i.e. __rmqueue_fallback()) occurs. It's also
possible to perform periodic compaction with kcompactd.
[arnd@arndb.de: fix build errors with kcompactd]
[paul.gortmaker@windriver.com: don't use modular references for non modular code]
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 698b1b30642f1ff0ea10ef1de9745ab633031377
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I987ae548cba936987b8479dc02de67d0f88b9cb6
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
* v4.4-16.09-android-tmp:
unsafe_[get|put]_user: change interface to use a error target label
usercopy: remove page-spanning test for now
usercopy: fix overlap check for kernel text
mm/slub: support left redzone
Linux 4.4.21
lib/mpi: mpi_write_sgl(): fix skipping of leading zero limbs
regulator: anatop: allow regulator to be in bypass mode
hwrng: exynos - Disable runtime PM on probe failure
cpufreq: Fix GOV_LIMITS handling for the userspace governor
metag: Fix atomic_*_return inline asm constraints
scsi: fix upper bounds check of sense key in scsi_sense_key_string()
ALSA: timer: fix NULL pointer dereference on memory allocation failure
ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE
ALSA: timer: fix NULL pointer dereference in read()/ioctl() race
ALSA: hda - Enable subwoofer on Dell Inspiron 7559
ALSA: hda - Add headset mic quirk for Dell Inspiron 5468
ALSA: rawmidi: Fix possible deadlock with virmidi registration
ALSA: fireworks: accessing to user space outside spinlock
ALSA: firewire-tascam: accessing to user space outside spinlock
ALSA: usb-audio: Add sample rate inquiry quirk for B850V3 CP2114
crypto: caam - fix IV loading for authenc (giv)decryption
uprobes: Fix the memcg accounting
x86/apic: Do not init irq remapping if ioapic is disabled
vhost/scsi: fix reuse of &vq->iov[out] in response
bcache: RESERVE_PRIO is too small by one when prio_buckets() is a power of two.
ubifs: Fix assertion in layout_in_gaps()
ovl: fix workdir creation
ovl: listxattr: use strnlen()
ovl: remove posix_acl_default from workdir
ovl: don't copy up opaqueness
wrappers for ->i_mutex access
lustre: remove unused declaration
timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING
timekeeping: Cap array access in timekeeping_debug
xfs: fix superblock inprogress check
ASoC: atmel_ssc_dai: Don't unconditionally reset SSC on stream startup
drm/msm: fix use of copy_from_user() while holding spinlock
drm: Reject page_flip for !DRIVER_MODESET
drm/radeon: fix radeon_move_blit on 32bit systems
s390/sclp_ctl: fix potential information leak with /dev/sclp
rds: fix an infoleak in rds_inc_info_copy
powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
nvme: Call pci_disable_device on the error path.
cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
block: make sure a big bio is split into at most 256 bvecs
block: Fix race triggered by blk_set_queue_dying()
ext4: avoid modifying checksum fields directly during checksum verification
ext4: avoid deadlock when expanding inode size
ext4: properly align shifted xattrs when expanding inodes
ext4: fix xattr shifting when expanding inodes part 2
ext4: fix xattr shifting when expanding inodes
ext4: validate that metadata blocks do not overlap superblock
net: Use ns_capable_noaudit() when determining net sysctl permissions
kernel: Add noaudit variant of ns_capable()
KEYS: Fix ASN.1 indefinite length object parsing
drivers:hv: Lock access to hyperv_mmio resource tree
cxlflash: Move to exponential back-off when cmd_room is not available
netfilter: x_tables: check for size overflow
drm/amdgpu/cz: enable/disable vce dpm even if vce pg is disabled
cred: Reject inodes with invalid ids in set_create_file_as()
fs: Check for invalid i_uid in may_follow_link()
IB/IPoIB: Do not set skb truesize since using one linearskb
udp: properly support MSG_PEEK with truncated buffers
crypto: nx-842 - Mask XERS0 bit in return value
cxlflash: Fix to avoid virtual LUN failover failure
cxlflash: Fix to escalate LINK_RESET also on port 1
tipc: fix nl compat regression for link statistics
tipc: fix an infoleak in tipc_nl_compat_link_dump
netfilter: x_tables: check for size overflow
Bluetooth: Add support for Intel Bluetooth device 8265 [8087:0a2b]
drm/i915: Check VBT for port presence in addition to the strap on VLV/CHV
drm/i915: Only ignore eDP ports that are connected
Input: xpad - move pending clear to the correct location
net: thunderx: Fix link status reporting
x86/hyperv: Avoid reporting bogus NMI status for Gen2 instances
crypto: vmx - IV size failing on skcipher API
tda10071: Fix dependency to REGMAP_I2C
crypto: vmx - Fix ABI detection
crypto: vmx - comply with ABIs that specify vrsave as reserved.
HID: core: prevent out-of-bound readings
lpfc: Fix DMA faults observed upon plugging loopback connector
block: fix blk_rq_get_max_sectors for driver private requests
irqchip/gicv3-its: numa: Enable workaround for Cavium thunderx erratum 23144
clocksource: Allow unregistering the watchdog
btrfs: Continue write in case of can_not_nocow
blk-mq: End unstarted requests on dying queue
cxlflash: Fix to resolve dead-lock during EEH recovery
drm/radeon/mst: fix regression in lane/link handling.
ecryptfs: fix handling of directory opening
ALSA: hda: add AMD Polaris-10/11 AZ PCI IDs with proper driver caps
drm: Balance error path for GEM handle allocation
ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO
time: Verify time values in adjtimex ADJ_SETOFFSET to avoid overflow
Input: xpad - correctly handle concurrent LED and FF requests
net: thunderx: Fix receive packet stats
net: thunderx: Fix for multiqset not configured upon interface toggle
perf/x86/cqm: Fix CQM memory leak and notifier leak
perf/x86/cqm: Fix CQM handling of grouping events into a cache_group
s390/crypto: provide correct file mode at device register.
proc: revert /proc/<pid>/maps [stack:TID] annotation
intel_idle: Support for Intel Xeon Phi Processor x200 Product Family
cxlflash: Fix to avoid unnecessary scan with internal LUNs
Drivers: hv: vmbus: don't manipulate with clocksources on crash
Drivers: hv: vmbus: avoid scheduling in interrupt context in vmbus_initiate_unload()
Drivers: hv: vmbus: avoid infinite loop in init_vp_index()
arcmsr: fixes not release allocated resource
arcmsr: fixed getting wrong configuration data
s390/pci_dma: fix DMA table corruption with > 4 TB main memory
net/mlx5e: Don't modify CQ before it was created
net/mlx5e: Don't try to modify CQ moderation if it is not supported
mmc: sdhci: Do not BUG on invalid vdd
UVC: Add support for R200 depth camera
sched/numa: Fix use-after-free bug in the task_numa_compare
ALSA: hda - add codec support for Kabylake display audio codec
drm/i915: Fix hpd live status bits for g4x
tipc: fix nullptr crash during subscription cancel
arm64: Add workaround for Cavium erratum 27456
net: thunderx: Fix for Qset error due to CQ full
drm/radeon: fix dp link rate selection (v2)
drm/amdgpu: fix dp link rate selection (v2)
qla2xxx: Use ATIO type to send correct tmr response
mmc: sdhci: 64-bit DMA actually has 4-byte alignment
drm/atomic: Do not unset crtc when an encoder is stolen
drm/i915/skl: Add missing SKL ids
drm/i915/bxt: update list of PCIIDs
hrtimer: Catch illegal clockids
i40e/i40evf: Fix RSS rx-flow-hash configuration through ethtool
mpt3sas: Fix for Asynchronous completion of timedout IO and task abort of timedout IO.
mpt3sas: A correction in unmap_resources
net: cavium: liquidio: fix check for in progress flag
arm64: KVM: Configure TCR_EL2.PS at runtime
irqchip/gic-v3: Make sure read from ICC_IAR1_EL1 is visible on redestributor
pwm: lpc32xx: fix and simplify duty cycle and period calculations
pwm: lpc32xx: correct number of PWM channels from 2 to 1
pwm: fsl-ftm: Fix clock enable/disable when using PM
megaraid_sas: Add an i/o barrier
megaraid_sas: Fix SMAP issue
megaraid_sas: Do not allow PCI access during OCR
s390/cio: update measurement characteristics
s390/cio: ensure consistent measurement state
s390/cio: fix measurement characteristics memleak
qeth: initialize net_device with carrier off
lpfc: Fix external loopback failure.
lpfc: Fix mbox reuse in PLOGI completion
lpfc: Fix RDP Speed reporting.
lpfc: Fix crash in fcp command completion path.
lpfc: Fix driver crash when module parameter lpfc_fcp_io_channel set to 16
lpfc: Fix RegLogin failed error seen on Lancer FC during port bounce
lpfc: Fix the FLOGI discovery logic to comply with T11 standards
lpfc: Fix FCF Infinite loop in lpfc_sli4_fcf_rr_next_index_get.
cxl: Enable PCI device ID for future IBM CXL adapter
cxl: fix build for GCC 4.6.x
cxlflash: Enable device id for future IBM CXL adapter
cxlflash: Resolve oops in wait_port_offline
cxlflash: Fix to resolve cmd leak after host reset
cxl: Fix DSI misses when the context owning task exits
cxl: Fix possible idr warning when contexts are released
Drivers: hv: vmbus: fix rescind-offer handling for device without a driver
Drivers: hv: vmbus: serialize process_chn_event() and vmbus_close_internal()
Drivers: hv: vss: run only on supported host versions
drivers/hv: cleanup synic msrs if vmbus connect failed
Drivers: hv: util: catch allocation errors
tools: hv: report ENOSPC errors in hv_fcopy_daemon
Drivers: hv: utils: run polling callback always in interrupt context
Drivers: hv: util: Increase the timeout for util services
lightnvm: fix missing grown bad block type
lightnvm: fix locking and mempool in rrpc_lun_gc
lightnvm: unlock rq and free ppa_list on submission fail
lightnvm: add check after mempool allocation
lightnvm: fix incorrect nr_free_blocks stat
lightnvm: fix bio submission issue
cxlflash: a couple off by one bugs
fm10k: Cleanup exception handling for mailbox interrupt
fm10k: Cleanup MSI-X interrupts in case of failure
fm10k: reinitialize queuing scheme after calling init_hw
fm10k: always check init_hw for errors
fm10k: reset max_queues on init_hw_vf failure
fm10k: Fix handling of NAPI budget when multiple queues are enabled per vector
fm10k: Correct MTU for jumbo frames
fm10k: do not assume VF always has 1 queue
clk: xgene: Fix divider with non-zero shift value
e1000e: fix division by zero on jumbo MTUs
e1000: fix data race between tx_ring->next_to_clean
ixgbe: Fix handling of NAPI budget when multiple queues are enabled per vector
igb: fix NULL derefs due to skipped SR-IOV enabling
igb: use the correct i210 register for EEMNGCTL
igb: don't unmap NULL hw_addr
i40e: Fix Rx hash reported to the stack by our driver
i40e: clean whole mac filter list
i40evf: check rings before freeing resources
i40e: don't add zero MAC filter
i40e: properly delete VF MAC filters
i40e: Fix memory leaks, sideband filter programming
i40e: fix: do not sleep in netdev_ops
i40e/i40evf: Fix RS bit update in Tx path and disable force WB workaround
i40evf: handle many MAC filters correctly
i40e: Workaround fix for mss < 256 issue
UPSTREAM: audit: fix a double fetch in audit_log_single_execve_arg()
UPSTREAM: ARM: 8494/1: mm: Enable PXN when running non-LPAE kernel on LPAE processor
FIXUP: sched/tune: update accouting before CPU capacity
FIXUP: sched/tune: add fixes missing from a previous patch
arm: Fix #if/#ifdef typo in topology.c
arm: Fix build error "conflicting types for 'scale_cpu_capacity'"
sched/walt: use do_div instead of division operator
DEBUG: cpufreq: fix cpu_capacity tracing build for non-smp systems
sched/walt: include missing header for arm_timer_read_counter()
cpufreq: Kconfig: Fixup incorrect selection by CPU_FREQ_DEFAULT_GOV_SCHED
sched/fair: Avoid redundant idle_cpu() call in update_sg_lb_stats()
FIXUP: sched: scheduler-driven cpu frequency selection
sched/rt: Add Kconfig option to enable panicking for RT throttling
sched/rt: print RT tasks when RT throttling is activated
UPSTREAM: sched: Fix a race between __kthread_bind() and sched_setaffinity()
sched/fair: Favor higher cpus only for boosted tasks
vmstat: make vmstat_updater deferrable again and shut down on idle
sched/fair: call OPP update when going idle after migration
sched/cpufreq_sched: fix thermal capping events
sched/fair: Picking cpus with low OPPs for tasks that prefer idle CPUs
FIXUP: sched/tune: do initialization as a postcore_initicall
DEBUG: sched: add tracepoint for RD overutilized
sched/tune: Introducing a new schedtune attribute prefer_idle
sched: use util instead of capacity to select busy cpu
arch_timer: add error handling when the MPM global timer is cleared
FIXUP: sched: Fix double-release of spinlock in move_queued_task
FIXUP: sched/fair: Fix hang during suspend in sched_group_energy
FIXUP: sched: fix SchedFreq integration for both PELT and WALT
sched: EAS: Avoid causing spikes to max-freq unnecessarily
FIXUP: sched: fix set_cfs_cpu_capacity when WALT is in use
sched/walt: Accounting for number of irqs pending on each core
sched: Introduce Window Assisted Load Tracking (WALT)
sched/tune: fix PB and PC cuts indexes definition
sched/fair: optimize idle cpu selection for boosted tasks
FIXUP: sched/tune: fix accounting for runnable tasks
sched/tune: use a single initialisation function
sched/{fair,tune}: simplify fair.c code
FIXUP: sched/tune: fix payoff calculation for boost region
sched/tune: Add support for negative boost values
FIX: sched/tune: move schedtune_nornalize_energy into fair.c
FIX: sched/tune: update usage of boosted task utilisation on CPU selection
sched/fair: add tunable to set initial task load
sched/fair: add tunable to force selection at cpu granularity
sched: EAS: take cstate into account when selecting idle core
sched/cpufreq_sched: Consolidated update
FIXUP: sched: fix build for non-SMP target
DEBUG: sched/tune: add tracepoint on P-E space filtering
DEBUG: sched/tune: add tracepoint for energy_diff() values
DEBUG: sched/tune: add tracepoint for task boost signal
arm: topology: Define TC2 energy and provide it to the scheduler
CHROMIUM: sched: update the average of nr_running
DEBUG: schedtune: add tracepoint for schedtune_tasks_update() values
DEBUG: schedtune: add tracepoint for CPU boost signal
DEBUG: schedtune: add tracepoint for SchedTune configuration update
DEBUG: sched: add energy procfs interface
DEBUG: sched,cpufreq: add cpu_capacity change tracepoint
DEBUG: sched: add tracepoint for CPU load/util signals
DEBUG: sched: add tracepoint for task load/util signals
DEBUG: sched: add tracepoint for cpu/freq scale invariance
sched/fair: filter energy_diff() based on energy_payoff value
sched/tune: add support to compute normalized energy
sched/fair: keep track of energy/capacity variations
sched/fair: add boosted task utilization
sched/{fair,tune}: track RUNNABLE tasks impact on per CPU boost value
sched/tune: compute and keep track of per CPU boost value
sched/tune: add initial support for CGroups based boosting
sched/fair: add boosted CPU usage
sched/fair: add function to convert boost value into "margin"
sched/tune: add sysctl interface to define a boost value
sched/tune: add detailed documentation
fixup! sched/fair: jump to max OPP when crossing UP threshold
fixup! sched: scheduler-driven cpu frequency selection
sched: rt scheduler sets capacity requirement
sched: deadline: use deadline bandwidth in scale_rt_capacity
sched: remove call of sched_avg_update from sched_rt_avg_update
sched/cpufreq_sched: add trace events
sched/fair: jump to max OPP when crossing UP threshold
sched/fair: cpufreq_sched triggers for load balancing
sched/{core,fair}: trigger OPP change request on fork()
sched/fair: add triggers for OPP change requests
sched: scheduler-driven cpu frequency selection
cpufreq: introduce cpufreq_driver_is_slow
sched: Consider misfit tasks when load-balancing
sched: Add group_misfit_task load-balance type
sched: Add per-cpu max capacity to sched_group_capacity
sched: Do eas idle balance regardless of the rq avg idle value
arm64: Enable max freq invariant scheduler load-tracking and capacity support
arm: Enable max freq invariant scheduler load-tracking and capacity support
sched: Update max cpu capacity in case of max frequency constraints
cpufreq: Max freq invariant scheduler load-tracking and cpu capacity support
arm64, topology: Updates to use DT bindings for EAS costing data
sched: Support for extracting EAS energy costs from DT
Documentation: DT bindings for energy model cost data required by EAS
sched: Disable energy-unfriendly nohz kicks
sched: Consider a not over-utilized energy-aware system as balanced
sched: Energy-aware wake-up task placement
sched: Determine the current sched_group idle-state
sched, cpuidle: Track cpuidle state index in the scheduler
sched: Add over-utilization/tipping point indicator
sched: Estimate energy impact of scheduling decisions
sched: Extend sched_group_energy to test load-balancing decisions
sched: Calculate energy consumption of sched_group
sched: Highest energy aware balancing sched_domain level pointer
sched: Relocated cpu_util() and change return type
sched: Compute cpu capacity available at current frequency
arm64: Cpu invariant scheduler load-tracking and capacity support
arm: Cpu invariant scheduler load-tracking and capacity support
sched: Introduce SD_SHARE_CAP_STATES sched_domain flag
sched: Initialize energy data structures
sched: Introduce energy data structures
sched: Make energy awareness a sched feature
sched: Documentation for scheduler energy cost model
sched: Prevent unnecessary active balance of single task in sched group
sched: Enable idle balance to pull single task towards cpu with higher capacity
sched: Consider spare cpu capacity at task wake-up
sched: Add cpu capacity awareness to wakeup balancing
sched: Store system-wide maximum cpu capacity in root domain
arm: Update arch_scale_cpu_capacity() to reflect change to define
arm64: Enable frequency invariant scheduler load-tracking support
arm: Enable frequency invariant scheduler load-tracking support
cpufreq: Frequency invariant scheduler load-tracking support
sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task()
FROMLIST: pstore: drop pmsg bounce buffer
UPSTREAM: usercopy: remove page-spanning test for now
UPSTREAM: usercopy: force check_object_size() inline
BACKPORT: usercopy: fold builtin_const check into inline function
UPSTREAM: x86/uaccess: force copy_*_user() to be inlined
UPSTREAM: HID: core: prevent out-of-bound readings
Android: Fix build breakages.
UPSTREAM: tty: Prevent ldisc drivers from re-using stale tty fields
UPSTREAM: netfilter: nfnetlink: correctly validate length of batch messages
cpuset: Make cpusets restore on hotplug
UPSTREAM: mm/slub: support left redzone
UPSTREAM: Make the hardened user-copy code depend on having a hardened allocator
Android: MMC/UFS IO Latency Histograms.
UPSTREAM: usercopy: fix overlap check for kernel text
UPSTREAM: usercopy: avoid potentially undefined behavior in pointer math
UPSTREAM: unsafe_[get|put]_user: change interface to use a error target label
BACKPORT: arm64: mm: fix location of _etext
BACKPORT: ARM: 8583/1: mm: fix location of _etext
BACKPORT: Don't show empty tag stats for unprivileged uids
UPSTREAM: tcp: fix use after free in tcp_xmit_retransmit_queue()
ANDROID: base-cfg: drop SECCOMP_FILTER config
UPSTREAM: [media] xc2028: unlock on error in xc2028_set_config()
UPSTREAM: [media] xc2028: avoid use after free
ANDROID: base-cfg: enable SECCOMP config
ANDROID: rcu_sync: Export rcu_sync_lockdep_assert
RFC: FROMLIST: cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork
RFC: FROMLIST: cgroup: avoid synchronize_sched() in __cgroup_procs_write()
RFC: FROMLIST: locking/percpu-rwsem: Optimize readers and reduce global impact
net: ipv6: Fix ping to link-local addresses.
ipv6: fix endianness error in icmpv6_err
ANDROID: dm: android-verity: Allow android-verity to be compiled as an independent module
backporting: a brief introduce of backported feautures on 4.4
Linux 4.4.20
sysfs: correctly handle read offset on PREALLOC attrs
hwmon: (iio_hwmon) fix memory leak in name attribute
ALSA: line6: Fix POD sysfs attributes segfault
ALSA: line6: Give up on the lock while URBs are released.
ALSA: line6: Remove double line6_pcm_release() after failed acquire.
ACPI / SRAT: fix SRAT parsing order with both LAPIC and X2APIC present
ACPI / sysfs: fix error code in get_status()
ACPI / drivers: replace acpi_probe_lock spinlock with mutex
ACPI / drivers: fix typo in ACPI_DECLARE_PROBE_ENTRY macro
staging: comedi: ni_mio_common: fix wrong insn_write handler
staging: comedi: ni_mio_common: fix AO inttrig backwards compatibility
staging: comedi: comedi_test: fix timer race conditions
staging: comedi: daqboard2000: bug fix board type matching code
USB: serial: option: add WeTelecom 0x6802 and 0x6803 products
USB: serial: option: add WeTelecom WM-D200
USB: serial: mos7840: fix non-atomic allocation in write path
USB: serial: mos7720: fix non-atomic allocation in write path
USB: fix typo in wMaxPacketSize validation
usb: chipidea: udc: don't touch DP when controller is in host mode
USB: avoid left shift by -1
dmaengine: usb-dmac: check CHCR.DE bit in usb_dmac_isr_channel()
crypto: qat - fix aes-xts key sizes
crypto: nx - off by one bug in nx_of_update_msc()
Input: i8042 - set up shared ps2_cmd_mutex for AUX ports
Input: i8042 - break load dependency between atkbd/psmouse and i8042
Input: tegra-kbc - fix inverted reset logic
btrfs: properly track when rescan worker is running
btrfs: waiting on qgroup rescan should not always be interruptible
fs/seq_file: fix out-of-bounds read
gpio: Fix OF build problem on UM
usb: renesas_usbhs: gadget: fix return value check in usbhs_mod_gadget_probe()
megaraid_sas: Fix probing cards without io port
mpt3sas: Fix resume on WarpDrive flash cards
cdc-acm: fix wrong pipe type on rx interrupt xfers
i2c: cros-ec-tunnel: Fix usage of cros_ec_cmd_xfer()
mfd: cros_ec: Add cros_ec_cmd_xfer_status() helper
aacraid: Check size values after double-fetch from user
ARC: Elide redundant setup of DMA callbacks
ARC: Call trace_hardirqs_on() before enabling irqs
ARC: use correct offset in pt_regs for saving/restoring user mode r25
ARC: build: Better way to detect ISA compatible toolchain
drm/i915: fix aliasing_ppgtt leak
drm/amdgpu: record error code when ring test failed
drm/amd/amdgpu: sdma resume fail during S4 on CI
drm/amdgpu: skip TV/CV in display parsing
drm/amdgpu: avoid a possible array overflow
drm/amdgpu: fix amdgpu_move_blit on 32bit systems
drm/amdgpu: Change GART offset to 64-bit
iio: fix sched WARNING "do not call blocking ops when !TASK_RUNNING"
sched/nohz: Fix affine unpinned timers mess
sched/cputime: Fix NO_HZ_FULL getrusage() monotonicity regression
of: fix reference counting in of_graph_get_endpoint_by_regs
arm64: dts: rockchip: add reset saradc node for rk3368 SoCs
mac80211: fix purging multicast PS buffer queue
s390/dasd: fix hanging device after clear subchannel
EDAC: Increment correct counter in edac_inc_ue_error()
pinctrl/amd: Remove the default de-bounce time
iommu/arm-smmu: Don't BUG() if we find aborting STEs with disable_bypass
iommu/arm-smmu: Fix CMDQ error handling
iommu/dma: Don't put uninitialised IOVA domains
xhci: Make sure xhci handles USB_SPEED_SUPER_PLUS devices.
USB: serial: ftdi_sio: add PIDs for Ivium Technologies devices
USB: serial: ftdi_sio: add device ID for WICED USB UART dev board
USB: serial: option: add support for Telit LE920A4
USB: serial: option: add D-Link DWM-156/A3
USB: serial: fix memleak in driver-registration error path
xhci: don't dereference a xhci member after removing xhci
usb: xhci: Fix panic if disconnect
xhci: always handle "Command Ring Stopped" events
usb/gadget: fix gadgetfs aio support.
usb: gadget: fsl_qe_udc: off by one in setup_received_handle()
USB: validate wMaxPacketValue entries in endpoint descriptors
usb: renesas_usbhs: Use dmac only if the pipe type is bulk
usb: renesas_usbhs: clear the BRDYSTS in usbhsg_ep_enable()
USB: hub: change the locking in hub_activate
USB: hub: fix up early-exit pathway in hub_activate
usb: hub: Fix unbalanced reference count/memory leak/deadlocks
usb: define USB_SPEED_SUPER_PLUS speed for SuperSpeedPlus USB3.1 devices
usb: dwc3: gadget: increment request->actual once
usb: dwc3: pci: add Intel Kabylake PCI ID
usb: misc: usbtest: add fix for driver hang
usb: ehci: change order of register cleanup during shutdown
crypto: caam - defer aead_set_sh_desc in case of zero authsize
crypto: caam - fix echainiv(authenc) encrypt shared descriptor
crypto: caam - fix non-hmac hashes
genirq/msi: Make sure PCI MSIs are activated early
genirq/msi: Remove unused MSI_FLAG_IDENTITY_MAP
um: Don't discard .text.exit section
ACPI / CPPC: Prevent cpc_desc_ptr points to the invalid data
ACPI: CPPC: Return error if _CPC is invalid on a CPU
mmc: sdhci-acpi: Reduce Baytrail eMMC/SD/SDIO hangs
PCI: Limit config space size for Netronome NFP4000
PCI: Add Netronome NFP4000 PF device ID
PCI: Limit config space size for Netronome NFP6000 family
PCI: Add Netronome vendor and device IDs
PCI: Support PCIe devices with short cfg_size
NVMe: Don't unmap controller registers on reset
ALSA: hda - Manage power well properly for resume
libnvdimm, nd_blk: mask off reserved status bits
perf intel-pt: Fix occasional decoding errors when tracing system-wide
vfio/pci: Fix NULL pointer oops in error interrupt setup handling
virtio: fix memory leak in virtqueue_add()
parisc: Fix order of EREFUSED define in errno.h
arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
ALSA: usb-audio: Add quirk for ELP HD USB Camera
ALSA: usb-audio: Add a sample rate quirk for Creative Live! Cam Socialize HD (VF0610)
powerpc/eeh: eeh_pci_enable(): fix checking of post-request state
SUNRPC: allow for upcalls for same uid but different gss service
SUNRPC: Handle EADDRNOTAVAIL on connection failures
tools/testing/nvdimm: fix SIGTERM vs hotplug crash
uprobes/x86: Fix RIP-relative handling of EVEX-encoded instructions
x86/mm: Disable preemption during CR3 read+write
hugetlb: fix nr_pmds accounting with shared page tables
mm: SLUB hardened usercopy support
mm: SLAB hardened usercopy support
s390/uaccess: Enable hardened usercopy
sparc/uaccess: Enable hardened usercopy
powerpc/uaccess: Enable hardened usercopy
ia64/uaccess: Enable hardened usercopy
arm64/uaccess: Enable hardened usercopy
ARM: uaccess: Enable hardened usercopy
x86/uaccess: Enable hardened usercopy
x86: remove more uaccess_32.h complexity
x86: remove pointless uaccess_32.h complexity
x86: fix SMAP in 32-bit environments
Use the new batched user accesses in generic user string handling
Add 'unsafe' user access functions for batched accesses
x86: reorganize SMAP handling in user space accesses
mm: Hardened usercopy
mm: Implement stack frame object validation
mm: Add is_migrate_cma_page
Linux 4.4.19
Documentation/module-signing.txt: Note need for version info if reusing a key
module: Invalidate signatures on force-loaded modules
dm flakey: error READ bios during the down_interval
rtc: s3c: Add s3c_rtc_{enable/disable}_clk in s3c_rtc_setfreq()
lpfc: fix oops in lpfc_sli4_scmd_to_wqidx_distr() from lpfc_send_taskmgmt()
ACPI / EC: Work around method reentrancy limit in ACPICA for _Qxx
x86/platform/intel_mid_pci: Rework IRQ0 workaround
PCI: Mark Atheros AR9485 and QCA9882 to avoid bus reset
MIPS: hpet: Increase HPET_MIN_PROG_DELTA and decrease HPET_MIN_CYCLES
MIPS: Don't register r4k sched clock when CPUFREQ enabled
MIPS: mm: Fix definition of R6 cache instruction
SUNRPC: Don't allocate a full sockaddr_storage for tracing
Input: elan_i2c - properly wake up touchpad on ASUS laptops
target: Fix ordered task CHECK_CONDITION early exception handling
target: Fix max_unmap_lba_count calc overflow
target: Fix race between iscsi-target connection shutdown + ABORT_TASK
target: Fix missing complete during ABORT_TASK + CMD_T_FABRIC_STOP
target: Fix ordered task target_setup_cmd_from_cdb exception hang
iscsi-target: Fix panic when adding second TCP connection to iSCSI session
ubi: Fix race condition between ubi device creation and udev
ubi: Fix early logging
ubi: Make volume resize power cut aware
of: fix memory leak related to safe_name()
IB/mlx4: Fix memory leak if QP creation failed
IB/mlx4: Fix error flow when sending mads under SRIOV
IB/mlx4: Fix the SQ size of an RC QP
IB/IWPM: Fix a potential skb leak
IB/IPoIB: Don't update neigh validity for unresolved entries
IB/SA: Use correct free function
IB/mlx5: Return PORT_ERR in Active to Initializing tranisition
IB/mlx5: Fix post send fence logic
IB/mlx5: Fix entries check in mlx5_ib_resize_cq
IB/mlx5: Fix returned values of query QP
IB/mlx5: Fix entries checks in mlx5_ib_create_cq
IB/mlx5: Fix MODIFY_QP command input structure
ALSA: hda - Fix headset mic detection problem for two dell machines
ALSA: hda: add AMD Bonaire AZ PCI ID with proper driver caps
ALSA: hda/realtek - Can't adjust speaker's volume on a Dell AIO
ALSA: hda: Fix krealloc() with __GFP_ZERO usage
mm/hugetlb: avoid soft lockup in set_max_huge_pages()
mtd: nand: fix bug writing 1 byte less than page size
block: fix bdi vs gendisk lifetime mismatch
block: add missing group association in bio-cloning functions
metag: Fix __cmpxchg_u32 asm constraint for CMP
ftrace/recordmcount: Work around for addition of metag magic but not relocations
balloon: check the number of available pages in leak balloon
drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink capability is unknown"
drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB
drm/edid: Add 6 bpc quirk for display AEO model 0.
drm: Restore double clflush on the last partial cacheline
drm/nouveau/fbcon: fix font width not divisible by 8
drm/nouveau/gr/nv3x: fix instobj write offsets in gr setup
drm/nouveau: check for supported chipset before booting fbdev off the hw
drm/radeon: support backlight control for UNIPHY3
drm/radeon: fix firmware info version checks
drm/radeon: Poll for both connect/disconnect on analog connectors
drm/radeon: add a delay after ATPX dGPU power off
drm/amdgpu/gmc7: add missing mullins case
drm/amdgpu: fix firmware info version checks
drm/amdgpu: Disable RPM helpers while reprobing connectors on resume
drm/amdgpu: support backlight control for UNIPHY3
drm/amdgpu: Poll for both connect/disconnect on analog connectors
drm/amdgpu: add a delay after ATPX dGPU power off
w1:omap_hdq: fix regression
netlabel: add address family checks to netlbl_{sock,req}_delattr()
ARM: dts: sunxi: Add a startup delay for fixed regulator enabled phys
audit: fix a double fetch in audit_log_single_execve_arg()
iommu/amd: Update Alias-DTE in update_device_table()
iommu/amd: Init unity mappings only for dma_ops domains
iommu/amd: Handle IOMMU_DOMAIN_DMA in ops->domain_free call-back
iommu/vt-d: Return error code in domain_context_mapping_one()
iommu/exynos: Suppress unbinding to prevent system failure
drm/i915: Don't complain about lack of ACPI video bios
nfsd: don't return an unhashed lock stateid after taking mutex
nfsd: Fix race between FREE_STATEID and LOCK
nfs: don't create zero-length requests
MIPS: KVM: Propagate kseg0/mapped tlb fault errors
MIPS: KVM: Fix gfn range check in kseg0 tlb faults
MIPS: KVM: Add missing gfn range check
MIPS: KVM: Fix mapped fault broken commpage handling
random: add interrupt callback to VMBus IRQ handler
random: print a warning for the first ten uninitialized random users
random: initialize the non-blocking pool via add_hwgenerator_randomness()
CIFS: Fix a possible invalid memory access in smb2_query_symlink()
cifs: fix crash due to race in hmac(md5) handling
cifs: Check for existing directory when opening file with O_CREAT
fs/cifs: make share unaccessible at root level mountable
jbd2: make journal y2038 safe
ARC: mm: don't loose PTE_SPECIAL in pte_modify()
remoteproc: Fix potential race condition in rproc_add
ovl: disallow overlayfs as upperdir
HID: uhid: fix timeout when probe races with IO
EDAC: Correct channel count limit
Bluetooth: Fix l2cap_sock_setsockopt() with optname BT_RCVMTU
spi: pxa2xx: Clear all RFT bits in reset_sccr1() on Intel Quark
i2c: efm32: fix a failure path in efm32_i2c_probe()
s5p-mfc: Add release callback for memory region devs
s5p-mfc: Set device name for reserved memory region devs
hp-wmi: Fix wifi cannot be hard-unblocked
dm: set DMF_SUSPENDED* _before_ clearing DMF_NOFLUSH_SUSPENDING
sur40: fix occasional oopses on device close
sur40: lower poll interval to fix occasional FPS drops to ~56 FPS
Fix RC5 decoding with Fintek CIR chipset
vb2: core: Skip planes array verification if pb is NULL
videobuf2-v4l2: Verify planes array in buffer dequeueing
media: dvb_ringbuffer: Add memory barriers
media: usbtv: prevent access to free'd resources
mfd: qcom_rpm: Parametrize also ack selector size
mfd: qcom_rpm: Fix offset error for msm8660
intel_pstate: Fix MSR_CONFIG_TDP_x addressing in core_get_max_pstate()
s390/cio: allow to reset channel measurement block
KVM: nVMX: Fix memory corruption when using VMCS shadowing
KVM: VMX: handle PML full VMEXIT that occurs during event delivery
KVM: MTRR: fix kvm_mtrr_check_gfn_range_consistency page fault
KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE
KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures
arm64: mm: avoid fdt_check_header() before the FDT is fully mapped
arm64: dts: rockchip: fixes the gic400 2nd region size for rk3368
pinctrl: cherryview: prevent concurrent access to GPIO controllers
Bluetooth: hci_intel: Fix null gpio desc pointer dereference
gpio: intel-mid: Remove potentially harmful code
gpio: pca953x: Fix NBANK calculation for PCA9536
tty/serial: atmel: fix RS485 half duplex with DMA
serial: samsung: Fix ERR pointer dereference on deferred probe
tty: serial: msm: Don't read off end of tx fifo
arm64: Fix incorrect per-cpu usage for boot CPU
arm64: debug: unmask PSTATE.D earlier
arm64: kernel: Save and restore UAO and addr_limit on exception entry
USB: usbfs: fix potential infoleak in devio
usb: renesas_usbhs: fix NULL pointer dereference in xfer_work()
USB: serial: option: add support for Telit LE910 PID 0x1206
usb: dwc3: fix for the isoc transfer EP_BUSY flag
usb: quirks: Add no-lpm quirk for Elan
usb: renesas_usbhs: protect the CFIFOSEL setting in usbhsg_ep_enable()
usb: f_fs: off by one bug in _ffs_func_bind()
usb: gadget: avoid exposing kernel stack
UPSTREAM: usb: gadget: configfs: add mutex lock before unregister gadget
ANDROID: dm-verity: adopt changes made to dm callbacks
UPSTREAM: ecryptfs: fix handling of directory opening
ANDROID: net: core: fix UID-based routing
ANDROID: net: fib: remove duplicate assignment
FROMLIST: proc: Fix timerslack_ns CAP_SYS_NICE check when adjusting self
ANDROID: dm verity fec: pack the fec_header structure
ANDROID: dm: android-verity: Verify header before fetching table
ANDROID: dm: allow adb disable-verity only in userdebug
ANDROID: dm: mount as linear target if eng build
ANDROID: dm: use default verity public key
ANDROID: dm: fix signature verification flag
ANDROID: dm: use name_to_dev_t
ANDROID: dm: rename dm-linear methods for dm-android-verity
ANDROID: dm: Minor cleanup
ANDROID: dm: Mounting root as linear device when verity disabled
ANDROID: dm-android-verity: Rebase on top of 4.1
ANDROID: dm: Add android verity target
ANDROID: dm: fix dm_substitute_devices()
ANDROID: dm: Rebase on top of 4.1
CHROMIUM: dm: boot time specification of dm=
Implement memory_state_time, used by qcom,cpubw
Revert "panic: Add board ID to panic output"
usb: gadget: f_accessory: remove duplicate endpoint alloc
BACKPORT: brcmfmac: defer DPC processing during probe
FROMLIST: proc: Add LSM hook checks to /proc/<tid>/timerslack_ns
FROMLIST: proc: Relax /proc/<tid>/timerslack_ns capability requirements
UPSTREAM: ppp: defer netns reference release for ppp channel
cpuset: Add allow_attach hook for cpusets on android.
UPSTREAM: KEYS: Fix ASN.1 indefinite length object parsing
ANDROID: sdcardfs: fix itnull.cocci warnings
android-recommended.cfg: enable fstack-protector-strong
Linux 4.4.18
mm: memcontrol: fix memcg id ref counter on swap charge move
mm: memcontrol: fix swap counter leak on swapout from offline cgroup
mm: memcontrol: fix cgroup creation failure after many small jobs
ext4: fix reference counting bug on block allocation error
ext4: short-cut orphan cleanup on error
ext4: validate s_reserved_gdt_blocks on mount
ext4: don't call ext4_should_journal_data() on the journal inode
ext4: fix deadlock during page writeback
ext4: check for extents that wrap around
crypto: scatterwalk - Fix test in scatterwalk_done
crypto: gcm - Filter out async ghash if necessary
fs/dcache.c: avoid soft-lockup in dput()
fuse: fix wrong assignment of ->flags in fuse_send_init()
fuse: fuse_flush must check mapping->flags for errors
fuse: fsync() did not return IO errors
sysv, ipc: fix security-layer leaking
block: fix use-after-free in seq file
x86/syscalls/64: Add compat_sys_keyctl for 32-bit userspace
drm/i915: Pretend cursor is always on for ILK-style WM calculations (v2)
x86/mm/pat: Fix BUG_ON() in mmap_mem() on QEMU/i386
x86/pat: Document the PAT initialization sequence
x86/xen, pat: Remove PAT table init code from Xen
x86/mtrr: Fix PAT init handling when MTRR is disabled
x86/mtrr: Fix Xorg crashes in Qemu sessions
x86/mm/pat: Replace cpu_has_pat with boot_cpu_has()
x86/mm/pat: Add pat_disable() interface
x86/mm/pat: Add support of non-default PAT MSR setting
devpts: clean up interface to pty drivers
random: strengthen input validation for RNDADDTOENTCNT
apparmor: fix ref count leak when profile sha1 hash is read
Revert "s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL"
KEYS: 64-bit MIPS needs to use compat_sys_keyctl for 32-bit userspace
arm: oabi compat: add missing access checks
cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind
i2c: i801: Allow ACPI SystemIO OpRegion to conflict with PCI BAR
x86/mm/32: Enable full randomization on i386 and X86_32
HID: sony: do not bail out when the sixaxis refuses the output report
PNP: Add Broadwell to Intel MCH size workaround
PNP: Add Haswell-ULT to Intel MCH size workaround
scsi: ignore errors from scsi_dh_add_device()
ipath: Restrict use of the write() interface
tcp: consider recv buf for the initial window scale
qed: Fix setting/clearing bit in completion bitmap
net/irda: fix NULL pointer dereference on memory allocation failure
net: bgmac: Fix infinite loop in bgmac_dma_tx_add()
bonding: set carrier off for devices created through netlink
ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space
tcp: enable per-socket rate limiting of all 'challenge acks'
tcp: make challenge acks less predictable
arm64: relocatable: suppress R_AARCH64_ABS64 relocations in vmlinux
arm64: vmlinux.lds: make __rela_offset and __dynsym_offset ABSOLUTE
Linux 4.4.17
vfs: fix deadlock in file_remove_privs() on overlayfs
intel_th: Fix a deadlock in modprobing
intel_th: pci: Add Kaby Lake PCH-H support
net: mvneta: set real interrupt per packet for tx_done
libceph: apply new_state before new_up_client on incrementals
libata: LITE-ON CX1-JB256-HP needs lower max_sectors
i2c: mux: reg: wrong condition checked for of_address_to_resource return value
posix_cpu_timer: Exit early when process has been reaped
media: fix airspy usb probe error path
ipr: Clear interrupt on croc/crocodile when running with LSI
SCSI: fix new bug in scsi_dev_info_list string matching
RDS: fix rds_tcp_init() error path
can: fix oops caused by wrong rtnl dellink usage
can: fix handling of unmodifiable configuration options fix
can: c_can: Update D_CAN TX and RX functions to 32 bit - fix Altera Cyclone access
can: at91_can: RX queue could get stuck at high bus load
perf/x86: fix PEBS issues on Intel Atom/Core2
ovl: handle ATTR_KILL*
sched/fair: Fix effective_load() to consistently use smoothed load
mmc: block: fix packed command header endianness
block: fix use-after-free in sys_ioprio_get()
qeth: delete napi struct when removing a qeth device
platform/chrome: cros_ec_dev - double fetch bug in ioctl
clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
spi: sun4i: fix FIFO limit
spi: sunxi: fix transfer timeout
namespace: update event counter when umounting a deleted dentry
9p: use file_dentry()
ext4: verify extent header depth
ecryptfs: don't allow mmap when the lower fs doesn't support it
Revert "ecryptfs: forbid opening files without mmap handler"
locks: use file_inode()
power_supply: power_supply_read_temp only if use_cnt > 0
cgroup: set css->id to -1 during init
pinctrl: imx: Do not treat a PIN without MUX register as an error
pinctrl: single: Fix missing flush of posted write for a wakeirq
pvclock: Add CPU barriers to get correct version value
Input: tsc200x - report proper input_dev name
Input: xpad - validate USB endpoint count during probe
Input: wacom_w8001 - w8001_MAX_LENGTH should be 13
Input: xpad - fix oops when attaching an unknown Xbox One gamepad
Input: elantech - add more IC body types to the list
Input: vmmouse - remove port reservation
ALSA: timer: Fix leak in events via snd_timer_user_tinterrupt
ALSA: timer: Fix leak in events via snd_timer_user_ccallback
ALSA: timer: Fix leak in SNDRV_TIMER_IOCTL_PARAMS
xenbus: don't bail early from xenbus_dev_request_and_reply()
xenbus: don't BUG() on user mode induced condition
xen/pciback: Fix conf_space read/write overlap check.
ARC: unwind: ensure that .debug_frame is generated (vs. .eh_frame)
arc: unwind: warn only once if DW2_UNWIND is disabled
kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w
pps: do not crash when failed to register
vmlinux.lds: account for destructor sections
mm, meminit: ensure node is online before checking whether pages are uninitialised
mm, meminit: always return a valid node from early_pfn_to_nid
mm, compaction: prevent VM_BUG_ON when terminating freeing scanner
fs/nilfs2: fix potential underflow in call to crc32_le
mm, compaction: abort free scanner if split fails
mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask
dmaengine: at_xdmac: double FIFO flush needed to compute residue
dmaengine: at_xdmac: fix residue corruption
dmaengine: at_xdmac: align descriptors on 64 bits
x86/quirks: Add early quirk to reset Apple AirPort card
x86/quirks: Reintroduce scanning of secondary buses
x86/quirks: Apply nvidia_bugs quirk only on root bus
USB: OHCI: Don't mark EDs as ED_OPER if scheduling fails
Conflicts:
arch/arm/kernel/topology.c
arch/arm64/include/asm/arch_gicv3.h
arch/arm64/kernel/topology.c
block/bio.c
drivers/cpufreq/Kconfig
drivers/md/Makefile
drivers/media/dvb-core/dvb_ringbuffer.c
drivers/media/tuners/tuner-xc2028.c
drivers/misc/Kconfig
drivers/misc/Makefile
drivers/mmc/core/host.c
drivers/scsi/ufs/ufshcd.c
drivers/scsi/ufs/ufshcd.h
drivers/usb/dwc3/gadget.c
drivers/usb/gadget/configfs.c
fs/ecryptfs/file.c
include/linux/mmc/core.h
include/linux/mmc/host.h
include/linux/mmzone.h
include/linux/sched.h
include/linux/sched/sysctl.h
include/trace/events/power.h
include/trace/events/sched.h
init/Kconfig
kernel/cpuset.c
kernel/exit.c
kernel/sched/Makefile
kernel/sched/core.c
kernel/sched/cputime.c
kernel/sched/fair.c
kernel/sched/features.h
kernel/sched/rt.c
kernel/sched/sched.h
kernel/sched/stop_task.c
kernel/sched/tune.c
lib/Kconfig.debug
mm/Makefile
mm/vmstat.c
Change-Id: I243a43231ca56a6362076fa6301827e1b0493be5
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
The dirty balance reserve that dirty throttling has to consider is
merely memory not available to userspace allocations. There is nothing
writeback-specific about it. Generalize the name so that it's reusable
outside of that context.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit a8d0143730d7b42c9fe6d1435d92ecce6863a62a)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Code such as hardened user copy[1] needs a way to tell if a
page is CMA or not. Add is_migrate_cma_page in a similar way
to is_migrate_isolate_page.
[1]http://article.gmane.org/gmane.linux.kernel.mm/155238
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I1f9aa13d8d063038fa70b93282a836648fbb4f6d
(cherry picked from commit 7c15d9bb8231f998ae7dc0b72415f5215459f7fb)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Code such as hardened user copy[1] needs a way to tell if a
page is CMA or not. Add is_migrate_cma_page in a similar way
to is_migrate_isolate_page.
[1]http://article.gmane.org/gmane.linux.kernel.mm/155238
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
(cherry picked from commit 7c15d9bb8231f998ae7dc0b72415f5215459f7fb)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Add a cma pcp list in order to increase cma memory utilization.
Increased cma memory utilization will improve overall memory
utilization because free cma pages are ignored when memory reclaim
is done with gfp mask GFP_KERNEL.
Since most memory reclaim is done by kswapd, which uses a gfp mask
of GFP_KERNEL, by increasing cma memory utilization we are therefore
ensuring that less aggressive memory reclaim takes place.
Increased cma memory utilization will improve performance,
for example it will increase app concurrency.
Change-Id: I809589a25c6abca51f1c963f118adfc78e955cf9
Signed-off-by: Liam Mark <lmark@codeaurora.org>
CMA pages are designed to be used as fallback for movable allocations
and cannot be used for non-movable allocations. If CMA pages are
utilized poorly, non-movable allocations may end up getting starved if
all regular movable pages are allocated and the only pages left are
CMA. Always using CMA pages first creates unacceptable performance
problems. As a midway alternative, use CMA pages for certain
userspace allocations. The userspace pages can be migrated or dropped
quickly which giving decent utilization.
Change-Id: I6165dda01b705309eebabc6dfa67146b7a95c174
CRs-Fixed: 452508
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Heesub Shin <heesub.shin@samsung.com
[lauraa@codeaurora.org: Missing CONFIG_CMA guards, add commit text]
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
[lmark@codeaurora.org: resolve conflicts relating to
MIGRATE_HIGHATOMIC and some other trivial merge conflicts]
Signed-off-by: Liam Mark <lmark@codeaurora.org>
The lowmem_shrink function discounts all the swap cache pages from
the file cache count. The zone aware code also discounts all file
cache pages from a certain zone. This results in some swap cache
pages being discounted twice, which can result in the low memory
killer being unnecessarily aggressive.
Fix the low memory killer to only discount the swap cache pages
once.
Change-Id: I650bbfbf0fbbabd01d82bdb3502b57ff59c3e14f
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
In certain memory configurations there can be a large number of
CMA pages which are not suitable to satisfy certain memory
requests.
This large number of unsuitable pages can cause the
lowmemorykiller to not kill any tasks because the
lowmemorykiller counts all free pages.
In order to ensure the lowmemorykiller properly evaluates the
free memory only count the free CMA pages if they are suitable
for satisfying the memory request.
Change-Id: I7f06d53e2d8cfe7439e5561fe6e5209ce73b1c90
CRs-fixed: 437016
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Bring back the is_cma_pageblock definition for determining if a
page is CMA or not.
Change-Id: I39fd546e22e240b752244832c79514f109c8e84b
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
Someone has an 86 column display.
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
High-order watermark checking exists for two reasons -- kswapd high-order
awareness and protection for high-order atomic requests. Historically the
kernel depended on MIGRATE_RESERVE to preserve min_free_kbytes as
high-order free pages for as long as possible. This patch introduces
MIGRATE_HIGHATOMIC that reserves pageblocks for high-order atomic
allocations on demand and avoids using those blocks for order-0
allocations. This is more flexible and reliable than MIGRATE_RESERVE was.
A MIGRATE_HIGHORDER pageblock is created when an atomic high-order
allocation request steals a pageblock but limits the total number to 1% of
the zone. Callers that speculatively abuse atomic allocations for
long-lived high-order allocations to access the reserve will quickly fail.
Note that SLUB is currently not such an abuser as it reclaims at least
once. It is possible that the pageblock stolen has few suitable
high-order pages and will need to steal again in the near future but there
would need to be strong justification to search all pageblocks for an
ideal candidate.
The pageblocks are unreserved if an allocation fails after a direct
reclaim attempt.
The watermark checks account for the reserved pageblocks when the
allocation request is not a high-order atomic allocation.
The reserved pageblocks can not be used for order-0 allocations. This may
allow temporary wastage until a failed reclaim reassigns the pageblock.
This is deliberate as the intent of the reservation is to satisfy a
limited number of atomic high-order short-lived requests if the system
requires them.
The stutter benchmark was used to evaluate this but while it was running
there was a systemtap script that randomly allocated between 1 high-order
page and 12.5% of memory's worth of order-3 pages using GFP_ATOMIC. This
is much larger than the potential reserve and it does not attempt to be
realistic. It is intended to stress random high-order allocations from an
unknown source, show that there is a reduction in failures without
introducing an anomaly where atomic allocations are more reliable than
regular allocations. The amount of memory reserved varied throughout the
workload as reserves were created and reclaimed under memory pressure.
The allocation failures once the workload warmed up were as follows;
4.2-rc5-vanilla 70%
4.2-rc5-atomic-reserve 56%
The failure rate was also measured while building multiple kernels. The
failure rate was 14% but is 6% with this patch applied.
Overall, this is a small reduction but the reserves are small relative to
the number of allocation requests. In early versions of the patch, the
failure rate reduced by a much larger amount but that required much larger
reserves and perversely made atomic allocations seem more reliable than
regular allocations.
[yalin.wang2010@gmail.com: fix redundant check and a memory leak]
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: yalin wang <yalin.wang2010@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
MIGRATE_RESERVE preserves an old property of the buddy allocator that
existed prior to fragmentation avoidance -- min_free_kbytes worth of pages
tended to remain contiguous until the only alternative was to fail the
allocation. At the time it was discovered that high-order atomic
allocations relied on this property so MIGRATE_RESERVE was introduced. A
later patch will introduce an alternative MIGRATE_HIGHATOMIC so this patch
deletes MIGRATE_RESERVE and supporting code so it'll be easier to review.
Note that this patch in isolation may look like a false regression if
someone was bisecting high-order atomic allocation failures.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The zonelist cache (zlc) was introduced to skip over zones that were
recently known to be full. This avoided expensive operations such as the
cpuset checks, watermark calculations and zone_reclaim. The situation
today is different and the complexity of zlc is harder to justify.
1) The cpuset checks are no-ops unless a cpuset is active and in general
are a lot cheaper.
2) zone_reclaim is now disabled by default and I suspect that was a large
source of the cost that zlc wanted to avoid. When it is enabled, it's
known to be a major source of stalling when nodes fill up and it's
unwise to hit every other user with the overhead.
3) Watermark checks are expensive to calculate for high-order
allocation requests. Later patches in this series will reduce the cost
of the watermark checking.
4) The most important issue is that in the current implementation it
is possible for a failed THP allocation to mark a zone full for order-0
allocations and cause a fallback to remote nodes.
The last issue could be addressed with additional complexity but as the
benefit of zlc is questionable, it is better to remove it. If stalls due
to zone_reclaim are ever reported then an alternative would be to
introduce deferring logic based on a timeout inside zone_reclaim itself
and leave the page allocator fast paths alone.
The impact on page-allocator microbenchmarks is negligible as they don't
hit the paths where the zlc comes into play. Most page-reclaim related
workloads showed no noticeable difference as a result of the removal.
The impact was noticeable in a workload called "stutter". One part uses a
lot of anonymous memory, a second measures mmap latency and a third copies
a large file. In an ideal world the latency application would not notice
the mmap latency. On a 2-node machine the results of this patch are
stutter
4.3.0-rc1 4.3.0-rc1
baseline nozlc-v4
Min mmap 20.9243 ( 0.00%) 20.7716 ( 0.73%)
1st-qrtle mmap 22.0612 ( 0.00%) 22.0680 ( -0.03%)
2nd-qrtle mmap 22.3291 ( 0.00%) 22.3809 ( -0.23%)
3rd-qrtle mmap 25.2244 ( 0.00%) 25.2396 ( -0.06%)
Max-90% mmap 48.0995 ( 0.00%) 28.3713 ( 41.02%)
Max-93% mmap 52.5557 ( 0.00%) 36.0170 ( 31.47%)
Max-95% mmap 55.8173 ( 0.00%) 47.3163 ( 15.23%)
Max-99% mmap 67.3781 ( 0.00%) 70.1140 ( -4.06%)
Max mmap 24447.6375 ( 0.00%) 12915.1356 ( 47.17%)
Mean mmap 33.7883 ( 0.00%) 27.7944 ( 17.74%)
Best99%Mean mmap 27.7825 ( 0.00%) 25.2767 ( 9.02%)
Best95%Mean mmap 26.3912 ( 0.00%) 23.7994 ( 9.82%)
Best90%Mean mmap 24.9886 ( 0.00%) 23.2251 ( 7.06%)
Best50%Mean mmap 22.0157 ( 0.00%) 22.0261 ( -0.05%)
Best10%Mean mmap 21.6705 ( 0.00%) 21.6083 ( 0.29%)
Best5%Mean mmap 21.5581 ( 0.00%) 21.4611 ( 0.45%)
Best1%Mean mmap 21.3079 ( 0.00%) 21.1631 ( 0.68%)
Note that the maximum stall latency went from 24 seconds to 12 which is
still bad but an improvement. The milage varies considerably 2-node
machine on an earlier test went from 494 seconds to 47 seconds and a
4-node machine that tested an earlier version of this patch went from a
worst case stall time of 6 seconds to 67ms. The nature of the benchmark
is inherently unpredictable as it is hammering the system and the milage
will vary between machines.
There is a secondary impact with potentially more direct reclaim because
zones are now being considered instead of being skipped by zlc. In this
particular test run it did not occur so will not be described. However,
in at least one test the following was observed
1. Direct reclaim rates were higher. This was likely due to direct reclaim
being entered instead of the zlc disabling a zone and busy looping.
Busy looping may have the effect of allowing kswapd to make more
progress and in some cases may be better overall. If this is found then
the correct action is to put direct reclaimers to sleep on a waitqueue
and allow kswapd make forward progress. Busy looping on the zlc is even
worse than when the allocator used to blindly call congestion_wait().
2. There was higher swap activity as direct reclaim was active.
3. Direct reclaim efficiency was lower. This is related to 1 as more
scanning activity also encountered more pages that could not be
immediately reclaimed
In that case, the direct page scan and reclaim rates are noticeable but
it is not considered a problem for a few reasons
1. The test is primarily concerned with latency. The mmap attempts are also
faulted which means there are THP allocation requests. The ZLC could
cause zones to be disabled causing the process to busy loop instead
of reclaiming. This looks like elevated direct reclaim activity but
it's the correct action to take based on what processes requested.
2. The test hammers reclaim and compaction heavily. The number of successful
THP faults is highly variable but affects the reclaim stats. It's not a
realistic or reasonable measure of page reclaim activity.
3. No other page-reclaim intensive workload that was tested showed a problem.
4. If a workload is identified that benefitted from the busy looping then it
should be fixed by having direct reclaimers sleep on a wait queue until
woken by kswapd instead of busy looping. We had this class of problem before
when congestion_waits() with a fixed timeout was a brain damaged decision
but happened to benefit some workloads.
If a workload is identified that relied on the zlc to busy loop then it
should be fixed correctly and have a direct reclaimer sleep on a waitqueue
until woken by kswapd.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch redefines which GFP bits are used for specifying mobility and
the order of the migrate types. Once redefined it's possible to convert
GFP flags to a migrate type with a simple mask and shift. The only
downside is that readers of OOM kill messages and allocation failures may
have been used to the existing values but scripts/gfp-translate will help.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vitaly Wool <vitalywool@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Overall, the intent of this series is to remove the zonelist cache which
was introduced to avoid high overhead in the page allocator. Once this is
done, it is necessary to reduce the cost of watermark checks.
The series starts with minor micro-optimisations.
Next it notes that GFP flags that affect watermark checks are abused.
__GFP_WAIT historically identified callers that could not sleep and could
access reserves. This was later abused to identify callers that simply
prefer to avoid sleeping and have other options. A patch distinguishes
between atomic callers, high-priority callers and those that simply wish
to avoid sleep.
The zonelist cache has been around for a long time but it is of dubious
merit with a lot of complexity and some issues that are explained. The
most important issue is that a failed THP allocation can cause a zone to
be treated as "full". This potentially causes unnecessary stalls, reclaim
activity or remote fallbacks. The issues could be fixed but it's not
worth it. The series places a small number of other micro-optimisations
on top before examining GFP flags watermarks.
High-order watermarks enforcement can cause high-order allocations to fail
even though pages are free. The watermark checks both protect high-order
atomic allocations and make kswapd aware of high-order pages but there is
a much better way that can be handled using migrate types. This series
uses page grouping by mobility to reserve pageblocks for high-order
allocations with the size of the reservation depending on demand. kswapd
awareness is maintained by examining the free lists. By patch 12 in this
series, there are no high-order watermark checks while preserving the
properties that motivated the introduction of the watermark checks.
This patch (of 10):
No user of zone_watermark_ok_safe() specifies alloc_flags. This patch
removes the unnecessary parameter.
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit a2f3aa0257 ("[PATCH] Fix sparsemem on Cell") fixed an oops
experienced on the Cell architecture when init-time functions,
early_*(), are called at runtime by introducing an 'enum memmap_context'
parameter to memmap_init_zone() and init_currently_empty_zone(). This
parameter is intended to be used to tell whether the call of these two
functions is being made on behalf of a hotplug event, or happening at
boot-time. However, init_currently_empty_zone() does not use this
parameter at all, so remove it.
Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1/ Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map. This facility is used by the pmem driver to
enable pfn_to_page() operations on the page frames returned by DAX
('direct_access' in 'struct block_device_operations'). For now, the
'memmap' allocation for these "device" pages comes from "System
RAM". Support for allocating the memmap from device memory will
arrive in a later kernel.
2/ Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3. Completion of
the conversion is targeted for v4.4.
3/ Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
4/ Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
5/ Miscellaneous updates and fixes to libnvdimm including support
for issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABAgAGBQJV6Nx7AAoJEB7SkWpmfYgCWyYQAI5ju6Gvw27RNFtPovHcZUf5
JGnxXejI6/AqeTQ+IulgprxtEUCrXOHjCDA5dkjr1qvsoqK1qxug+vJHOZLgeW0R
OwDtmdW4Qrgeqm+CPoxETkorJ8wDOc8mol81kTiMgeV3UqbYeeHIiTAmwe7VzZ0C
nNdCRDm5g8dHCjTKcvK3rvozgyoNoWeBiHkPe76EbnxDICxCB5dak7XsVKNMIVFQ
NuYlnw6IYN7+rMHgpgpRux38NtIW8VlYPWTmHExejc2mlioWMNBG/bmtwLyJ6M3e
zliz4/cnonTMUaizZaVozyinTa65m7wcnpjK+vlyGV2deDZPJpDRvSOtB0lH30bR
1gy+qrKzuGKpaN6thOISxFLLjmEeYwzYd7SvC9n118r32qShz+opN9XX0WmWSFlA
sajE1ehm4M7s5pkMoa/dRnAyR8RUPu4RNINdQ/Z9jFfAOx+Q26rLdQXwf9+uqbEb
bIeSQwOteK5vYYCstvpAcHSMlJAglzIX5UfZBvtEIJN7rlb0VhmGWfxAnTu+ktG1
o9cqAt+J4146xHaFwj5duTsyKhWb8BL9+xqbKPNpXEp+PbLsrnE/+WkDLFD67jxz
dgIoK60mGnVXp+16I2uMqYYDgAyO5zUdmM4OygOMnZNa1mxesjbDJC6Wat1Wsndn
slsw6DkrWT60CRE42nbK
=o57/
-----END PGP SIGNATURE-----
Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().
Summary:
- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.
This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').
For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.
- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.
Completion of the conversion is targeted for v4.4.
- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.
- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.
- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"
* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...
struct node_active_region is not used anymore. Remove it.
Signed-off-by: minkyung88.kim <minkyung88.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While pmem is usable as a block device or via DAX mappings to userspace
there are several usage scenarios that can not target pmem due to its
lack of struct page coverage. In preparation for "hot plugging" pmem
into the vmemmap add ZONE_DEVICE as a new zone to tag these pages
separately from the ones that are subject to standard page allocations.
Importantly "device memory" can be removed at will by userspace
unbinding the driver of the device.
Having a separate zone prevents allocation and otherwise marks these
pages that are distinct from typical uniform memory. Device memory has
different lifetime and performance characteristics than RAM. However,
since we have run out of ZONES_SHIFT bits this functionality currently
depends on sacrificing ZONE_DMA.
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Jerome Glisse <j.glisse@gmail.com>
[hch: various simplifications in the arch interface]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
This patch initalises all low memory struct pages and 2G of the highest
zone on each node during memory initialisation if
CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. That config option cannot be set
but will be available in a later patch. Parallel initialisation of struct
page depends on some features from memory hotplug and it is necessary to
alter alter section annotations.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
early_pfn_in_nid() and meminit_pfn_in_nid() are small functions that are
unnecessarily visible outside memory initialisation. As well as
unnecessary visibility, it's unnecessary function call overhead when
initialising pages. This patch moves the helpers inline.
[akpm@linux-foundation.org: fix build]
[mhocko@suse.cz: fix build]
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__early_pfn_to_nid() use static variables to cache recent lookups as
memblock lookups are very expensive but it assumes that memory
initialisation is single-threaded. Parallel initialisation of struct
pages will break that assumption so this patch makes __early_pfn_to_nid()
SMP-safe by requiring the caller to cache recent search information.
early_pfn_to_nid() keeps the same interface but is only safe to use early
in boot due to the use of a global static variable. meminit_pfn_in_nid()
is an SMP-safe version that callers must maintain their own state for.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Nate Zimmer <nzimmer@sgi.com>
Tested-by: Waiman Long <waiman.long@hp.com>
Tested-by: Daniel J Blueman <daniel@numascale.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Robin Holt <robinmholt@gmail.com>
Cc: Nate Zimmer <nzimmer@sgi.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All callers of zone_movable_is_highmem are under #ifdef CONFIG_HIGHMEM,
so the else branch return 0 is not needed.
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Ying reported the following problem due to commit 3484b2de94 ("mm:
rearrange zone fields into read-only, page alloc, statistics and page
reclaim lines") from the Intel performance tests
24b7e5819a3484b2de94
---------------- --------------------------
%stddev %change %stddev
\ | \
152288 \261 0% -46.2% 81911 \261 0% aim7.jobs-per-min
237 \261 0% +85.6% 440 \261 0% aim7.time.elapsed_time
237 \261 0% +85.6% 440 \261 0% aim7.time.elapsed_time.max
25026 \261 0% +70.7% 42712 \261 0% aim7.time.system_time
2186645 \261 5% +32.0% 2885949 \261 4% aim7.time.voluntary_context_switches
4576561 \261 1% +24.9% 5715773 \261 0% aim7.time.involuntary_context_switches
The problem is specific to very large machines under stress. It was not
reproducible with the machines I had used to justify the original patch
because large numbers of CPUs are required. When pressure is high enough,
the cache line is bouncing between CPUs trying to acquire the lock and the
holder of the lock adjusting free lists. The intention was that the
acquirer of the lock would automatically have the cache line holding the
free lists but according to Huang, this is not a universal win.
One possibility is to move the zone lock to its own cache line but it
increases the size of the zone. This patch moves the lock to the other
end of the free lists where they do not contend under high pressure. It
does mean the page allocator paths now require more cache lines but Huang
reports that it restores performance to previous levels on large machines
%stddev %change %stddev
\ | \
84568 \261 1% +94.3% 164280 \261 1% aim7.jobs-per-min
2881944 \261 2% -35.1% 1870386 \261 8% aim7.time.voluntary_context_switches
681 \261 1% -3.4% 658 \261 0% aim7.time.user_time
5538139 \261 0% -12.1% 4867884 \261 0% aim7.time.involuntary_context_switches
44174 \261 1% -46.0% 23848 \261 1% aim7.time.system_time
426 \261 1% -48.4% 219 \261 1% aim7.time.elapsed_time
426 \261 1% -48.4% 219 \261 1% aim7.time.elapsed_time.max
468 \261 1% -43.1% 266 \261 2% uptime.boot
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Huang Ying <ying.huang@intel.com>
Tested-by: Huang Ying <ying.huang@intel.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
next_zones_zonelist() returns a zoneref pointer, as well as a zone pointer
via extra parameter. Since the latter can be trivially obtained by
dereferencing the former, the overhead of the extra parameter is
unjustified.
This patch thus removes the zone parameter from next_zones_zonelist().
Both callers happen to be in the same header file, so it's simple to add
the zoneref dereference inline. We save some bytes of code size.
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-105 (-105)
function old new delta
nr_free_zone_pages 129 115 -14
__alloc_pages_nodemask 2300 2285 -15
get_page_from_freelist 2652 2576 -76
add/remove: 0/0 grow/shrink: 1/0 up/down: 10/0 (10)
function old new delta
try_to_compact_pages 569 579 +10
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Found it when I want to jump to the definition of MIGRATE_RESERVE ctags.
Signed-off-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When we debug something, we'd like to insert some information to every
page. For this purpose, we sometimes modify struct page itself. But,
this has drawbacks. First, it requires re-compile. This makes us
hesitate to use the powerful debug feature so development process is
slowed down. And, second, sometimes it is impossible to rebuild the
kernel due to third party module dependency. At third, system behaviour
would be largely different after re-compile, because it changes size of
struct page greatly and this structure is accessed by every part of
kernel. Keeping this as it is would be better to reproduce errornous
situation.
This feature is intended to overcome above mentioned problems. This
feature allocates memory for extended data per page in certain place
rather than the struct page itself. This memory can be accessed by the
accessor functions provided by this code. During the boot process, it
checks whether allocation of huge chunk of memory is needed or not. If
not, it avoids allocating memory at all. With this advantage, we can
include this feature into the kernel in default and can avoid rebuild and
solve related problems.
Until now, memcg uses this technique. But, now, memcg decides to embed
their variable to struct page itself and it's code to extend struct page
has been removed. I'd like to use this code to develop debug feature, so
this patch resurrect it.
To help these things to work well, this patch introduces two callbacks for
clients. One is the need callback which is mandatory if user wants to
avoid useless memory allocation at boot-time. The other is optional, init
callback, which is used to do proper initialization after memory is
allocated. Detailed explanation about purpose of these functions is in
code comment. Please refer it.
Others are completely same with previous extension code in memcg.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Dave Hansen <dave@sr71.net>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Jungsoo Son <jungsoo.son@lge.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory cgroups used to have 5 per-page pointers. To allow users to
disable that amount of overhead during runtime, those pointers were
allocated in a separate array, with a translation layer between them and
struct page.
There is now only one page pointer remaining: the memcg pointer, that
indicates which cgroup the page is associated with when charged. The
complexity of runtime allocation and the runtime translation overhead is
no longer justified to save that *potential* 0.19% of memory. With
CONFIG_SLUB, page->mem_cgroup actually sits in the doubleword padding
after the page->private member and doesn't even increase struct page,
and then this patch actually saves space. Remaining users that care can
still compile their kernels without CONFIG_MEMCG.
text data bss dec hex filename
8828345 1725264 983040 11536649 b00909 vmlinux.old
8827425 1725264 966656 11519345 afc571 vmlinux.new
[mhocko@suse.cz: update Documentation/cgroups/memory.txt]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Vladimir Davydov <vdavydov@parallels.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Konstantin Khlebnikov <koct9i@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Before describing bugs itself, I first explain definition of freepage.
1. pages on buddy list are counted as freepage.
2. pages on isolate migratetype buddy list are *not* counted as freepage.
3. pages on cma buddy list are counted as CMA freepage, too.
Now, I describe problems and related patch.
Patch 1: There is race conditions on getting pageblock migratetype that
it results in misplacement of freepages on buddy list, incorrect
freepage count and un-availability of freepage.
Patch 2: Freepages on pcp list could have stale cached information to
determine migratetype of buddy list to go. This causes misplacement of
freepages on buddy list and incorrect freepage count.
Patch 4: Merging between freepages on different migratetype of
pageblocks will cause freepages accouting problem. This patch fixes it.
Without patchset [3], above problem doesn't happens on my CMA allocation
test, because CMA reserved pages aren't used at all. So there is no
chance for above race.
With patchset [3], I did simple CMA allocation test and get below
result:
- Virtual machine, 4 cpus, 1024 MB memory, 256 MB CMA reservation
- run kernel build (make -j16) on background
- 30 times CMA allocation(8MB * 30 = 240MB) attempts in 5 sec interval
- Result: more than 5000 freepage count are missed
With patchset [3] and this patchset, I found that no freepage count are
missed so that I conclude that problems are solved.
On my simple memory offlining test, these problems also occur on that
environment, too.
This patch (of 4):
There are two paths to reach core free function of buddy allocator,
__free_one_page(), one is free_one_page()->__free_one_page() and the
other is free_hot_cold_page()->free_pcppages_bulk()->__free_one_page().
Each paths has race condition causing serious problems. At first, this
patch is focused on first type of freepath. And then, following patch
will solve the problem in second type of freepath.
In the first type of freepath, we got migratetype of freeing page
without holding the zone lock, so it could be racy. There are two cases
of this race.
1. pages are added to isolate buddy list after restoring orignal
migratetype
CPU1 CPU2
get migratetype => return MIGRATE_ISOLATE
call free_one_page() with MIGRATE_ISOLATE
grab the zone lock
unisolate pageblock
release the zone lock
grab the zone lock
call __free_one_page() with MIGRATE_ISOLATE
freepage go into isolate buddy list,
although pageblock is already unisolated
This may cause two problems. One is that we can't use this page anymore
until next isolation attempt of this pageblock, because freepage is on
isolate buddy list. The other is that freepage accouting could be wrong
due to merging between different buddy list. Freepages on isolate buddy
list aren't counted as freepage, but ones on normal buddy list are
counted as freepage. If merge happens, buddy freepage on normal buddy
list is inevitably moved to isolate buddy list without any consideration
of freepage accouting so it could be incorrect.
2. pages are added to normal buddy list while pageblock is isolated.
It is similar with above case.
This also may cause two problems. One is that we can't keep these
freepages from being allocated. Although this pageblock is isolated,
freepage would be added to normal buddy list so that it could be
allocated without any restriction. And the other problem is same as
case 1, that it, incorrect freepage accouting.
This race condition would be prevented by checking migratetype again
with holding the zone lock. Because it is somewhat heavy operation and
it isn't needed in common case, we want to avoid rechecking as much as
possible. So this patch introduce new variable, nr_isolate_pageblock in
struct zone to check if there is isolated pageblock. With this, we can
avoid to re-check migratetype in common case and do it only if there is
isolated pageblock or migratetype is MIGRATE_ISOLATE. This solve above
mentioned problems.
Changes from v3:
Add one more check in free_one_page() that checks whether migratetype is
MIGRATE_ISOLATE or not. Without this, abovementioned case 1 could happens.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Laura Abbott <lauraa@codeaurora.org>
Cc: Heesub Shin <heesub.shin@samsung.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Ritesh Harjani <ritesh.list@gmail.com>
Cc: Gioh Kim <gioh.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Page reclaim tests zone_is_reclaim_dirty(), but the site that actually
sets this state does zone_set_flag(zone, ZONE_TAIL_LRU_DIRTY), sending the
reader through layers indirection just to track down a simple bit.
Remove all zone flag wrappers and just use bitops against zone->flags
directly. It's just as readable and the lines are barely any longer.
Also rename ZONE_TAIL_LRU_DIRTY to ZONE_DIRTY to match ZONE_WRITEBACK, and
remove the zone_flags_t typedef.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The fair zone allocation policy round-robins allocations between zones
within a node to avoid age inversion problems during reclaim. If the
first allocation fails, the batch counts are reset and a second attempt
made before entering the slow path.
One assumption made with this scheme is that batches expire at roughly
the same time and the resets each time are justified. This assumption
does not hold when zones reach their low watermark as the batches will
be consumed at uneven rates. Allocation failure due to watermark
depletion result in additional zonelist scans for the reset and another
watermark check before hitting the slowpath.
On UMA, the benefit is negligible -- around 0.25%. On 4-socket NUMA
machine it's variable due to the variability of measuring overhead with
the vmstat changes. The system CPU overhead comparison looks like
3.16.0-rc3 3.16.0-rc3 3.16.0-rc3
vanilla vmstat-v5 lowercost-v5
User 746.94 774.56 802.00
System 65336.22 32847.27 40852.33
Elapsed 27553.52 27415.04 27368.46
However it is worth noting that the overall benchmark still completed
faster and intuitively it makes sense to take as few passes as possible
through the zonelists.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
zone->pages_scanned is a write-intensive cache line during page reclaim
and it's also updated during page free. Move the counter into vmstat to
take advantage of the per-cpu updates and do not update it in the free
paths unless necessary.
On a small UMA machine running tiobench the difference is marginal. On
a 4-node machine the overhead is more noticable. Note that automatic
NUMA balancing was disabled for this test as otherwise the system CPU
overhead is unpredictable.
3.16.0-rc3 3.16.0-rc3 3.16.0-rc3
vanillarearrange-v5 vmstat-v5
User 746.94 759.78 774.56
System 65336.22 58350.98 32847.27
Elapsed 27553.52 27282.02 27415.04
Note that the overhead reduction will vary depending on where exactly
pages are allocated and freed.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The arrangement of struct zone has changed over time and now it has
reached the point where there is some inappropriate sharing going on.
On x86-64 for example
o The zone->node field is shared with the zone lock and zone->node is
accessed frequently from the page allocator due to the fair zone
allocation policy.
o span_seqlock is almost never used by shares a line with free_area
o Some zone statistics share a cache line with the LRU lock so
reclaim-intensive and allocator-intensive workloads can bounce the cache
line on a stat update
This patch rearranges struct zone to put read-only and read-mostly
fields together and then splits the page allocator intensive fields, the
zone statistics and the page reclaim intensive fields into their own
cache lines. Note that the type of lowmem_reserve changes due to the
watermark calculations being signed and avoiding a signed/unsigned
conversion there.
On the test configuration I used the overall size of struct zone shrunk
by one cache line. On smaller machines, this is not likely to be
noticable. However, on a 4-node NUMA machine running tiobench the
system CPU overhead is reduced by this patch.
3.16.0-rc3 3.16.0-rc3
vanillarearrange-v5r9
User 746.94 759.78
System 65336.22 58350.98
Elapsed 27553.52 27282.02
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In original code, zone_movable_is_highmem() assumes ZONE_MOVABLE not
highmem if CONFIG_HAVE_MEMBLOCK_NODE_MAP is not set. In online_pages,
it extracts pages from the previous zone before ZONE_MOVABLE. Which is
logically inconsistent:
If HAVE_MEMBLOCK_NODE_MAP is turned off but HIGHMEM is on,
zone_movable_is_highmem() makes movable zone not highmem, but
online_pages() extracts pages from ZONE_HIGHMEM.
This inconsistency doesn't cause real problem currently, because all
architectures support online_pages also have HAVE_MEMBLOCK_NODE_MAP.
However, fixing it makes code clear, and also helps futher coding.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Zhang Zhen <zhangzhen@huawei.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Jiang Liu <liuj97@gmail.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
X86 prefers the use of unsigned types for iterators and there is a
tendency to mix whether a signed or unsigned type if used for page order.
This converts a number of sites in mm/page_alloc.c to use unsigned int for
order where possible.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>