* remotes/origin/tmp-2f0de51:
Linux 4.4.38
esp6: Fix integrity verification when ESN are used
esp4: Fix integrity verification when ESN are used
ipv4: Set skb->protocol properly for local output
ipv6: Set skb->protocol properly for local output
Don't feed anything but regular iovec's to blk_rq_map_user_iov
constify iov_iter_count() and iter_is_iovec()
sparc64: fix compile warning section mismatch in find_node()
sparc64: Fix find_node warning if numa node cannot be found
sparc32: Fix inverted invalid_frame_pointer checks on sigreturns
net: ping: check minimum size on ICMP header length
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
geneve: avoid use-after-free of skb->data
sh_eth: remove unchecked interrupts for RZ/A1
net: bcmgenet: Utilize correct struct device for all DMA operations
packet: fix race condition in packet_set_ring
net/dccp: fix use-after-free in dccp_invalid_packet
netlink: Do not schedule work from sk_destruct
netlink: Call cb->done from a worker thread
net/sched: pedit: make sure that offset is valid
net, sched: respect rcu grace period on cls destruction
net: dsa: bcm_sf2: Ensure we re-negotiate EEE during after link change
l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()
rtnetlink: fix FDB size computation
af_unix: conditionally use freezable blocking calls in read
net: sky2: Fix shutdown crash
ip6_tunnel: disable caching when the traffic class is inherited
net: check dead netns for peernet2id_alloc()
virtio-net: add a missing synchronize_net()
Linux 4.4.37
arm64: suspend: Reconfigure PSTATE after resume from idle
arm64: mm: Set PSTATE.PAN from the cpu_enable_pan() call
arm64: cpufeature: Schedule enable() calls instead of calling them via IPI
pwm: Fix device reference leak
mwifiex: printk() overflow with 32-byte SSIDs
PCI: Set Read Completion Boundary to 128 iff Root Port supports it (_HPX)
PCI: Export pcie_find_root_port
rcu: Fix soft lockup for rcu_nocb_kthread
ALSA: pcm : Call kill_fasync() in stream lock
x86/traps: Ignore high word of regs->cs in early_fixup_exception()
kasan: update kasan_global for gcc 7
zram: fix unbalanced idr management at hot removal
ARC: Don't use "+l" inline asm constraint
Linux 4.4.36
scsi: mpt3sas: Unblock device after controller reset
flow_dissect: call init_default_flow_dissectors() earlier
mei: fix return value on disconnection
mei: me: fix place for kaby point device ids.
mei: me: disable driver on SPT SPS firmware
drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on
mpi: Fix NULL ptr dereference in mpi_powm() [ver #3]
parisc: Also flush data TLB in flush_icache_page_asm
parisc: Fix race in pci-dma.c
parisc: Fix races in parisc_setup_cache_timing()
NFSv4.x: hide array-bounds warning
apparmor: fix change_hat not finding hat after policy replacement
cfg80211: limit scan results cache size
tile: avoid using clocksource_cyc2ns with absolute cycle count
scsi: mpt3sas: Fix secure erase premature termination
Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y
USB: serial: ftdi_sio: add support for TI CC3200 LaunchPad
USB: serial: cp210x: add ID for the Zone DPMX
usb: chipidea: move the lock initialization to core file
KVM: x86: check for pic and ioapic presence before use
KVM: x86: drop error recovery in em_jmp_far and em_ret_far
iommu/vt-d: Fix IOMMU lookup for SR-IOV Virtual Functions
iommu/vt-d: Fix PASID table allocation
sched: tune: Fix lacking spinlock initialization
UPSTREAM: trace: Update documentation for mono, mono_raw and boot clock
UPSTREAM: trace: Add an option for boot clock as trace clock
UPSTREAM: timekeeping: Add a fast and NMI safe boot clock
ANDROID: goldfish_pipe: fix allmodconfig build
ANDROID: goldfish: goldfish_pipe: fix locking errors
ANDROID: video: goldfishfb: fix platform_no_drv_owner.cocci warnings
ANDROID: goldfish_pipe: fix call_kern.cocci warnings
arm64: rename ranchu defconfig to ranchu64
ANDROID: arch: x86: disable pic for Android toolchain
ANDROID: goldfish_pipe: An implementation of more parallel pipe
ANDROID: goldfish_pipe: bugfixes and performance improvements.
ANDROID: goldfish: Add goldfish sync driver
ANDROID: goldfish: add ranchu defconfigs
ANDROID: goldfish_audio: Clear audio read buffer status after each read
ANDROID: goldfish_events: no extra EV_SYN; register goldfish
ANDROID: goldfish_fb: Set pixclock = 0
ANDROID: goldfish: Enable ACPI-based enumeration for goldfish audio
ANDROID: goldfish: Enable ACPI-based enumeration for goldfish framebuffer
ANDROID: video: goldfishfb: add devicetree bindings
BACKPORT: staging: goldfish: audio: fix compiliation on arm
BACKPORT: Input: goldfish_events - enable ACPI-based enumeration for goldfish events
BACKPORT: goldfish: Enable ACPI-based enumeration for goldfish battery
BACKPORT: drivers: tty: goldfish: Add device tree bindings
BACKPORT: tty: goldfish: support platform_device with id -1
BACKPORT: Input: goldfish_events - add devicetree bindings
BACKPORT: power: goldfish_battery: add devicetree bindings
BACKPORT: staging: goldfish: audio: add devicetree bindings
ANDROID: usb: gadget: function: cleanup: Add blank line after declaration
cpufreq: sched: Fix kernel crash on accessing sysfs file
usb: gadget: f_mtp: simplify ptp NULL pointer check
cgroup: replace unified-hierarchy.txt with a proper cgroup v2 documentation
cgroup: rename Documentation/cgroups/ to Documentation/cgroup-legacy/
cgroup: replace __DEVEL__sane_behavior with cgroup2 fs type
writeback: initialize inode members that track writeback history
mm: page_alloc: generalize the dirty balance reserve
block: fix module reference leak on put_disk() call for cgroups throttle
Linux 4.4.35
netfilter: nft_dynset: fix element timeout for HZ != 1000
IB/cm: Mark stale CM id's whenever the mad agent was unregistered
IB/uverbs: Fix leak of XRC target QPs
IB/core: Avoid unsigned int overflow in sg_alloc_table
IB/mlx5: Fix fatal error dispatching
IB/mlx5: Use cache line size to select CQE stride
IB/mlx4: Fix create CQ error flow
IB/mlx4: Check gid_index return value
PM / sleep: don't suspend parent when async child suspend_{noirq, late} fails
PM / sleep: fix device reference leak in test_suspend
uwb: fix device reference leaks
mfd: core: Fix device reference leak in mfd_clone_cell
iwlwifi: pcie: fix SPLC structure parsing
rtc: omap: Fix selecting external osc
clk: mmp: mmp2: fix return value check in mmp2_clk_init()
clk: mmp: pxa168: fix return value check in pxa168_clk_init()
clk: mmp: pxa910: fix return value check in pxa910_clk_init()
drm/amdgpu: Attach exclusive fence to prime exported bo's. (v5)
crypto: caam - do not register AES-XTS mode on LP units
ext4: sanity check the block and cluster size at mount time
kbuild: Steal gcc's pie from the very beginning
x86/kexec: add -fno-PIE
scripts/has-stack-protector: add -fno-PIE
kbuild: add -fno-PIE
i2c: mux: fix up dependencies
can: bcm: fix warning in bcm_connect/proc_register
mfd: intel-lpss: Do not put device in reset state on suspend
fuse: fix fuse_write_end() if zero bytes were copied
KVM: Disable irq while unregistering user notifier
KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
Linux 4.4.34
sparc64: Delete now unused user copy fixup functions.
sparc64: Delete now unused user copy assembler helpers.
sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.
sparc64: Convert NG2copy_{from,to}_user to accurate exception reporting.
sparc64: Convert NGcopy_{from,to}_user to accurate exception reporting.
sparc64: Convert NG4copy_{from,to}_user to accurate exception reporting.
sparc64: Convert U1copy_{from,to}_user to accurate exception reporting.
sparc64: Convert GENcopy_{from,to}_user to accurate exception reporting.
sparc64: Convert copy_in_user to accurate exception reporting.
sparc64: Prepare to move to more saner user copy exception handling.
sparc64: Delete __ret_efault.
sparc64: Handle extremely large kernel TLB range flushes more gracefully.
sparc64: Fix illegal relative branches in hypervisor patched TLB cross-call code.
sparc64: Fix instruction count in comment for __hypervisor_flush_tlb_pending.
sparc64: Fix illegal relative branches in hypervisor patched TLB code.
sparc64: Handle extremely large kernel TSB range flushes sanely.
sparc: Handle negative offsets in arch_jump_label_transform
sparc64 mm: Fix base TSB sizing when hugetlb pages are used
sparc: serial: sunhv: fix a double lock bug
sparc: Don't leak context bits into thread->fault_address
tty: Prevent ldisc drivers from re-using stale tty fields
tcp: take care of truncations done by sk_filter()
ipv4: use new_gw for redirect neigh lookup
net: __skb_flow_dissect() must cap its return value
sock: fix sendmmsg for partial sendmsg
fib_trie: Correct /proc/net/route off by one error
sctp: assign assoc_id earlier in __sctp_connect
ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped
ipv6: dccp: fix out of bound access in dccp_v6_err()
dccp: fix out of bound access in dccp_v4_err()
dccp: do not send reset to already closed sockets
tcp: fix potential memory corruption
ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()
bgmac: stop clearing DMA receive control register right after it is set
net: mangle zero checksum in skb_checksum_help()
net: clear sk_err_soft in sk_clone_lock()
dctcp: avoid bogus doubling of cwnd after loss
ARM: 8485/1: cpuidle: remove cpu parameter from the cpuidle_ops suspend hook
Linux 4.4.33
netfilter: fix namespace handling in nf_log_proc_dostring
btrfs: qgroup: Prevent qgroup->reserved from going subzero
mmc: mxs: Initialize the spinlock prior to using it
ASoC: sun4i-codec: return error code instead of NULL when create_card fails
ACPI / APEI: Fix incorrect return value of ghes_proc()
i40e: fix call of ndo_dflt_bridge_getlink()
hwrng: core - Don't use a stack buffer in add_early_randomness()
lib/genalloc.c: start search from start of chunk
mei: bus: fix received data size check in NFC fixup
iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path
iommu/amd: Free domain id when free a domain of struct dma_ops_domain
tty/serial: at91: fix hardware handshake on Atmel platforms
dmaengine: at_xdmac: fix spurious flag status for mem2mem transfers
drm/i915: Respect alternate_ddc_pin for all DDI ports
KVM: MIPS: Precalculate MMIO load resume PC
scsi: mpt3sas: Fix for block device of raid exists even after deleting raid disk
scsi: qla2xxx: Fix scsi scan hang triggered if adapter fails during init
iio: orientation: hid-sensor-rotation: Add PM function (fix non working driver)
iio: hid-sensors: Increase the precision of scale to fix wrong reading interpretation.
clk: qoriq: Don't allow CPU clocks higher than starting value
toshiba-wmi: Fix loading the driver on non Toshiba laptops
drbd: Fix kernel_sendmsg() usage - potential NULL deref
usb: gadget: u_ether: remove interrupt throttling
USB: cdc-acm: fix TIOCMIWAIT
staging: nvec: remove managed resource from PS2 driver
Revert "staging: nvec: ps2: change serio type to passthrough"
drivers: staging: nvec: remove bogus reset command for PS/2 interface
staging: iio: ad5933: avoid uninitialized variable in error case
pinctrl: cherryview: Prevent possible interrupt storm on resume
pinctrl: cherryview: Serialize register access in suspend/resume
ARC: timer: rtc: implement read loop in "C" vs. inline asm
s390/hypfs: Use get_free_page() instead of kmalloc to ensure page alignment
coredump: fix unfreezable coredumping task
swapfile: fix memory corruption via malformed swapfile
dib0700: fix nec repeat handling
ASoC: cs4270: fix DAPM stream name mismatch
ALSA: info: Limit the proc text input size
ALSA: info: Return error for invalid read/write
arm64: Enable KPROBES/HIBERNATION/CORESIGHT in defconfig
arm64: kvm: allows kvm cpu hotplug
arm64: KVM: Register CPU notifiers when the kernel runs at HYP
arm64: KVM: Skip HYP setup when already running in HYP
arm64: hyp/kvm: Make hyp-stub reject kvm_call_hyp()
arm64: hyp/kvm: Make hyp-stub extensible
arm64: kvm: Move lr save/restore from do_el2_call into EL1
arm64: kvm: deal with kernel symbols outside of linear mapping
arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
ANDROID: video: adf: Avoid directly referencing user pointers
ANDROID: usb: gadget: audio_source: fix comparison of distinct pointer types
android: binder: support for file-descriptor arrays.
android: binder: support for scatter-gather.
android: binder: add extra size to allocator.
android: binder: refactor binder_transact()
android: binder: support multiple /dev instances.
android: binder: deal with contexts in debugfs.
android: binder: support multiple context managers.
android: binder: split flat_binder_object.
disable aio support in recommended configuration
Linux 4.4.32
scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression
drm/radeon: fix DP mode validation
drm/radeon/dp: add back special handling for NUTMEG
drm/amdgpu: fix DP mode validation
drm/amdgpu/dp: add back special handling for NUTMEG
KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
Revert KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
of: silence warnings due to max() usage
packet: on direct_xmit, limit tso and csum to supported devices
sctp: validate chunk len before actually using it
net sched filters: fix notification of filter delete with proper handle
udp: fix IP_CHECKSUM handling
net: sctp, forbid negative length
ipv4: use the right lock for ping_group_range
ipv4: disable BH in set_ping_group_range()
net: add recursion limit to GRO
rtnetlink: Add rtnexthop offload flag to compare mask
bridge: multicast: restore perm router ports on multicast enable
net: pktgen: remove rcu locking in pktgen_change_name()
ipv6: correctly add local routes when lo goes up
ip6_tunnel: fix ip6_tnl_lookup
ipv6: tcp: restore IP6CB for pktoptions skbs
netlink: do not enter direct reclaim from netlink_dump()
packet: call fanout_release, while UNREGISTERING a netdev
net: Add netdev all_adj_list refcnt propagation to fix panic
net/sched: act_vlan: Push skb->data to mac_header prior calling skb_vlan_*() functions
net: pktgen: fix pkt_size
net: fec: set mac address unconditionally
tg3: Avoid NULL pointer dereference in tg3_io_error_detected()
ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route
ip6_gre: fix flowi6_proto value in ip6gre_xmit_other()
tcp: fix a compile error in DBGUNDO()
tcp: fix wrong checksum calculation on MTU probing
net: avoid sk_forward_alloc overflows
tcp: fix overflow in __tcp_retransmit_skb()
arm64/kvm: fix build issue on kvm debug
arm64: ptdump: Indicate whether memory should be faulting
arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
arm64: Drop alloc function from create_mapping
arm64: allow vmalloc regions to be set with set_memory_*
arm64: kernel: implement ACPI parking protocol
arm64: mm: create new fine-grained mappings at boot
arm64: ensure _stext and _etext are page-aligned
arm64: mm: allow passing a pgdir to alloc_init_*
arm64: mm: allocate pagetables anywhere
arm64: mm: use fixmap when creating page tables
arm64: mm: add functions to walk tables in fixmap
arm64: mm: add __{pud,pgd}_populate
arm64: mm: avoid redundant __pa(__va(x))
Linux 4.4.31
HID: usbhid: add ATEN CS962 to list of quirky devices
ubi: fastmap: Fix add_vol() return value test in ubi_attach_fastmap()
kvm: x86: Check memopp before dereference (CVE-2016-8630)
tty: vt, fix bogus division in csi_J
usb: dwc3: Fix size used in dma_free_coherent()
pwm: Unexport children before chip removal
UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header
Disable "frame-address" warning
smc91x: avoid self-comparison warning
cgroup: avoid false positive gcc-6 warning
drm/exynos: fix error handling in exynos_drm_subdrv_open
mm/cma: silence warnings due to max() usage
ARM: 8584/1: floppy: avoid gcc-6 warning
powerpc/ptrace: Fix out of bounds array access warning
x86/xen: fix upper bound of pmd loop in xen_cleanhighmap()
perf build: Fix traceevent plugins build race
drm/dp/mst: Check peer device type before attempting EDID read
drm/radeon: drop register readback in cayman_cp_int_cntl_setup
drm/radeon/si_dpm: workaround for SI kickers
drm/radeon/si_dpm: Limit clocks on HD86xx part
Revert "drm/radeon: fix DP link training issue with second 4K monitor"
mmc: dw_mmc-pltfm: fix the potential NULL pointer dereference
scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware
scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded
scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices
mac80211: discard multicast and 4-addr A-MSDUs
firewire: net: fix fragmented datagram_size off-by-one
firewire: net: guard against rx buffer overflows
Input: i8042 - add XMG C504 to keyboard reset table
dm mirror: fix read error on recovery after default leg failure
virtio: console: Unlock vqs while freeing buffers
virtio_ring: Make interrupt suppression spec compliant
parisc: Ensure consistent state when switching to kernel stack at syscall entry
ovl: fsync after copy-up
KVM: MIPS: Make ERET handle ERL before EXL
KVM: x86: fix wbinvd_dirty_mask use-after-free
dm: free io_barrier after blk_cleanup_queue call
USB: serial: cp210x: fix tiocmget error handling
tty: limit terminal size to 4M chars
xhci: add restart quirk for Intel Wildcatpoint PCH
hv: do not lose pending heartbeat vmbus packets
vt: clear selection before resizing
Fix potential infoleak in older kernels
GenWQE: Fix bad page access during abort of resource allocation
usb: increase ohci watchdog delay to 275 msec
xhci: use default USB_RESUME_TIMEOUT when resuming ports.
USB: serial: ftdi_sio: add support for Infineon TriBoard TC2X7
USB: serial: fix potential NULL-dereference at probe
usb: gadget: function: u_ether: don't starve tx request queue
mei: txe: don't clean an unprocessed interrupt cause.
ubifs: Fix regression in ubifs_readdir()
ubifs: Abort readdir upon error
btrfs: fix races on root_log_ctx lists
ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct
ANDROID: binder: Add strong ref checks
ALSA: hda - Fix headset mic detection problem for two Dell laptops
ALSA: hda - Adding a new group of pin cfg into ALC295 pin quirk table
ALSA: hda - allow 40 bit DMA mask for NVidia devices
ALSA: hda - Raise AZX_DCAPS_RIRB_DELAY handling into top drivers
ALSA: hda - Merge RIRB_PRE_DELAY into CTX_WORKAROUND caps
ALSA: usb-audio: Add quirk for Syntek STK1160
KEYS: Fix short sprintf buffer in /proc/keys show function
mm: memcontrol: do not recurse in direct reclaim
mm/list_lru.c: avoid error-path NULL pointer deref
libxfs: clean up _calc_dquots_per_chunk
h8300: fix syscall restarting
drm/dp/mst: Clear port->pdt when tearing down the i2c adapter
i2c: core: fix NULL pointer dereference under race condition
i2c: xgene: Avoid dma_buffer overrun
arm64:cpufeature ARM64_NCAPS is the indicator of last feature
arm64: hibernate: Refuse to hibernate if the boot cpu is offline
PM / sleep: Add support for read-only sysfs attributes
arm64: kernel: Add support for hibernate/suspend-to-disk
arm64: mm: add functions to walk page tables by PA
arm64: mm: move pte_* macros
PM / Hibernate: Call flush_icache_range() on pages restored in-place
arm64: Add new asm macro copy_page
arm64: Promote KERNEL_START/KERNEL_END definitions to a header file
arm64: kernel: Include _AC definition in page.h
arm64: Change cpu_resume() to enable mmu early then access sleep_sp by va
arm64: kernel: Rework finisher callback out of __cpu_suspend_enter()
arm64: Cleanup SCTLR flags
arm64: Fold proc-macros.S into assembler.h
arm/arm64: KVM: Add hook for C-based stage2 init
arm/arm64: KVM: Detect vGIC presence at runtime
arm64: KVM: Add support for 16-bit VMID
arm: KVM: Make kvm_arm.h friendly to assembly code
arm/arm64: KVM: Remove unreferenced S2_PGD_ORDER
arm64: KVM: debug: Remove spurious inline attributes
ARM: KVM: Cleanup exception injection
arm64: KVM: Remove weak attributes
arm64: KVM: Cleanup asm-offset.c
arm64: KVM: Turn system register numbers to an enum
arm64: KVM: VHE: Patch out use of HVC
arm64: Add ARM64_HAS_VIRT_HOST_EXTN feature
arm/arm64: Add new is_kernel_in_hyp_mode predicate
arm64: KVM: Move away from the assembly version of the world switch
arm64: KVM: Map the kernel RO section into HYP
arm64: KVM: Add compatibility aliases
arm64: KVM: Implement vgic-v3 save/restore
arm64: KVM: Add panic handling
arm64: KVM: HYP mode entry points
arm64: KVM: Implement TLB handling
arm64: KVM: Implement fpsimd save/restore
arm64: KVM: Implement the core world switch
arm64: KVM: Add patchable function selector
arm64: KVM: Implement guest entry
arm64: KVM: Implement debug save/restore
arm64: KVM: Implement 32bit system register save/restore
arm64: KVM: Implement system register save/restore
arm64: KVM: Implement timer save/restore
arm64: KVM: Implement vgic-v2 save/restore
arm64: KVM: Add a HYP-specific header file
KVM: arm/arm64: vgic-v3: Make the LR indexing macro public
arm64: Add macros to read/write system registers
Linux 4.4.30
Revert "fix minor infoleak in get_user_ex()"
Revert "x86/mm: Expand the exception table logic to allow new handling options"
Linux 4.4.29
ARM: pxa: pxa_cplds: fix interrupt handling
powerpc/nvram: Fix an incorrect partition merge
mpt3sas: Don't spam logs if logging level is 0
perf symbols: Fixup symbol sizes before picking best ones
perf symbols: Check symbol_conf.allow_aliases for kallsyms loading too
perf hists browser: Fix event group display
clk: divider: Fix clk_divider_round_rate() to use clk_readl()
clk: qoriq: fix a register offset error
s390/con3270: fix insufficient space padding
s390/con3270: fix use of uninitialised data
s390/cio: fix accidental interrupt enabling during resume
x86/mm: Expand the exception table logic to allow new handling options
dmaengine: ipu: remove bogus NO_IRQ reference
power: bq24257: Fix use of uninitialized pointer bq->charger
staging: r8188eu: Fix scheduling while atomic splat
ASoC: dapm: Fix kcontrol creation for output driver widget
ASoC: dapm: Fix value setting for _ENUM_DOUBLE MUX's second channel
ASoC: dapm: Fix possible uninitialized variable in snd_soc_dapm_get_volsw()
ASoC: topology: Fix error return code in soc_tplg_dapm_widget_create()
hwrng: omap - Only fail if pm_runtime_get_sync returns < 0
crypto: arm/ghash-ce - add missing async import/export
crypto: gcm - Fix IV buffer size in crypto_gcm_setkey
mwifiex: correct aid value during tdls setup
spi: spi-fsl-dspi: Drop extra spi_master_put in device remove function
ARM: clk-imx35: fix name for ckil clk
uio: fix dmem_region_start computation
genirq/generic_chip: Add irq_unmap callback
perf stat: Fix interval output values
powerpc/eeh: Null check uses of eeh_pe_bus_get
tunnels: Remove encapsulation offloads on decap.
tunnels: Don't apply GRO to multiple layers of encapsulation.
ipip: Properly mark ipip GRO packets as encapsulated.
posix_acl: Clear SGID bit when setting file permissions
brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap()
mm/hugetlb: fix memory offline with hugepage size > memory block size
drm/i915: Unalias obj->phys_handle and obj->userptr
drm/i915: Account for TSEG size when determining 865G stolen base
Revert "drm/i915: Check live status before reading edid"
drm/i915/gen9: fix the WaWmMemoryReadLatency implementation
xenbus: don't look up transaction IDs for ordinary writes
drm/vmwgfx: Limit the user-space command buffer size
drm/radeon: change vblank_time's calculation method to reduce computational error.
drm/radeon/si/dpm: fix phase shedding setup
drm/radeon: narrow asic_init for virtualization
drm/amdgpu: change vblank_time's calculation method to reduce computational error.
drm/amdgpu/dce11: add missing drm_mode_config_cleanup call
drm/amdgpu/dce11: disable hpd on local panels
drm/amdgpu/dce8: disable hpd on local panels
drm/amdgpu/dce10: disable hpd on local panels
drm/amdgpu: fix IB alignment for UVD
drm/prime: Pass the right module owner through to dma_buf_export()
Linux 4.4.28
target: Don't override EXTENDED_COPY xcopy_pt_cmd SCSI status code
target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE
target: Re-add missing SCF_ACK_KREF assignment in v4.1.y
ubifs: Fix xattr_names length in exit paths
jbd2: fix incorrect unlock on j_list_lock
ext4: do not advertise encryption support when disabled
mmc: rtsx_usb_sdmmc: Handle runtime PM while changing the led
mmc: rtsx_usb_sdmmc: Avoid keeping the device runtime resumed when unused
mmc: core: Annotate cmd_hdr as __le32
powerpc/mm: Prevent unlikely crash in copro_calculate_slb()
ceph: fix error handling in ceph_read_iter
arm64: kernel: Init MDCR_EL2 even in the absence of a PMU
arm64: percpu: rewrite ll/sc loops in assembly
memstick: rtsx_usb_ms: Manage runtime PM when accessing the device
memstick: rtsx_usb_ms: Runtime resume the device when polling for cards
isofs: Do not return EACCES for unknown filesystems
irqchip/gic-v3-its: Fix entry size mask for GITS_BASER
s390/mm: fix gmap tlb flush issues
Using BUG_ON() as an assert() is _never_ acceptable
mm: filemap: fix mapping->nrpages double accounting in fuse
mm: workingset: fix crash in shadow node shrinker caused by replace_page_cache_page()
acpi, nfit: check for the correct event code in notifications
net/mlx4_core: Allow resetting VF admin mac to zero
bnx2x: Prevent false warning for lack of FC NPIV
PKCS#7: Don't require SpcSpOpusInfo in Authenticode pkcs7 signatures
hpsa: correct skipping masked peripherals
sd: Fix rw_max for devices that report an optimal xfer size
irqchip/gicv3: Handle loop timeout proper
kvm: x86: memset whole irq_eoi
x86/e820: Don't merge consecutive E820_PRAM ranges
blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
Fix regression which breaks DFS mounting
Cleanup missing frees on some ioctls
Do not send SMB3 SET_INFO request if nothing is changing
SMB3: GUIDs should be constructed as random but valid uuids
Set previous session id correctly on SMB3 reconnect
Display number of credits available
Clarify locking of cifs file and tcon structures and make more granular
fs/cifs: keep guid when assigning fid to fileinfo
cifs: Limit the overall credit acquired
fs/super.c: fix race between freeze_super() and thaw_super()
arc: don't leak bits of kernel stack into coredump
lightnvm: ensure that nvm_dev_ops can be used without CONFIG_NVM
ipc/sem.c: fix complex_count vs. simple op race
mm: filemap: don't plant shadow entries without radix tree node
metag: Only define atomic_dec_if_positive conditionally
scsi: Fix use-after-free
NFSv4.2: Fix a reference leak in nfs42_proc_layoutstats_generic
NFSv4: Open state recovery must account for file permission changes
NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid
NFSv4: Don't report revoked delegations as valid in nfs_have_delegation()
sunrpc: fix write space race causing stalls
Input: elantech - add Fujitsu Lifebook E556 to force crc_enabled
Input: elantech - force needed quirks on Fujitsu H760
Input: i8042 - skip selftest on ASUS laptops
lib: add "on"/"off" support to kstrtobool
lib: update single-char callers of strtobool()
lib: move strtobool() to kstrtobool()
MIPS: ptrace: Fix regs_return_value for kernel context
MIPS: Fix -mabi=64 build of vdso.lds
ALSA: hda - Fix a failure of micmute led when having multi adcs
cx231xx: fix GPIOs for Pixelview SBTVD hybrid
cx231xx: don't return error on success
mb86a20s: fix demod settings
mb86a20s: fix the locking logic
ovl: copy_up_xattr(): use strnlen
ovl: Fix info leak in ovl_lookup_temp()
fbdev/efifb: Fix 16 color palette entry calculation
scsi: zfcp: spin_lock_irqsave() is not nestable
zfcp: trace full payload of all SAN records (req,resp,iels)
zfcp: fix payload trace length for SAN request&response
zfcp: fix D_ID field with actual value on tracing SAN responses
zfcp: restore tracing of handle for port and LUN with HBA records
zfcp: trace on request for open and close of WKA port
zfcp: restore: Dont use 0 to indicate invalid LUN in rec trace
zfcp: retain trace level for SCSI and HBA FSF response records
zfcp: close window with unblocked rport during rport gone
zfcp: fix ELS/GS request&response length for hardware data router
zfcp: fix fc_host port_type with NPIV
ubi: Deal with interrupted erasures in WL
powerpc/pseries: Fix stack corruption in htpe code
powerpc/64: Fix incorrect return value from __copy_tofrom_user
powerpc/powernv: Use CPU-endian PEST in pnv_pci_dump_p7ioc_diag_data()
powerpc/powernv: Use CPU-endian hub diag-data type in pnv_eeh_get_and_dump_hub_diag()
powerpc/powernv: Pass CPU-endian PE number to opal_pci_eeh_freeze_clear()
powerpc/vdso64: Use double word compare on pointers
dm crypt: fix crash on exit
dm mpath: check if path's request_queue is dying in activate_path()
dm: return correct error code in dm_resume()'s retry loop
dm: mark request_queue dead before destroying the DM device
perf intel-pt: Fix MTC timestamp calculation for large MTC periods
perf intel-pt: Fix estimated timestamps for cycle-accurate mode
perf intel-pt: Fix snapshot overlap detection decoder errors
pstore/ram: Use memcpy_fromio() to save old buffer
pstore/ram: Use memcpy_toio instead of memcpy
pstore/core: drop cmpxchg based updates
pstore/ramoops: fixup driver removal
parisc: Increase initial kernel mapping size
parisc: Fix kernel memory layout regarding position of __gp
parisc: Increase KERNEL_INITIAL_SIZE for 32-bit SMP kernels
cpufreq: intel_pstate: Fix unsafe HWP MSR access
platform: don't return 0 from platform_get_irq[_byname]() on error
PCI: Mark Atheros AR9580 to avoid bus reset
mmc: sdhci: cast unsigned int to unsigned long long to avoid unexpeted error
mmc: block: don't use CMD23 with very old MMC cards
rtlwifi: Fix missing country code for Great Britain
PM / devfreq: event: remove duplicate devfreq_event_get_drvdata()
clk: imx6: initialize GPU clocks
regulator: tps65910: Work around silicon erratum SWCZ010
mei: me: add kaby point device ids
gpio: mpc8xxx: Correct irq handler function
cgroup: Change from CAP_SYS_NICE to CAP_SYS_RESOURCE for cgroup migration permissions
UPSTREAM: cpu/hotplug: Handle unbalanced hotplug enable/disable
UPSTREAM: arm64: kaslr: fix breakage with CONFIG_MODVERSIONS=y
UPSTREAM: arm64: kaslr: keep modules close to the kernel when DYNAMIC_FTRACE=y
cgroup: Remove leftover instances of allow_attach
BACKPORT: lib: harden strncpy_from_user
CHROMIUM: cgroups: relax permissions on moving tasks between cgroups
CHROMIUM: remove Android's cgroup generic permissions checks
Linux 4.4.27
cfq: fix starvation of asynchronous writes
vfs: move permission checking into notify_change() for utimes(NULL)
dlm: free workqueues after the connections
crypto: vmx - Fix memory corruption caused by p8_ghash
crypto: ghash-generic - move common definitions to a new header file
ext4: release bh in make_indexed_dir
ext4: allow DAX writeback for hole punch
ext4: fix memory leak in ext4_insert_range()
ext4: reinforce check of i_dtime when clearing high fields of uid and gid
ext4: enforce online defrag restriction for encrypted files
scsi: ibmvfc: Fix I/O hang when port is not mapped
scsi: arcmsr: Simplify user_len checking
scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer()
async_pq_val: fix DMA memory leak
reiserfs: switch to generic_{get,set,remove}xattr()
reiserfs: Unlock superblock before calling reiserfs_quota_on_mount()
ASoC: Intel: Atom: add a missing star in a memcpy call
brcmfmac: fix memory leak in brcmf_fill_bss_param
i40e: avoid NULL pointer dereference and recursive errors on early PCI error
fuse: fix killing s[ug]id in setattr
fuse: invalidate dir dentry after chmod
fuse: listxattr: verify xattr list
drivers: base: dma-mapping: page align the size when unmap_kernel_range
btrfs: assign error values to the correct bio structs
serial: 8250_dw: Check the data->pclk when get apb_pclk
arm64: Use PoU cache instr for I/D coherency
arm64: mm: add code to safely replace TTBR1_EL1
arm64: mm: place __cpu_setup in .text
arm64: add function to install the idmap
arm64: unmap idmap earlier
arm64: unify idmap removal
arm64: mm: place empty_zero_page in bss
arm64: head.S: use memset to clear BSS
arm64: mm: specialise pagetable allocators
arm64: mm: remove pointless PAGE_MASKing
asm-generic: Fix local variable shadow in __set_fixmap_offset
arm64: mm: fold alternatives into .init
ARM: 8511/1: ARM64: kernel: PSCI: move PSCI idle management code to drivers/firmware
ARM: 8481/2: drivers: psci: replace psci firmware calls
ARM: 8480/2: arm64: add implementation for arm-smccc
ARM: 8479/2: add implementation for arm-smccc
ARM: 8478/2: arm/arm64: add arm-smccc
ARM: 8510/1: rework ARM_CPU_SUSPEND dependencies
ARM: 8458/1: bL_switcher: add GIC dependency
Linux 4.4.26
mm: remove gup_flags FOLL_WRITE games from __get_user_pages()
x86/build: Build compressed x86 kernels as PIE
arm64: Remove stack duplicating code from jprobes
arm64: kprobes: Add KASAN instrumentation around stack accesses
arm64: kprobes: Cleanup jprobe_return
arm64: kprobes: Fix overflow when saving stack
arm64: kprobes: WARN if attempting to step with PSTATE.D=1
kprobes: Add arm64 case in kprobe example module
arm64: Add kernel return probes support (kretprobes)
arm64: Add trampoline code for kretprobes
arm64: kprobes instruction simulation support
arm64: Treat all entry code as non-kprobe-able
arm64: Blacklist non-kprobe-able symbol
arm64: Kprobes with single stepping support
arm64: add conditional instruction simulation support
arm64: Add more test functions to insn.c
arm64: Add HAVE_REGS_AND_STACK_ACCESS_API feature
Linux 4.4.25
tpm_crb: fix crb_req_canceled behavior
tpm: fix a race condition in tpm2_unseal_trusted()
ima: use file_dentry()
ARM: cpuidle: Fix error return code
ARM: dts: MSM8064 remove flags from SPMI/MPP IRQs
ARM: dts: mvebu: armada-390: add missing compatibility string and bracket
x86/dumpstack: Fix x86_32 kernel_stack_pointer() previous stack access
x86/irq: Prevent force migration of irqs which are not in the vector domain
x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation
KVM: PPC: BookE: Fix a sanity check
KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register
mfd: wm8350-i2c: Make sure the i2c regmap functions are compiled
mfd: 88pm80x: Double shifting bug in suspend/resume
mfd: atmel-hlcdc: Do not sleep in atomic context
mfd: rtsx_usb: Avoid setting ucr->current_sg.status
ALSA: usb-line6: use the same declaration as definition in header for MIDI manufacturer ID
ALSA: usb-audio: Extend DragonFly dB scale quirk to cover other variants
ALSA: ali5451: Fix out-of-bound position reporting
timekeeping: Fix __ktime_get_fast_ns() regression
time: Add cycles to nanoseconds translation
mm: Fix build for hardened usercopy
ANDROID: binder: Clear binder and cookie when setting handle in flat binder struct
ANDROID: binder: Add strong ref checks
UPSTREAM: staging/android/ion : fix a race condition in the ion driver
ANDROID: android-base: CONFIG_HARDENED_USERCOPY=y
UPSTREAM: fs/proc/kcore.c: Add bounce buffer for ktext data
UPSTREAM: fs/proc/kcore.c: Make bounce buffer global for read
BACKPORT: arm64: Correctly bounds check virt_addr_valid
Fix a build breakage in IO latency hist code.
UPSTREAM: efi: include asm/early_ioremap.h not asm/efi.h to get early_memremap
UPSTREAM: ia64: split off early_ioremap() declarations into asm/early_ioremap.h
FROMLIST: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN
FROMLIST: arm64: xen: Enable user access before a privcmd hvc call
FROMLIST: arm64: Handle faults caused by inadvertent user access with PAN enabled
FROMLIST: arm64: Disable TTBR0_EL1 during normal kernel execution
FROMLIST: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1
FROMLIST: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro
FROMLIST: arm64: Factor out PAN enabling/disabling into separate uaccess_* macros
UPSTREAM: arm64: Handle el1 synchronous instruction aborts cleanly
UPSTREAM: arm64: include alternative handling in dcache_by_line_op
UPSTREAM: arm64: fix "dc cvau" cache operation on errata-affected core
UPSTREAM: Revert "arm64: alternatives: add enable parameter to conditional asm macros"
UPSTREAM: arm64: Add new asm macro copy_page
UPSTREAM: arm64: kill ESR_LNX_EXEC
UPSTREAM: arm64: add macro to extract ESR_ELx.EC
UPSTREAM: arm64: mm: mark fault_info table const
UPSTREAM: arm64: fix dump_instr when PAN and UAO are in use
BACKPORT: arm64: Fold proc-macros.S into assembler.h
UPSTREAM: arm64: choose memstart_addr based on minimum sparsemem section alignment
UPSTREAM: arm64/mm: ensure memstart_addr remains sufficiently aligned
UPSTREAM: arm64/kernel: fix incorrect EL0 check in inv_entry macro
UPSTREAM: arm64: Add macros to read/write system registers
UPSTREAM: arm64/efi: refactor EFI init and runtime code for reuse by 32-bit ARM
UPSTREAM: arm64/efi: split off EFI init and runtime code for reuse by 32-bit ARM
UPSTREAM: arm64/efi: mark UEFI reserved regions as MEMBLOCK_NOMAP
BACKPORT: arm64: only consider memblocks with NOMAP cleared for linear mapping
UPSTREAM: mm/memblock: add MEMBLOCK_NOMAP attribute to memblock memory table
ANDROID: dm: android-verity: Remove fec_header location constraint
BACKPORT: audit: consistently record PIDs with task_tgid_nr()
android-base.cfg: Enable kernel ASLR
UPSTREAM: vmlinux.lds.h: allow arch specific handling of ro_after_init data section
UPSTREAM: arm64: spinlock: fix spin_unlock_wait for LSE atomics
UPSTREAM: arm64: avoid TLB conflict with CONFIG_RANDOMIZE_BASE
UPSTREAM: arm64: Only select ARM64_MODULE_PLTS if MODULES=y
sched: Add Kconfig option DEFAULT_USE_ENERGY_AWARE to set ENERGY_AWARE feature flag
sched/fair: remove printk while schedule is in progress
ANDROID: fs: FS tracepoints to track IO.
sched/walt: Drop arch-specific timer access
ANDROID: fiq_debugger: Pass task parameter to unwind_frame()
eas/sched/fair: Fixing comments in find_best_target.
input: keyreset: switch to orderly_reboot
UPSTREAM: tun: fix transmit timestamp support
UPSTREAM: arch/arm/include/asm/pgtable-3level.h: add pmd_mkclean for THP
net: inet: diag: expose the socket mark to privileged processes.
net: diag: make udp_diag_destroy work for mapped addresses.
net: diag: support SOCK_DESTROY for UDP sockets
net: diag: allow socket bytecode filters to match socket marks
net: diag: slightly refactor the inet_diag_bc_audit error checks.
net: diag: Add support to filter on device index
UPSTREAM: brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap()
Linux 4.4.24
ALSA: hda - Add the top speaker pin config for HP Spectre x360
ALSA: hda - Fix headset mic detection problem for several Dell laptops
ACPICA: acpi_get_sleep_type_data: Reduce warnings
ALSA: hda - Adding one more ALC255 pin definition for headset problem
Revert "usbtmc: convert to devm_kzalloc"
USB: serial: cp210x: Add ID for a Juniper console
Staging: fbtft: Fix bug in fbtft-core
usb: misc: legousbtower: Fix NULL pointer deference
USB: serial: cp210x: fix hardware flow-control disable
dm log writes: fix bug with too large bios
clk: xgene: Add missing parenthesis when clearing divider value
aio: mark AIO pseudo-fs noexec
batman-adv: remove unused callback from batadv_algo_ops struct
IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV
IB/mlx4: Fix code indentation in QP1 MAD flow
IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV
IB/ipoib: Don't allow MC joins during light MC flush
IB/core: Fix use after free in send_leave function
IB/ipoib: Fix memory corruption in ipoib cm mode connect flow
KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write
dmaengine: at_xdmac: fix to pass correct device identity to free_irq()
kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
ASoC: omap-mcpdm: Fix irq resource handling
sysctl: handle error writing UINT_MAX to u32 fields
powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support
brcmsmac: Initialize power in brcms_c_stf_ss_algo_channel_get()
brcmsmac: Free packet if dma_mapping_error() fails in dma_rxfill
brcmfmac: Fix glob_skb leak in brcmf_sdiod_recv_chain
ASoC: Intel: Skylake: Fix error return code in skl_probe()
pNFS/flexfiles: Fix layoutcommit after a commit to DS
pNFS/files: Fix layoutcommit after a commit to DS
NFS: Don't drop CB requests with invalid principals
svc: Avoid garbage replies when pc_func() returns rpc_drop_reply
dmaengine: at_xdmac: fix debug string
fnic: pci_dma_mapping_error() doesn't return an error code
avr32: off by one in at32_init_pio()
ath9k: Fix programming of minCCA power threshold
gspca: avoid unused variable warnings
em28xx-i2c: rt_mutex_trylock() returns zero on failure
NFC: fdp: Detect errors from fdp_nci_create_conn()
iwlmvm: mvm: set correct state in smart-fifo configuration
tile: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO
pstore: drop file opened reference count
blk-mq: actually hook up defer list when running requests
hwrng: omap - Fix assumption that runtime_get_sync will always succeed
ARM: sa1111: fix pcmcia suspend/resume
ARM: shmobile: fix regulator quirk for Gen2
ARM: sa1100: clear reset status prior to reboot
ARM: sa1100: fix 3.6864MHz clock
ARM: sa1100: register clocks early
ARM: sun5i: Fix typo in trip point temperature
regulator: qcom_smd: Fix voltage ranges for pm8x41
regulator: qcom_spmi: Update mvs1/mvs2 switches on pm8941
regulator: qcom_spmi: Add support for get_mode/set_mode on switches
regulator: qcom_spmi: Add support for S4 supply on pm8941
tpm: fix byte-order for the value read by tpm2_get_tpm_pt
printk: fix parsing of "brl=" option
MIPS: uprobes: fix use of uninitialised variable
MIPS: Malta: Fix IOCU disable switch read for MIPS64
MIPS: fix uretprobe implementation
MIPS: uprobes: remove incorrect set_orig_insn
arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP
ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7
irqchip/gicv3: Silence noisy DEBUG_PER_CPU_MAPS warning
gpio: sa1100: fix irq probing for ucb1x00
usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame()
ceph: fix race during filling readdir cache
iwlwifi: mvm: don't use ret when not initialised
iwlwifi: pcie: fix access to scratch buffer
spi: sh-msiof: Avoid invalid clock generator parameters
hwmon: (adt7411) set bit 3 in CFG1 register
nvmem: Declare nvmem_cell_read() consistently
ipvs: fix bind to link-local mcast IPv6 address in backup
tools/vm/slabinfo: fix an unintentional printf
mmc: pxamci: fix potential oops
drivers/perf: arm_pmu: Fix leak in error path
pinctrl: Flag strict is a field in struct pinmux_ops
pinctrl: uniphier: fix .pin_dbg_show() callback
i40e: avoid null pointer dereference
perf/core: Fix pmu::filter_match for SW-led groups
iwlwifi: mvm: fix a few firmware capability checks
usb: musb: fix DMA for host mode
usb: musb: Fix DMA desired mode for Mentor DMA engine
ARM: 8617/1: dma: fix dma_max_pfn()
ARM: 8616/1: dt: Respect property size when parsing CPUs
drm/radeon/si/dpm: add workaround for for Jet parts
drm/nouveau/fifo/nv04: avoid ramht race against cookie insertion
x86/boot: Initialize FPU and X86_FEATURE_ALWAYS even if we don't have CPUID
x86/init: Fix cr4_init_shadow() on CR4-less machines
can: dev: fix deadlock reported after bus-off
mm,ksm: fix endless looping in allocating memory when ksm enable
mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl
cpuset: handle race between CPU hotplug and cpuset_hotplug_work
usercopy: fold builtin_const check into inline function
Linux 4.4.23
hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common()
qxl: check for kmap failures
power: supply: max17042_battery: fix model download bug.
power_supply: tps65217-charger: fix missing platform_set_drvdata()
PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
PM / hibernate: Restore processor state before using per-CPU variables
MIPS: paravirt: Fix undefined reference to smp_bootstrap
MIPS: Add a missing ".set pop" in an early commit
MIPS: Avoid a BUG warning during prctl(PR_SET_FP_MODE, ...)
MIPS: Remove compact branch policy Kconfig entries
MIPS: vDSO: Fix Malta EVA mapping to vDSO page structs
MIPS: SMP: Fix possibility of deadlock when bringing CPUs online
MIPS: Fix pre-r6 emulation FPU initialisation
i2c: qup: skip qup_i2c_suspend if the device is already runtime suspended
i2c-eg20t: fix race between i2c init and interrupt enable
btrfs: ensure that file descriptor used with subvol ioctls is a dir
nl80211: validate number of probe response CSA counters
can: flexcan: fix resume function
mm: delete unnecessary and unsafe init_tlb_ubc()
tracing: Move mutex to protect against resetting of seq data
fix memory leaks in tracing_buffers_splice_read()
power: reset: hisi-reboot: Unmap region obtained by of_iomap
mtd: pmcmsp-flash: Allocating too much in init_msp_flash()
mtd: maps: sa1100-flash: potential NULL dereference
fix fault_in_multipages_...() on architectures with no-op access_ok()
fanotify: fix list corruption in fanotify_get_response()
fsnotify: add a way to stop queueing events on group shutdown
xfs: prevent dropping ioend completions during buftarg wait
autofs: use dentry flags to block walks during expire
autofs races
pwm: Mark all devices as "might sleep"
bridge: re-introduce 'fix parsing of MLDv2 reports'
net: smc91x: fix SMC accesses
Revert "phy: IRQ cannot be shared"
net: dsa: bcm_sf2: Fix race condition while unmasking interrupts
net/mlx5: Added missing check of msg length in verifying its signature
tipc: fix NULL pointer dereference in shutdown()
net/irda: handle iriap_register_lsap() allocation failure
vti: flush x-netns xfrm cache when vti interface is removed
af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'
Revert "af_unix: Fix splice-bind deadlock"
bonding: Fix bonding crash
megaraid: fix null pointer check in megasas_detach_one().
nouveau: fix nv40_perfctr_next() cleanup regression
Staging: iio: adc: fix indent on break statement
iwlegacy: avoid warning about missing braces
ath9k: fix misleading indentation
am437x-vfpe: fix typo in vpfe_get_app_input_index
Add braces to avoid "ambiguous ‘else’" compiler warnings
net: caif: fix misleading indentation
Makefile: Mute warning for __builtin_return_address(>0) for tracing only
Disable "frame-address" warning
Disable "maybe-uninitialized" warning globally
gcov: disable -Wmaybe-uninitialized warning
Kbuild: disable 'maybe-uninitialized' warning for CONFIG_PROFILE_ALL_BRANCHES
kbuild: forbid kernel directory to contain spaces and colons
tools: Support relative directory path for 'O='
Makefile: revert "Makefile: Document ability to make file.lst and file.S" partially
kbuild: Do not run modules_install and install in paralel
ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
ocfs2/dlm: fix race between convert and migration
crypto: echainiv - Replace chaining with multiplication
crypto: skcipher - Fix blkcipher walk OOM crash
crypto: arm/aes-ctr - fix NULL dereference in tail processing
crypto: arm64/aes-ctr - fix NULL dereference in tail processing
tcp: properly scale window in tcp_v[46]_reqsk_send_ack()
tcp: fix use after free in tcp_xmit_retransmit_queue()
tcp: cwnd does not increase in TCP YeAH
ipv6: release dst in ping_v6_sendmsg
ipv4: panic in leaf_walk_rcu due to stale node pointer
reiserfs: fix "new_insert_key may be used uninitialized ..."
Fix build warning in kernel/cpuset.c
include/linux/kernel.h: change abs() macro so it uses consistent return type
Linux 4.4.22
openrisc: fix the fix of copy_from_user()
avr32: fix 'undefined reference to `___copy_from_user'
ia64: copy_from_user() should zero the destination on access_ok() failure
genirq/msi: Fix broken debug output
ppc32: fix copy_from_user()
sparc32: fix copy_from_user()
mn10300: copy_from_user() should zero on access_ok() failure...
nios2: copy_from_user() should zero the tail of destination
openrisc: fix copy_from_user()
parisc: fix copy_from_user()
metag: copy_from_user() should zero the destination on access_ok() failure
alpha: fix copy_from_user()
asm-generic: make copy_from_user() zero the destination properly
mips: copy_from_user() must zero the destination on access_ok() failure
hexagon: fix strncpy_from_user() error return
sh: fix copy_from_user()
score: fix copy_from_user() and friends
blackfin: fix copy_from_user()
cris: buggered copy_from_user/copy_to_user/clear_user
frv: fix clear_user()
asm-generic: make get_user() clear the destination on errors
ARC: uaccess: get_user to zero out dest in cause of fault
s390: get_user() should zero on failure
score: fix __get_user/get_user
nios2: fix __get_user()
sh64: failing __get_user() should zero
m32r: fix __get_user()
mn10300: failing __get_user() and get_user() should zero
fix minor infoleak in get_user_ex()
microblaze: fix copy_from_user()
avr32: fix copy_from_user()
microblaze: fix __get_user()
fix iov_iter_fault_in_readable()
irqchip/atmel-aic: Fix potential deadlock in ->xlate()
genirq: Provide irq_gc_{lock_irqsave,unlock_irqrestore}() helpers
drm: Only use compat ioctl for addfb2 on X86/IA64
drm: atmel-hlcdc: Fix vertical scaling
net: simplify napi_synchronize() to avoid warnings
kconfig: tinyconfig: provide whole choice blocks to avoid warnings
soc: qcom/spm: shut up uninitialized variable warning
pinctrl: at91-pio4: use %pr format string for resource
mmc: dw_mmc: use resource_size_t to store physical address
drm/i915: Avoid pointer arithmetic in calculating plane surface offset
mpssd: fix buffer overflow warning
gma500: remove annoying deprecation warning
ipv6: addrconf: fix dev refcont leak when DAD failed
sched/core: Fix a race between try_to_wake_up() and a woken up task
Revert "wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel"
ath9k: fix using sta->drv_priv before initializing it
md-cluster: make md-cluster also can work when compiled into kernel
xhci: fix null pointer dereference in stop command timeout function
fuse: direct-io: don't dirty ITER_BVEC pages
Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns
crypto: cryptd - initialize child shash_desc on import
arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb()
pinctrl: sunxi: fix uart1 CTS/RTS pins at PG on A23/A33
pinctrl: pistachio: fix mfio pll_lock pinmux
dm crypt: fix error with too large bios
dm log writes: move IO accounting earlier to fix error path
dm log writes: fix check of kthread_run() return value
bus: arm-ccn: Fix XP watchpoint settings bitmask
bus: arm-ccn: Do not attempt to configure XPs for cycle counter
bus: arm-ccn: Fix PMU handling of MN
ARM: dts: STiH407-family: Provide interconnect clock for consumption in ST SDHCI
ARM: dts: overo: fix gpmc nand on boards with ethernet
ARM: dts: overo: fix gpmc nand cs0 range
ARM: dts: imx6qdl: Fix SPDIF regression
ARM: OMAP3: hwmod data: Add sysc information for DSI
ARM: kirkwood: ib62x0: fix size of u-boot environment partition
ARM: imx6: add missing BM_CLPCR_BYPASS_PMIC_READY setting for imx6sx
ARM: imx6: add missing BM_CLPCR_BYP_MMDC_CH0_LPM_HS setting for imx6ul
ARM: AM43XX: hwmod: Fix RSTST register offset for pruss
cpuset: make sure new tasks conform to the current config of the cpuset
net: thunderx: Fix OOPs with ethtool --register-dump
USB: change bInterval default to 10 ms
ARM: dts: STiH410: Handle interconnect clock required by EHCI/OHCI (USB)
usb: chipidea: udc: fix NULL ptr dereference in isr_setup_status_phase
usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition
USB: serial: simple: add support for another Infineon flashloader
serial: 8250: added acces i/o products quad and octal serial cards
serial: 8250_mid: fix divide error bug if baud rate is 0
iio: ensure ret is initialized to zero before entering do loop
iio:core: fix IIO_VAL_FRACTIONAL sign handling
iio: accel: kxsd9: Fix scaling bug
iio: fix pressure data output unit in hid-sensor-attributes
iio: accel: bmc150: reset chip at init time
iio: adc: at91: unbreak channel adc channel 3
iio: ad799x: Fix buffered capture for ad7991/ad7995/ad7999
iio: adc: ti_am335x_adc: Increase timeout value waiting for ADC sample
iio: adc: ti_am335x_adc: Protect FIFO1 from concurrent access
iio: adc: rockchip_saradc: reset saradc controller before programming it
iio: proximity: as3935: set up buffer timestamps for non-zero values
iio: accel: kxsd9: Fix raw read return
kvm-arm: Unmap shadow pagetables properly
x86/AMD: Apply erratum 665 on machines without a BIOS fix
x86/paravirt: Do not trace _paravirt_ident_*() functions
ARC: mm: fix build breakage with STRICT_MM_TYPECHECKS
IB/uverbs: Fix race between uverbs_close and remove_one
dm flakey: fix reads to be issued if drop_writes configured
audit: fix exe_file access in audit_exe_compare
mm: introduce get_task_exe_file
kexec: fix double-free when failing to relocate the purgatory
NFSv4.1: Fix the CREATE_SESSION slot number accounting
pNFS: Ensure LAYOUTGET and LAYOUTRETURN are properly serialised
nfsd: Close race between nfsd4_release_lockowner and nfsd4_lock
NFSv4.x: Fix a refcount leak in nfs_callback_up_net
pNFS: The client must not do I/O to the DS if it's lease has expired
kernfs: don't depend on d_find_any_alias() when generating notifications
powerpc/mm: Don't alias user region to other regions below PAGE_OFFSET
powerpc/powernv : Drop reference added by kset_find_obj()
powerpc/tm: do not use r13 for tabort_syscall
tipc: move linearization of buffers to generic code
lightnvm: put bio before return
fscrypto: require write access to mount to set encryption policy
Revert "KVM: x86: fix missed hardware breakpoints"
MIPS: KVM: Check for pfn noslot case
clocksource/drivers/sun4i: Clear interrupts after stopping timer in probe function
fscrypto: add authorization check for setting encryption policy
ext4: use __GFP_NOFAIL in ext4_free_blocks()
Conflicts:
arch/arm/kernel/devtree.c
arch/arm64/Kconfig
arch/arm64/kernel/arm64ksyms.c
arch/arm64/kernel/psci.c
arch/arm64/mm/fault.c
drivers/android/binder.c
drivers/usb/host/xhci-hub.c
fs/ext4/readpage.c
include/linux/mmc/core.h
include/linux/mmzone.h
mm/memcontrol.c
net/core/filter.c
net/netlink/af_netlink.c
net/netlink/af_netlink.h
Change-Id: I99fe7a0914e83e284b11b33185b71448a8999d1f
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
We have allowed migration for only LRU pages until now and it was enough
to make high-order pages. But recently, embedded system(e.g., webOS,
android) uses lots of non-movable pages(e.g., zram, GPU memory) so we
have seen several reports about troubles of small high-order allocation.
For fixing the problem, there were several efforts (e,g,. enhance
compaction algorithm, SLUB fallback to 0-order page, reserved memory,
vmalloc and so on) but if there are lots of non-movable pages in system,
their solutions are void in the long run.
So, this patch is to support facility to change non-movable pages with
movable. For the feature, this patch introduces functions related to
migration to address_space_operations as well as some page flags.
If a driver want to make own pages movable, it should define three
functions which are function pointers of struct
address_space_operations.
1. bool (*isolate_page) (struct page *page, isolate_mode_t mode);
What VM expects on isolate_page function of driver is to return *true*
if driver isolates page successfully. On returing true, VM marks the
page as PG_isolated so concurrent isolation in several CPUs skip the
page for isolation. If a driver cannot isolate the page, it should
return *false*.
Once page is successfully isolated, VM uses page.lru fields so driver
shouldn't expect to preserve values in that fields.
2. int (*migratepage) (struct address_space *mapping,
struct page *newpage, struct page *oldpage, enum migrate_mode);
After isolation, VM calls migratepage of driver with isolated page. The
function of migratepage is to move content of the old page to new page
and set up fields of struct page newpage. Keep in mind that you should
indicate to the VM the oldpage is no longer movable via
__ClearPageMovable() under page_lock if you migrated the oldpage
successfully and returns 0. If driver cannot migrate the page at the
moment, driver can return -EAGAIN. On -EAGAIN, VM will retry page
migration in a short time because VM interprets -EAGAIN as "temporal
migration failure". On returning any error except -EAGAIN, VM will give
up the page migration without retrying in this time.
Driver shouldn't touch page.lru field VM using in the functions.
3. void (*putback_page)(struct page *);
If migration fails on isolated page, VM should return the isolated page
to the driver so VM calls driver's putback_page with migration failed
page. In this function, driver should put the isolated page back to the
own data structure.
4. non-lru movable page flags
There are two page flags for supporting non-lru movable page.
* PG_movable
Driver should use the below function to make page movable under
page_lock.
void __SetPageMovable(struct page *page, struct address_space *mapping)
It needs argument of address_space for registering migration family
functions which will be called by VM. Exactly speaking, PG_movable is
not a real flag of struct page. Rather than, VM reuses page->mapping's
lower bits to represent it.
#define PAGE_MAPPING_MOVABLE 0x2
page->mapping = page->mapping | PAGE_MAPPING_MOVABLE;
so driver shouldn't access page->mapping directly. Instead, driver
should use page_mapping which mask off the low two bits of page->mapping
so it can get right struct address_space.
For testing of non-lru movable page, VM supports __PageMovable function.
However, it doesn't guarantee to identify non-lru movable page because
page->mapping field is unified with other variables in struct page. As
well, if driver releases the page after isolation by VM, page->mapping
doesn't have stable value although it has PAGE_MAPPING_MOVABLE (Look at
__ClearPageMovable). But __PageMovable is cheap to catch whether page
is LRU or non-lru movable once the page has been isolated. Because LRU
pages never can have PAGE_MAPPING_MOVABLE in page->mapping. It is also
good for just peeking to test non-lru movable pages before more
expensive checking with lock_page in pfn scanning to select victim.
For guaranteeing non-lru movable page, VM provides PageMovable function.
Unlike __PageMovable, PageMovable functions validates page->mapping and
mapping->a_ops->isolate_page under lock_page. The lock_page prevents
sudden destroying of page->mapping.
Driver using __SetPageMovable should clear the flag via
__ClearMovablePage under page_lock before the releasing the page.
* PG_isolated
To prevent concurrent isolation among several CPUs, VM marks isolated
page as PG_isolated under lock_page. So if a CPU encounters PG_isolated
non-lru movable page, it can skip it. Driver doesn't need to manipulate
the flag because VM will set/clear it automatically. Keep in mind that
if driver sees PG_isolated page, it means the page have been isolated by
VM so it shouldn't touch page.lru field. PG_isolated is alias with
PG_reclaim flag so driver shouldn't use the flag for own purpose.
[opensource.ganesh@gmail.com: mm/compaction: remove local variable is_lru]
Link: http://lkml.kernel.org/r/20160618014841.GA7422@leo-test
Link: http://lkml.kernel.org/r/1464736881-24886-3-git-send-email-minchan@kernel.org
Signed-off-by: Gioh Kim <gi-oh.kim@profitbricks.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Ganesh Mahendran <opensource.ganesh@gmail.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: John Einar Reitan <john.reitan@foss.arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: bda807d4445414e8e77da704f116bb0880fe0c76
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Change-Id: I03380d927fed84c7464bd5f7c4405bef6b265b69
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
commit 5b398e416e880159fe55eefd93c6588fa072cd66 upstream.
I hit the following hung task when runing a OOM LTP test case with 4.1
kernel.
Call trace:
[<ffffffc000086a88>] __switch_to+0x74/0x8c
[<ffffffc000a1bae0>] __schedule+0x23c/0x7bc
[<ffffffc000a1c09c>] schedule+0x3c/0x94
[<ffffffc000a1eb84>] rwsem_down_write_failed+0x214/0x350
[<ffffffc000a1e32c>] down_write+0x64/0x80
[<ffffffc00021f794>] __ksm_exit+0x90/0x19c
[<ffffffc0000be650>] mmput+0x118/0x11c
[<ffffffc0000c3ec4>] do_exit+0x2dc/0xa74
[<ffffffc0000c46f8>] do_group_exit+0x4c/0xe4
[<ffffffc0000d0f34>] get_signal+0x444/0x5e0
[<ffffffc000089fcc>] do_signal+0x1d8/0x450
[<ffffffc00008a35c>] do_notify_resume+0x70/0x78
The oom victim cannot terminate because it needs to take mmap_sem for
write while the lock is held by ksmd for read which loops in the page
allocator
ksm_do_scan
scan_get_next_rmap_item
down_read
get_next_rmap_item
alloc_rmap_item #ksmd will loop permanently.
There is no way forward because the oom victim cannot release any memory
in 4.1 based kernel. Since 4.6 we have the oom reaper which would solve
this problem because it would release the memory asynchronously.
Nevertheless we can relax alloc_rmap_item requirements and use
__GFP_NORETRY because the allocation failure is acceptable as ksm_do_scan
would just retry later after the lock got dropped.
Such a patch would be also easy to backport to older stable kernels which
do not have oom_reaper.
While we are at it add GFP_NOWARN so the admin doesn't have to be alarmed
by the allocation failure.
Link: http://lkml.kernel.org/r/1474165570-44398-1-git-send-email-zhongjiang@huawei.com
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Suggested-by: Hugh Dickins <hughd@google.com>
Suggested-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Per Process Reclaim tries to avoid reclaiming of
shared pages by passing the target VMA. For anon and
file pages this works, but not for KSM. This is because
vma_address(page) will not return the intended value
resulting in crashes like this
[From 3.10 kernel]
kernel BUG at kernel/mm/rmap.c:534!
(vma_address+0x28/0x2c) from [<c01eff94>] (try_to_unmap_ksm+0x4c/0x170)
(try_to_unmap_ksm+0x4c/0x170) from [<c01e4834>] (try_to_unmap+0x34/0xa4)
(try_to_unmap+0x34/0xa4) from [<c01c8e64>] (shrink_page_list+0x3f0/0xa34)
(shrink_page_list+0x3f0/0xa34) from [<c01c9688>] (reclaim_pages_from_list)
(reclaim_pages_from_list+0xb0/0x100) from [<c0240ed0>] (reclaim_pte_range)
(reclaim_pte_range+0xf0/0x164) from [<c01e79a4>] (walk_page_range+0x1d0)
(walk_page_range+0x1d0/0x260) from [<c0241ccc>] (reclaim_task_anon+0xb0)
(reclaim_task_anon+0xb0/0x114) from [<c01f8d98>] (swap_fn+0x220/0x460)
(swap_fn+0x220/0x460) from [<c0133a38>] (process_one_work+0x294/0x430)
(process_one_work+0x294/0x430) from [<c0134720>] (worker_thread)
(worker_thread+0x224/0x358) from [<c01390d4>] (kthread+0xa0/0xac)
(kthread+0xa0/0xac) from [<c0105f38>] (ret_from_fork+0x14/0x3c)
CRs-Fixed: 984947
Change-Id: I5208fb68372f7af72868e39399bf545fb7b774f3
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
Some pages could be shared by several processes. (ex, libc)
In case of that, it's too bad to reclaim them from the beginnig.
This patch causes VM to keep them on memory until last task
try to reclaim them so shared pages will be reclaimed only if
all of task has gone swapping out.
This feature doesn't handle non-linear mapping on ramfs because
it's very time-consuming and doesn't make sure of reclaiming and
not common.
Change-Id: I7e5f34f2e947f5db6d405867fe2ad34863ca40f7
Signed-off-by: Sangseok Lee <sangseok.lee@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Patch-mainline: linux-mm @ 9 May 2013 16:21:27
[vinmenon@codeaurora.org: trivial merge conflict fixes + changes
to make the patch work with 3.18 kernel]
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
KSM is yet another framework which may obfuscate some memory
problems. Use the showmem notifier to show how KSM is being
used to give some insight into potential issues or non-issues.
Change-Id: If82405dc33f212d085e6847f7c511fd4d0a32a10
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
KSM thread to scan pages is getting schedule on definite timeout.
That wakes up CPU from idle state and hence may affect the power
consumption. Provide an optional support to use deferred timer
which suites low-power use-cases.
To enable deferred timers,
$ echo 1 > /sys/kernel/mm/ksm/deferred_timer
Change-Id: I07fe199f97fe1f72f9a9e1b0b757a3ac533719e8
Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
get_mergeable_page() can only return NULL (also in case of errors) or the
pinned mergeable page. It can't return an error different than NULL.
This optimizes away the unnecessary error check.
Add a return after the "out:" label in the callee to make it more
readable.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doing the VM_MERGEABLE check after the page == kpage check won't provide
any meaningful benefit. The !vma->anon_vma check of find_mergeable_vma is
the only superfluous bit in using find_mergeable_vma because the !PageAnon
check of try_to_merge_one_page() implicitly checks for that, but it still
looks cleaner to share the same find_mergeable_vma().
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This just uses the helper function to cleanup the assumption on the
hlist_node internals.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The stable_nodes can become stale at any time if the underlying pages gets
freed. The stable_node gets collected and removed from the stable rbtree
if that is detected during the rbtree lookups.
Don't fail the lookup if running into stale stable_nodes, just restart the
lookup after collecting the stale stable_nodes. Otherwise the CPU spent
in the preparation stage is wasted and the lookup must be repeated at the
next loop potentially failing a second time in a second stale stable_node.
If we don't prune aggressively we delay the merging of the unstable node
candidates and at the same time we delay the freeing of the stale
stable_nodes. Keeping stale stable_nodes around wastes memory and it
can't provide any benefit.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While at it add it to the file and anon walks too.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Petr Holasek <pholasek@redhat.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We converted some of the usages of ACCESS_ONCE to READ_ONCE in the mm/
tree since it doesn't work reliably on non-scalar types.
This patch removes the rest of the usages of ACCESS_ONCE, and use the new
READ_ONCE API for the read accesses. This makes things cleaner, instead
of using separate/multiple sets of APIs.
Signed-off-by: Jason Low <jason.low2@hp.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
One bit in ->vm_flags is unused now!
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The core VM already knows about VM_FAULT_SIGBUS, but cannot return a
"you should SIGSEGV" error, because the SIGSEGV case was generally
handled by the caller - usually the architecture fault handler.
That results in lots of duplication - all the architecture fault
handlers end up doing very similar "look up vma, check permissions, do
retries etc" - but it generally works. However, there are cases where
the VM actually wants to SIGSEGV, and applications _expect_ SIGSEGV.
In particular, when accessing the stack guard page, libsigsegv expects a
SIGSEGV. And it usually got one, because the stack growth is handled by
that duplicated architecture fault handler.
However, when the generic VM layer started propagating the error return
from the stack expansion in commit fee7e49d45 ("mm: propagate error
from stack expansion even for guard page"), that now exposed the
existing VM_FAULT_SIGBUS result to user space. And user space really
expected SIGSEGV, not SIGBUS.
To fix that case, we need to add a VM_FAULT_SIGSEGV, and teach all those
duplicate architecture fault handlers about it. They all already have
the code to handle SIGSEGV, so it's about just tying that new return
value to the existing code, but it's all a bit annoying.
This is the mindless minimal patch to do this. A more extensive patch
would be to try to gather up the mostly shared fault handling logic into
one generic helper routine, and long-term we really should do that
cleanup.
Just from this patch, you can generally see that most architectures just
copied (directly or indirectly) the old x86 way of doing things, but in
the meantime that original x86 model has been improved to hold the VM
semaphore for shorter times etc and to handle VM_FAULT_RETRY and other
"newer" things, so it would be a good idea to bring all those
improvements to the generic case and teach other architectures about
them too.
Reported-and-tested-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # "s390 still compiles and boots"
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add calls to the new mmu_notifier_invalidate_range() function to all
places in the VMM that need it.
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Reviewed-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Rik van Riel <riel@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Johannes Weiner <jweiner@redhat.com>
Cc: Jay Cornwall <Jay.Cornwall@amd.com>
Cc: Oded Gabbay <Oded.Gabbay@amd.com>
Cc: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Oded Gabbay <oded.gabbay@amd.com>
The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().
So:
Rename wait_on_bit and wait_on_bit_lock to
wait_on_bit_action and wait_on_bit_lock_action
to make it explicit that they need an action function.
Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
which are *not* given an action function but implicitly use
a standard one.
The decision to error-out if a signal is pending is now made
based on the 'mode' argument rather than being encoded in the action
function.
All instances of the old wait_on_bit and wait_on_bit_lock which
can use the new version have been changed accordingly and their
action functions have been discarded.
wait_on_bit{_lock} does not return any specific error code in the
event of a signal so the caller must check for non-zero and
interpolate their own error code as appropriate.
The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"
The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.
A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack. So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).
Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS. CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Trinity has reported:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1))
CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G W
3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398
lock_acquire (arch/x86/include/asm/current.h:14
kernel/locking/lockdep.c:3602)
_raw_spin_lock (include/linux/spinlock_api_smp.h:143
kernel/locking/spinlock.c:151)
remove_migration_pte (mm/migrate.c:137)
rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699)
remove_migration_ptes (mm/migrate.c:224)
migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126)
migrate_misplaced_page (mm/migrate.c:1733)
__handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925)
handle_mm_fault (mm/memory.c:3948)
__get_user_pages (mm/memory.c:1851)
__mlock_vma_pages_range (mm/mlock.c:255)
__mm_populate (mm/mlock.c:711)
SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791)
I believe this comes about because, whereas collapsing and splitting THP
functions take anon_vma lock in write mode (which excludes concurrent
rmap walks), faulting THP functions (write protection and misplaced
NUMA) do not - and mostly they do not need to.
But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for
an instant (indeed, for a long instant, given the inter-CPU TLB flush in
there), leaves *pmd neither present not trans_huge.
Which can confuse a concurrent rmap walk, as when removing migration
ptes, seen in the dumped trace. Although that rmap walk has a 4k page
to insert, anon_vmas containing THPs are in no way segregated from
4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with
that instant when a trans_huge pmd is temporarily absent.
I don't think we need strengthen the locking at the THP end: it's easily
handled with an ACCESS_ONCE() before testing both conditions.
And since mm_find_pmd() had only one caller who wanted a THP rather than
a pmd, let's slightly repurpose it to fail when it hits a THP or
non-present pmd, and open code split_huge_page_address() again.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Bob Liu <bob.liu@oracle.com>
Cc: Christoph Lameter <cl@gentwo.org>
Cc: Dave Jones <davej@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit bf6bddf192 ("mm: introduce compaction and migration for
ballooned pages") introduces page_count(page) into memory compaction
which dereferences page->first_page if PageTail(page).
This results in a very rare NULL pointer dereference on the
aforementioned page_count(page). Indeed, anything that does
compound_head(), including page_count() is susceptible to racing with
prep_compound_page() and seeing a NULL or dangling page->first_page
pointer.
This patch uses Andrea's implementation of compound_trans_head() that
deals with such a race and makes it the default compound_head()
implementation. This includes a read memory barrier that ensures that
if PageTail(head) is true that we return a head page that is neither
NULL nor dangling. The patch then adds a store memory barrier to
prep_compound_page() to ensure page->first_page is set.
This is the safest way to ensure we see the head page that we are
expecting, PageTail(page) is already in the unlikely() path and the
memory barriers are unfortunately required.
Hugetlbfs is the exception, we don't enforce a store memory barrier
during init since no race is possible.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Code that is obj-y (always built-in) or dependent on a bool Kconfig
(built-in or absent) can never be modular. So using module_init as an
alias for __initcall can be somewhat misleading.
Fix these up now, so that we can relocate module_init from init.h into
module.h in the future. If we don't do this, we'd have to add module.h
to obviously non-modular code, and that would be a worse thing.
The audit targets the following module_init users for change:
mm/ksm.c bool KSM
mm/mmap.c bool MMU
mm/huge_memory.c bool TRANSPARENT_HUGEPAGE
mm/mmu_notifier.c bool MMU_NOTIFIER
Note that direct use of __initcall is discouraged, vs. one of the
priority categorized subgroups. As __initcall gets mapped onto
device_initcall, our use of subsys_initcall (which makes sense for these
files) will thus change this registration from level 6-device to level
4-subsys (i.e. slightly earlier).
However no observable impact of that difference has been observed during
testing.
One might think that core_initcall (l2) or postcore_initcall (l3) would
be more appropriate for anything in mm/ but if we look at some actual
init functions themselves, we see things like:
mm/huge_memory.c --> hugepage_init --> hugepage_init_sysfs
mm/mmap.c --> init_user_reserve --> sysctl_user_reserve_kbytes
mm/ksm.c --> ksm_init --> sysfs_create_group
and hence the choice of subsys_initcall (l4) seems reasonable, and at
the same time minimizes the risk of changing the priority too
drastically all at once. We can adjust further in the future.
Also, several instances of missing ";" at EOL are fixed.
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Most of the VM_BUG_ON assertions are performed on a page. Usually, when
one of these assertions fails we'll get a BUG_ON with a call stack and
the registers.
I've recently noticed based on the requests to add a small piece of code
that dumps the page to various VM_BUG_ON sites that the page dump is
quite useful to people debugging issues in mm.
This patch adds a VM_BUG_ON_PAGE(cond, page) which beyond doing what
VM_BUG_ON() does, also dumps the page before executing the actual
BUG_ON.
[akpm@linux-foundation.org: fix up includes]
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, we have an infrastructure in rmap_walk() to handle difference from
variants of rmap traversing functions.
So, just use it in page_referenced().
In this patch, I change following things.
1. remove some variants of rmap traversing functions.
cf> page_referenced_ksm, page_referenced_anon,
page_referenced_file
2. introduce new struct page_referenced_arg and pass it to
page_referenced_one(), main function of rmap_walk, in order to count
reference, to store vm_flags and to check finish condition.
3. mechanical change to use rmap_walk() in page_referenced().
[liwanp@linux.vnet.ibm.com: fix BUG at rmap_walk]
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, we have an infrastructure in rmap_walk() to handle difference from
variants of rmap traversing functions.
So, just use it in try_to_munlock().
In this patch, I change following things.
1. remove some variants of rmap traversing functions.
cf> try_to_unmap_ksm, try_to_unmap_anon, try_to_unmap_file
2. mechanical change to use rmap_walk() in try_to_munlock().
3. copy and paste comments.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now, we have an infrastructure in rmap_walk() to handle difference from
variants of rmap traversing functions.
So, just use it in try_to_unmap().
In this patch, I change following things.
1. enable rmap_walk() if !CONFIG_MIGRATION.
2. mechanical change to use rmap_walk() in try_to_unmap().
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There are a lot of common parts in traversing functions, but there are
also a little of uncommon parts in it. By assigning proper function
pointer on each rmap_walker_control, we can handle these difference
correctly.
Following are differences we should handle.
1. difference of lock function in anon mapping case
2. nonlinear handling in file mapping case
3. prechecked condition:
checking memcg in page_referenced(),
checking VM_SHARE in page_mkclean()
checking temporary vma in try_to_unmap()
4. exit condition:
checking page_mapped() in try_to_unmap()
So, in this patch, I introduce 4 function pointers to handle above
differences.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In each rmap traverse case, there is some difference so that we need
function pointers and arguments to them in order to handle these
For this purpose, struct rmap_walk_control is introduced in this patch,
and will be extended in following patch. Introducing and extending are
separate, because it clarify changes.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kcalloc returns zeroed memory. There's no need to use this flag.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The use of strict_strtoul() is not preferred, because strict_strtoul() is
obsolete. Thus, kstrtoul() should be used.
Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A CONFIG_DISCONTIGMEM=y m68k config gave
mm/ksm.c: In function `get_kpfn_nid':
mm/ksm.c:492: error: implicit declaration of function `pfn_to_nid'
linux/mmzone.h declares it for CONFIG_SPARSEMEM and CONFIG_FLATMEM, but
expects the arch's asm/mmzone.h to declare it for CONFIG_DISCONTIGMEM
(see arch/mips/include/asm/mmzone.h for example).
Or perhaps it is only expected when CONFIG_NUMA=y: too much of a maze,
and m68k got away without it so far, so fix the build in mm/ksm.c.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I'm not sure why, but the hlist for each entry iterators were conceived
list_for_each_entry(pos, head, member)
The hlist ones were greedy and wanted an extra parameter:
hlist_for_each_entry(tpos, pos, head, member)
Why did they need an extra pos parameter? I'm not quite sure. Not only
they don't really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
Besides the semantic patch, there was some manual work required:
- Fix up the actual hlist iterators in linux/list.h
- Fix up the declaration of other iterators based on the hlist ones.
- A very small amount of places were using the 'node' parameter, this
was modified to use 'obj->member' instead.
- Coccinelle didn't handle the hlist_for_each_entry_safe iterator
properly, so those had to be fixed up manually.
The semantic patch which is mostly the work of Peter Senna Tschudin is here:
@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
type T;
expression a,c,d,e;
identifier b;
statement S;
@@
-T b;
<+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
...+>
[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foudnation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It is a pity to have MAX_NUMNODES+MAX_NUMNODES tree roots statically
allocated, particularly when very few users will ever actually tune
merge_across_nodes 0 to use more than 1+1 of those trees. Not a big
deal (only 16kB wasted on each machine with CONFIG_MAXSMP), but a pity.
Start off with 1+1 statically allocated, then if merge_across_nodes is
ever tuned, allocate for nr_node_ids+nr_node_ids. Do not attempt to
free up the extra if it's tuned back, that would be a waste of effort.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In "ksm: remove old stable nodes more thoroughly" I said that I'd never
seen its WARN_ON_ONCE(page_mapped(page)). True at the time of writing,
but it soon appeared once I tried fuller tests on the whole series.
It turned out to be due to the KSM page migration itself: unmerge_and_
remove_all_rmap_items() failed to locate and replace all the KSM pages,
because of that hiatus in page migration when old pte has been replaced
by migration entry, but not yet by new pte. follow_page() finds no page
at that instant, but a KSM page reappears shortly after, without a
fault.
Add FOLL_MIGRATION flag, so follow_page() can do migration_entry_wait()
for KSM's break_cow(). I'd have preferred to avoid another flag, and do
it every time, in case someone else makes the same easy mistake; but did
not find another transgressor (the common get_user_pages() is of course
safe), and cannot be sure that every follow_page() caller is prepared to
sleep - ia64's xencomm_vtop()? Now, THP's wait_split_huge_page() can
already sleep there, since anon_vma locking was changed to mutex, but
maybe that's somehow excluded.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Think of struct rmap_item as an extension of struct page (restricted to
MADV_MERGEABLE areas): there may be a lot of them, we need to keep them
small, especially on 32-bit architectures of limited lowmem.
Siting "int nid" after "unsigned int checksum" works nicely on 64-bit,
making no change to its 64-byte struct rmap_item; but bloats the 32-bit
struct rmap_item from (nicely cache-aligned) 32 bytes to 36 bytes, which
rounds up to 40 bytes once allocated from slab. We'd better avoid that.
Hey, I only just remembered that the anon_vma pointer in struct
rmap_item has no purpose until the rmap_item is hung from a stable tree
node (which has its own nid field); and rmap_item's nid field no purpose
than to say which tree root to tell rb_erase() when unlinking from an
unstable tree.
Double them up in a union. There's just one place where we set anon_vma
early (when we already hold mmap_sem): now we must remove tree_rmap_item
from its unstable tree there, before overwriting nid. No need to
spatter BUG()s around: we'd be seeing oopses if this were wrong.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
An inconsistency emerged in reviewing the NUMA node changes to KSM: when
meeting a page from the wrong NUMA node in a stable tree, we say that
it's okay for comparisons, but not as a leaf for merging; whereas when
meeting a page from the wrong NUMA node in an unstable tree, we bail out
immediately.
Now, it might be that a wrong NUMA node in an unstable tree is more
likely to correlate with instablility (different content, with rbnode
now misplaced) than page migration; but even so, we are accustomed to
instablility in the unstable tree.
Without strong evidence for which strategy is generally better, I'd
rather be consistent with what's done in the stable tree: accept a page
from the wrong NUMA node for comparison, but not as a leaf for merging.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Added slightly more detail to the Documentation of merge_across_nodes, a
few comments in areas indicated by review, and renamed get_ksm_page()'s
argument from "locked" to "lock_it". No functional change.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Complaints are rare, but lockdep still does not understand the way
ksm_memory_callback(MEM_GOING_OFFLINE) takes ksm_thread_mutex, and holds
it until the ksm_memory_callback(MEM_OFFLINE): that appears to be a
problem because notifier callbacks are made under down_read of
blocking_notifier_head->rwsem (so first the mutex is taken while holding
the rwsem, then later the rwsem is taken while still holding the mutex);
but is not in fact a problem because mem_hotplug_mutex is held
throughout the dance.
There was an attempt to fix this with mutex_lock_nested(); but if that
happened to fool lockdep two years ago, apparently it does so no longer.
I had hoped to eradicate this issue in extending KSM page migration not
to need the ksm_thread_mutex. But then realized that although the page
migration itself is safe, we do still need to lock out ksmd and other
users of get_ksm_page() while offlining memory - at some point between
MEM_GOING_OFFLINE and MEM_OFFLINE, the struct pages themselves may
vanish, and get_ksm_page()'s accesses to them become a violation.
So, give up on holding ksm_thread_mutex itself from MEM_GOING_OFFLINE to
MEM_OFFLINE, and add a KSM_RUN_OFFLINE flag, and wait_while_offlining()
checks, to achieve the same lockout without being caught by lockdep.
This is less elegant for KSM, but it's more important to keep lockdep
useful to other users - and I apologize for how long it took to fix.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The new KSM NUMA merge_across_nodes knob introduces a problem, when it's
set to non-default 0: if a KSM page is migrated to a different NUMA node,
how do we migrate its stable node to the right tree? And what if that
collides with an existing stable node?
ksm_migrate_page() can do no more than it's already doing, updating
stable_node->kpfn: the stable tree itself cannot be manipulated without
holding ksm_thread_mutex. So accept that a stable tree may temporarily
indicate a page belonging to the wrong NUMA node, leave updating until the
next pass of ksmd, just be careful not to merge other pages on to a
misplaced page. Note nid of holding tree in stable_node, and recognize
that it will not always match nid of kpfn.
A misplaced KSM page is discovered, either when ksm_do_scan() next comes
around to one of its rmap_items (we now have to go to cmp_and_merge_page
even on pages in a stable tree), or when stable_tree_search() arrives at a
matching node for another page, and this node page is found misplaced.
In each case, move the misplaced stable_node to a list of migrate_nodes
(and use the address of migrate_nodes as magic by which to identify them):
we don't need them in a tree. If stable_tree_search() finds no match for
a page, but it's currently exiled to this list, then slot its stable_node
right there into the tree, bringing all of its mappings with it; otherwise
they get migrated one by one to the original page of the colliding node.
stable_tree_search() is now modelled more like stable_tree_insert(), in
order to handle these insertions of migrated nodes.
remove_node_from_stable_tree(), remove_all_stable_nodes() and
ksm_check_stable_tree() have to handle the migrate_nodes list as well as
the stable tree itself. Less obviously, we do need to prune the list of
stale entries from time to time (scan_get_next_rmap_item() does it once
each full scan): whereas stale nodes in the stable tree get naturally
pruned as searches try to brush past them, these migrate_nodes may get
forgotten and accumulate.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KSM page migration is already supported in the case of memory hotremove,
which takes the ksm_thread_mutex across all its migrations to keep life
simple.
But the new KSM NUMA merge_across_nodes knob introduces a problem, when
it's set to non-default 0: if a KSM page is migrated to a different NUMA
node, how do we migrate its stable node to the right tree? And what if
that collides with an existing stable node?
So far there's no provision for that, and this patch does not attempt to
deal with it either. But how will I test a solution, when I don't know
how to hotremove memory? The best answer is to enable KSM page migration
in all cases now, and test more common cases. With THP and compaction
added since KSM came in, page migration is now mainstream, and it's a
shame that a KSM page can frustrate freeing a page block.
Without worrying about merge_across_nodes 0 for now, this patch gets KSM
page migration working reliably for default merge_across_nodes 1 (but
leave the patch enabling it until near the end of the series).
It's much simpler than I'd originally imagined, and does not require an
additional tier of locking: page migration relies on the page lock, KSM
page reclaim relies on the page lock, the page lock is enough for KSM page
migration too.
Almost all the care has to be in get_ksm_page(): that's the function which
worries about when a stable node is stale and should be freed, now it also
has to worry about the KSM page being migrated.
The only new overhead is an additional put/get/lock/unlock_page when
stable_tree_search() arrives at a matching node: to make sure migration
respects the raised page count, and so does not migrate the page while
we're busy with it here. That's probably avoidable, either by changing
internal interfaces from using kpage to stable_node, or by moving the
ksm_migrate_page() callsite into a page_freeze_refs() section (even if not
swapcache); but this works well, I've no urge to pull it apart now.
(Descents of the stable tree may pass through nodes whose KSM pages are
under migration: being unlocked, the raised page count does not prevent
that, nor need it: it's safe to memcmp against either old or new page.)
You might worry about mremap, and whether page migration's rmap_walk to
remove migration entries will find all the KSM locations where it inserted
earlier: that should already be handled, by the satisfyingly heavy hammer
of move_vma()'s call to ksm_madvise(,,,MADV_UNMERGEABLE,).
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switching merge_across_nodes after running KSM is liable to oops on stale
nodes still left over from the previous stable tree. It's not something
that people will often want to do, but it would be lame to demand a reboot
when they're trying to determine which merge_across_nodes setting is best.
How can this happen? We only permit switching merge_across_nodes when
pages_shared is 0, and usually set run 2 to force that beforehand, which
ought to unmerge everything: yet oopses still occur when you then run 1.
Three causes:
1. The old stable tree (built according to the inverse
merge_across_nodes) has not been fully torn down. A stable node
lingers until get_ksm_page() notices that the page it references no
longer references it: but the page is not necessarily freed as soon as
expected, particularly when swapcache.
Fix this with a pass through the old stable tree, applying
get_ksm_page() to each of the remaining nodes (most found stale and
removed immediately), with forced removal of any left over. Unless the
page is still mapped: I've not seen that case, it shouldn't occur, but
better to WARN_ON_ONCE and EBUSY than BUG.
2. __ksm_enter() has a nice little optimization, to insert the new mm
just behind ksmd's cursor, so there's a full pass for it to stabilize
(or be removed) before ksmd addresses it. Nice when ksmd is running,
but not so nice when we're trying to unmerge all mms: we were missing
those mms forked and inserted behind the unmerge cursor. Easily fixed
by inserting at the end when KSM_RUN_UNMERGE.
3. It is possible for a KSM page to be faulted back from swapcache
into an mm, just after unmerge_and_remove_all_rmap_items() scanned past
it. Fix this by copying on fault when KSM_RUN_UNMERGE: but that is
private to ksm.c, so dissolve the distinction between
ksm_might_need_to_copy() and ksm_does_need_to_copy(), doing it all in
the one call into ksm.c.
A long outstanding, unrelated bugfix sneaks in with that third fix:
ksm_does_need_to_copy() would copy from a !PageUptodate page (implying I/O
error when read in from swap) to a page which it then marks Uptodate. Fix
this case by not copying, letting do_swap_page() discover the error.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In some places where get_ksm_page() is used, we need the page to be locked.
When KSM migration is fully enabled, we shall want that to make sure that
the page just acquired cannot be migrated beneath us (raised page count is
only effective when there is serialization to make sure migration
notices). Whereas when navigating through the stable tree, we certainly
do not want to lock each node (raised page count is enough to guarantee
the memcmps, even if page is migrated to another node).
Since we're about to add another use case, add the locked argument to
get_ksm_page() now.
Hmm, what's that rcu_read_lock() about? Complete misunderstanding, I
really got the wrong end of the stick on that! There's a configuration in
which page_cache_get_speculative() can do something cheaper than
get_page_unless_zero(), relying on its caller's rcu_read_lock() to have
disabled preemption for it. There's no need for rcu_read_lock() around
get_page_unless_zero() (and mapping checks) here. Cut out that silliness
before making this any harder to understand.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Memory hotremove's ksm_check_stable_tree() is pitifully inefficient
(restarting whenever it finds a stale node to remove), but rearrange so
that at least it does not needlessly restart from nid 0 each time. And
add a couple of comments: here is why we keep pfn instead of page.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add NUMA() and DO_NUMA() macros to minimize blight of #ifdef
CONFIG_NUMAs (but indeed we don't want to expand struct rmap_item by nid
when not NUMA). Add comment, remove "unsigned" from rmap_item->nid, as
"int nid" elsewhere. Define ksm_merge_across_nodes 1U when #ifndef NUMA
to help optimizing out. Use ?: in get_kpfn_nid(). Adjust a few
comments noticed in ongoing work.
Leave stable_tree_insert()'s rb_linkage until after the node has been
set up, as unstable_tree_search_insert() does: ksm_thread_mutex and page
lock make either way safe, but we're going to copy and I prefer this
precedent.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Petr Holasek <pholasek@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Here's a KSM series, based on mmotm 2013-01-23-17-04: starting with
Petr's v7 "KSM: numa awareness sysfs knob"; then fixing the two issues
we had with that, fully enabling KSM page migration on the way.
(A different kind of KSM/NUMA issue which I've certainly not begun to
address here: when KSM pages are unmerged, there's usually no sense in
preferring to allocate the new pages local to the caller's node.)
This patch:
Introduces new sysfs boolean knob /sys/kernel/mm/ksm/merge_across_nodes
which control merging pages across different numa nodes. When it is set
to zero only pages from the same node are merged, otherwise pages from
all nodes can be merged together (default behavior).
Typical use-case could be a lot of KVM guests on NUMA machine and cpus
from more distant nodes would have significant increase of access
latency to the merged ksm page. Sysfs knob was choosen for higher
variability when some users still prefers higher amount of saved
physical memory regardless of access latency.
Every numa node has its own stable & unstable trees because of faster
searching and inserting. Changing of merge_across_nodes value is
possible only when there are not any ksm shared pages in system.
I've tested this patch on numa machines with 2, 4 and 8 nodes and
measured speed of memory access inside of KVM guests with memory pinned
to one of nodes with this benchmark:
http://pholasek.fedorapeople.org/alloc_pg.c
Population standard deviations of access times in percentage of average
were following:
merge_across_nodes=1
2 nodes 1.4%
4 nodes 1.6%
8 nodes 1.7%
merge_across_nodes=0
2 nodes 1%
4 nodes 0.32%
8 nodes 0.018%
RFC: https://lkml.org/lkml/2011/11/30/91
v1: https://lkml.org/lkml/2012/1/23/46
v2: https://lkml.org/lkml/2012/6/29/105
v3: https://lkml.org/lkml/2012/9/14/550
v4: https://lkml.org/lkml/2012/9/23/137
v5: https://lkml.org/lkml/2012/12/10/540
v6: https://lkml.org/lkml/2012/12/23/154
v7: https://lkml.org/lkml/2012/12/27/225
Hugh notes that this patch brings two problems, whose solution needs
further support in mm/ksm.c, which follows in subsequent patches:
1) switching merge_across_nodes after running KSM is liable to oops
on stale nodes still left over from the previous stable tree;
2) memory hotremove may migrate KSM pages, but there is no provision
here for !merge_across_nodes to migrate nodes to the proper tree.
Signed-off-by: Petr Holasek <pholasek@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Izik Eidus <izik.eidus@ravellosystems.com>
Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Switch ksm to use the new hashtable implementation. This reduces the
amount of generic unrelated code in the ksm module.
Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When ex-KSM pages are faulted from swap cache, the fault handler is not
capable of re-establishing anon_vma-spanning KSM pages. In this case, a
copy of the page is created instead, just like during a COW break.
These freshly made copies are known to be exclusive to the faulting VMA
and there is no reason to go look for this page in parent and sibling
processes during rmap operations.
Use page_add_new_anon_rmap() for these copies. This also puts them on
the proper LRU lists and marks them SwapBacked, so we can get rid of
doing this ad-hoc in the KSM copy code.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Simon Jeons <simon.jeons@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Satoru Moriya <satoru.moriya@hds.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The rmap walks in ksm.c are like those in rmap.c: they can safely be
done with anon_vma_lock_read().
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.18 (GNU/Linux)
iQIcBAABAgAGBQJQx0kQAAoJEHzG/DNEskfi4fQP/R5PRovayroZALBMLnVJDaLD
Ttr9p40VNXbiJ+MfRgatJjSSJZ4Jl+fC3NEqBhcwVZhckZZb9R2s0WtrSQo5+ZbB
vdRfiuKoCaKM4cSZ08C12uTvsF6xjhjd27CTUlMkyOcDoKxMEFKelv0hocSxe4Wo
xqlv3eF+VsY7kE1BNbgBP06SX4tDpIHRxXfqJPMHaSKQmre+cU0xG2GcEu3QGbHT
DEDTI788YSaWLmBfMC+kWoaQl1+bV/FYvavIAS8/o4K9IKvgR42VzrXmaFaqrbgb
72ksa6xfAi57yTmZHqyGmts06qYeBbPpKI+yIhCMInxA9CY3lPbvHppRf0RQOyzj
YOi4hovGEMJKE+BCILukhJcZ9jCTtS3zut6v1rdvR88f4y7uhR9RfmRfsxuW7PNj
3Rmh191+n0lVWDmhOs2psXuCLJr3LEiA0dFffN1z8REUTtTAZMsj8Rz+SvBNAZDR
hsJhERVeXB6X5uQ5rkLDzbn1Zic60LjVw7LIp6SF2OYf/YKaF8vhyWOA8dyCEu8W
CGo7AoG0BO8tIIr8+LvFe8CweypysZImx4AjCfIs4u9pu/v11zmBvO9NO5yfuObF
BreEERYgTes/UITxn1qdIW4/q+Nr0iKO3CTqsmu6L1GfCz3/XzPGs3U26fUhllqi
Ka0JKgnWvsa6ez6FSzKI
=ivQa
-----END PGP SIGNATURE-----
Merge tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma
Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
"There are three implementations for NUMA balancing, this tree
(balancenuma), numacore which has been developed in tip/master and
autonuma which is in aa.git.
In almost all respects balancenuma is the dumbest of the three because
its main impact is on the VM side with no attempt to be smart about
scheduling. In the interest of getting the ball rolling, it would be
desirable to see this much merged for 3.8 with the view to building
scheduler smarts on top and adapting the VM where required for 3.9.
The most recent set of comparisons available from different people are
mel: https://lkml.org/lkml/2012/12/9/108
mingo: https://lkml.org/lkml/2012/12/7/331
tglx: https://lkml.org/lkml/2012/12/10/437
srikar: https://lkml.org/lkml/2012/12/10/397
The results are a mixed bag. In my own tests, balancenuma does
reasonably well. It's dumb as rocks and does not regress against
mainline. On the other hand, Ingo's tests shows that balancenuma is
incapable of converging for this workloads driven by perf which is bad
but is potentially explained by the lack of scheduler smarts. Thomas'
results show balancenuma improves on mainline but falls far short of
numacore or autonuma. Srikar's results indicate we all suffer on a
large machine with imbalanced node sizes.
My own testing showed that recent numacore results have improved
dramatically, particularly in the last week but not universally.
We've butted heads heavily on system CPU usage and high levels of
migration even when it shows that overall performance is better.
There are also cases where it regresses. Of interest is that for
specjbb in some configurations it will regress for lower numbers of
warehouses and show gains for higher numbers which is not reported by
the tool by default and sometimes missed in treports. Recently I
reported for numacore that the JVM was crashing with
NullPointerExceptions but currently it's unclear what the source of
this problem is. Initially I thought it was in how numacore batch
handles PTEs but I'm no longer think this is the case. It's possible
numacore is just able to trigger it due to higher rates of migration.
These reports were quite late in the cycle so I/we would like to start
with this tree as it contains much of the code we can agree on and has
not changed significantly over the last 2-3 weeks."
* tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
mm/rmap: Convert the struct anon_vma::mutex to an rwsem
mm: migrate: Account a transhuge page properly when rate limiting
mm: numa: Account for failed allocations and isolations as migration failures
mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
mm: numa: Add THP migration for the NUMA working set scanning fault case.
mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
mm: sched: numa: Control enabling and disabling of NUMA balancing
mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships
mm: numa: migrate: Set last_nid on newly allocated page
mm: numa: split_huge_page: Transfer last_nid on tail page
mm: numa: Introduce last_nid to the page frame
sched: numa: Slowly increase the scanning period as NUMA faults are handled
mm: numa: Rate limit setting of pte_numa if node is saturated
mm: numa: Rate limit the amount of memory that is migrated between nodes
mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
mm: numa: Migrate pages handled during a pmd_numa hinting fault
mm: numa: Migrate on reference policy
...
test_set_oom_score_adj() and compare_swap_oom_score_adj() are used to
specify that current should be killed first if an oom condition occurs in
between the two calls.
The usage is
short oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
...
compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, oom_score_adj);
to store the thread's oom_score_adj, temporarily change it to the maximum
score possible, and then restore the old value if it is still the same.
This happens to still be racy, however, if the user writes
OOM_SCORE_ADJ_MAX to /proc/pid/oom_score_adj in between the two calls.
The compare_swap_oom_score_adj() will then incorrectly reset the old value
prior to the write of OOM_SCORE_ADJ_MAX.
To fix this, introduce a new oom_flags_t member in struct signal_struct
that will be used for per-thread oom killer flags. KSM and swapoff can
now use a bit in this member to specify that threads should be killed
first in oom conditions without playing around with oom_score_adj.
This also allows the correct oom_score_adj to always be shown when reading
/proc/pid/oom_score.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>