* refs/heads/tmp-f057ff9
Linux 4.4.148
x86/speculation/l1tf: Unbreak !__HAVE_ARCH_PFN_MODIFY_ALLOWED architectures
x86/init: fix build with CONFIG_SWAP=n
x86/speculation/l1tf: Fix up CPU feature flags
x86/mm/kmmio: Make the tracer robust against L1TF
x86/mm/pat: Make set_memory_np() L1TF safe
x86/speculation/l1tf: Make pmd/pud_mknotpresent() invert
x86/speculation/l1tf: Invert all not present mappings
x86/speculation/l1tf: Fix up pte->pfn conversion for PAE
x86/speculation/l1tf: Protect PAE swap entries against L1TF
x86/cpufeatures: Add detection of L1D cache flush support.
x86/speculation/l1tf: Extend 64bit swap file size limit
x86/bugs: Move the l1tf function and define pr_fmt properly
x86/speculation/l1tf: Limit swap file size to MAX_PA/2
x86/speculation/l1tf: Disallow non privileged high MMIO PROT_NONE mappings
mm: fix cache mode tracking in vm_insert_mixed()
mm: Add vm_insert_pfn_prot()
x86/speculation/l1tf: Add sysfs reporting for l1tf
x86/speculation/l1tf: Make sure the first page is always reserved
x86/speculation/l1tf: Protect PROT_NONE PTEs against speculation
x86/speculation/l1tf: Protect swap entries against L1TF
x86/speculation/l1tf: Change order of offset/type in swap entry
mm: x86: move _PAGE_SWP_SOFT_DIRTY from bit 7 to bit 1
x86/mm: Fix swap entry comment and macro
x86/mm: Move swap offset/type up in PTE to work around erratum
x86/speculation/l1tf: Increase 32bit PAE __PHYSICAL_PAGE_SHIFT
x86/irqflags: Provide a declaration for native_save_fl
kprobes/x86: Fix %p uses in error messages
x86/speculation: Protect against userspace-userspace spectreRSB
x86/paravirt: Fix spectre-v2 mitigations for paravirt guests
ARM: dts: imx6sx: fix irq for pcie bridge
IB/ocrdma: fix out of bounds access to local buffer
IB/mlx4: Mark user MR as writable if actual virtual memory is writable
IB/core: Make testing MR flags for writability a static inline function
fix __legitimize_mnt()/mntput() race
fix mntput/mntput race
root dentries need RCU-delayed freeing
scsi: sr: Avoid that opening a CD-ROM hangs with runtime power management enabled
ACPI / LPSS: Add missing prv_offset setting for byt/cht PWM devices
xen/netfront: don't cache skb_shinfo()
parisc: Define mb() and add memory barriers to assembler unlock sequences
parisc: Enable CONFIG_MLONGCALLS by default
fork: unconditionally clear stack on fork
ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV
tpm: fix race condition in tpm_common_write()
ext4: fix check to prevent initializing reserved inodes
Linux 4.4.147
jfs: Fix inconsistency between memory allocation and ea_buf->max_size
i2c: imx: Fix reinit_completion() use
ring_buffer: tracing: Inherit the tracing setting to next ring buffer
ACPI / PCI: Bail early in acpi_pci_add_bus() if there is no ACPI handle
ext4: fix false negatives *and* false positives in ext4_check_descriptors()
netlink: Don't shift on 64 for ngroups
netlink: Don't shift with UB on nlk->ngroups
netlink: Do not subscribe to non-existent groups
nohz: Fix local_timer_softirq_pending()
genirq: Make force irq threading setup more robust
scsi: qla2xxx: Return error when TMF returns
scsi: qla2xxx: Fix ISP recovery on unload
Conflicts:
include/linux/swapfile.h
Removed CONFIG_CRYPTO_ECHAINIV from defconfig files since this upmerge is
adding this config to Kconfig file.
Change-Id: Ide96c29f919d76590c2bdccf356d1d464a892fd7
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
commit 42e4089c7890725fcd329999252dc489b72f2921 upstream
For L1TF PROT_NONE mappings are protected by inverting the PFN in the page
table entry. This sets the high bits in the CPU's address space, thus
making sure to point to not point an unmapped entry to valid cached memory.
Some server system BIOSes put the MMIO mappings high up in the physical
address space. If such an high mapping was mapped to unprivileged users
they could attack low memory by setting such a mapping to PROT_NONE. This
could happen through a special device driver which is not access
protected. Normal /dev/mem is of course access protected.
To avoid this forbid PROT_NONE mappings or mprotect for high MMIO mappings.
Valid page mappings are allowed because the system is then unsafe anyways.
It's not expected that users commonly use PROT_NONE on MMIO. But to
minimize any impact this is only enforced if the mapping actually refers to
a high MMIO address (defined as the MAX_PA-1 bit being set), and also skip
the check for root.
For mmaps this is straight forward and can be handled in vm_insert_pfn and
in remap_pfn_range().
For mprotect it's a bit trickier. At the point where the actual PTEs are
accessed a lot of state has been changed and it would be difficult to undo
on an error. Since this is a uncommon case use a separate early page talk
walk pass for MMIO PROT_NONE mappings that checks for this condition
early. For non MMIO and non PROT_NONE there are no changes.
[dwmw2: Backport to 4.9]
[groeck: Backport to 4.4]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 87744ab3832b83ba71b931f86f9cfdb000d07da5 upstream
vm_insert_mixed() unlike vm_insert_pfn_prot() and vmf_insert_pfn_pmd(),
fails to check the pgprot_t it uses for the mapping against the one
recorded in the memtype tracking tree. Add the missing call to
track_pfn_insert() to preclude cases where incompatible aliased mappings
are established for a given physical address range.
[groeck: Backport to v4.4.y]
Link: http://lkml.kernel.org/r/147328717909.35069.14256589123570653697.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1745cbc5d0dee0749a6bc0ea8e872c5db0074061 upstream
The x86 vvar vma contains pages with differing cacheability
flags. x86 currently implements this by manually inserting all
the ptes using (io_)remap_pfn_range when the vma is set up.
x86 wants to move to using .fault with VM_FAULT_NOPAGE to set up
the mappings as needed. The correct API to use to insert a pfn
in .fault is vm_insert_pfn(), but vm_insert_pfn() can't override the
vma's cache mode, and the HPET page in particular needs to be
uncached despite the fact that the rest of the VMA is cached.
Add vm_insert_pfn_prot() to support varying cacheability within
the same non-COW VMA in a more sane manner.
x86 could alternatively use multiple VMAs, but that's messy,
would break CRIU, and would create unnecessary VMAs that would
waste memory.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/d2938d1eb37be7a5e4f86182db646551f11e45aa.1451446564.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-b1c4836
Linux 4.4.129
writeback: safer lock nesting
fanotify: fix logic of events on child
ext4: bugfix for mmaped pages in mpage_release_unused_pages()
mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
autofs: mount point create should honour passed in mode
Don't leak MNT_INTERNAL away from internal mounts
rpc_pipefs: fix double-dput()
hypfs_kill_super(): deal with failed allocations
jffs2_kill_sb(): deal with failed allocations
powerpc/lib: Fix off-by-one in alternate feature patching
powerpc/eeh: Fix enabling bridge MMIO windows
MIPS: memset.S: Fix clobber of v1 in last_fixup
MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
MIPS: memset.S: EVA & fault support for small_memset
MIPS: uaccess: Add micromips clobbers to bzero invocation
HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
ALSA: hda - New VIA controller suppor no-snoop path
ALSA: rawmidi: Fix missing input substream checks in compat ioctls
ALSA: line6: Use correct endpoint type for midi output
ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()
ext4: fix crashes in dioread_nolock mode
drm/radeon: Fix PCIe lane width calculation
ext4: don't allow r/w mounts if metadata blocks overlap the superblock
vfio/pci: Virtualize Maximum Read Request Size
vfio/pci: Virtualize Maximum Payload Size
vfio-pci: Virtualize PCIe & AF FLR
ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
ALSA: pcm: Avoid potential races between OSS ioctls and read/write
ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
ALSA: oss: consolidate kmalloc/memset 0 call to kzalloc
watchdog: f71808e_wdt: Fix WD_EN register read
thermal: imx: Fix race condition in imx_thermal_probe()
clk: bcm2835: De-assert/assert PLL reset signal when appropriate
clk: mvebu: armada-38x: add support for missing clocks
clk: mvebu: armada-38x: add support for 1866MHz variants
mmc: jz4740: Fix race condition in IRQ mask update
iommu/vt-d: Fix a potential memory leak
um: Use POSIX ucontext_t instead of struct ucontext
dmaengine: at_xdmac: fix rare residue corruption
IB/srp: Fix completion vector assignment algorithm
IB/srp: Fix srp_abort()
ALSA: pcm: Fix UAF at PCM release via PCM timer access
RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
ext4: fail ext4_iget for root directory if unallocated
ext4: don't update checksum of new initialized bitmaps
jbd2: if the journal is aborted then don't allow update of the log tail
random: use a tighter cap in credit_entropy_bits_safe()
thunderbolt: Resume control channel after hibernation image is created
ASoC: ssm2602: Replace reg_default_raw with reg_default
HID: core: Fix size as type u32
HID: Fix hid_report_len usage
powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
HID: i2c-hid: fix size check and type usage
usb: dwc3: pci: Properly cleanup resource
USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
regmap: Fix reversed bounds check in regmap_raw_write()
xen-netfront: Fix hang on device removal
ARM: dts: at91: sama5d4: fix pinctrl compatible string
ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
usb: musb: gadget: misplaced out of bounds check
mm, slab: reschedule cache_reap() on the same CPU
ipc/shm: fix use-after-free of shm file via remap_file_pages()
resource: fix integer overflow at reallocation
fs/reiserfs/journal.c: add missing resierfs_warning() arg
ubi: Reject MLC NAND
ubi: Fix error for write access
ubi: fastmap: Don't flush fastmap work on detach
ubifs: Check ubifs_wbuf_sync() return code
tty: make n_tty_read() always abort if hangup is in progress
x86/hweight: Don't clobber %rdi
x86/hweight: Get rid of the special calling convention
lan78xx: Correctly indicate invalid OTP
slip: Check if rstate is initialized before uncompressing
cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
hwmon: (ina2xx) Fix access to uninitialized mutex
rtl8187: Fix NULL pointer dereference in priv->conf_mutex
getname_kernel() needs to make sure that ->name != ->iname in long case
s390/ipl: ensure loadparm valid flag is set
s390/qdio: don't merge ERROR output buffers
s390/qdio: don't retry EQBS after CCQ 96
block/loop: fix deadlock after loop_set_status
Revert "perf tests: Decompress kernel module before objdump"
radeon: hide pointless #warning when compile testing
perf intel-pt: Fix timestamp following overflow
perf intel-pt: Fix error recovery from missing TIP packet
perf intel-pt: Fix sync_switch
perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
parisc: Fix out of array access in match_pci_device()
media: v4l2-compat-ioctl32: don't oops on overlay
f2fs: check cap_resource only for data blocks
Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"
f2fs: clear PageError on writepage
UPSTREAM: timer: Export destroy_hrtimer_on_stack()
BACKPORT: dm verity: add 'check_at_most_once' option to only validate hashes once
f2fs: call unlock_new_inode() before d_instantiate()
f2fs: refactor read path to allow multiple postprocessing steps
fscrypt: allow synchronous bio decryption
Change-Id: I45f4ac10734d92023b53118d83dcd6c83974a283
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
commit c20cd45eb01748f0fba77a504f956b000df4ea73 upstream.
page_cache_read has been historically using page_cache_alloc_cold to
allocate a new page. This means that mapping_gfp_mask is used as the
base for the gfp_mask. Many filesystems are setting this mask to
GFP_NOFS to prevent from fs recursion issues. page_cache_read is called
from the vm_operations_struct::fault() context during the page fault.
This context doesn't need the reclaim protection normally.
ceph and ocfs2 which call filemap_fault from their fault handlers seem
to be OK because they are not taking any fs lock before invoking generic
implementation. xfs which takes XFS_MMAPLOCK_SHARED is safe from the
reclaim recursion POV because this lock serializes truncate and punch
hole with the page faults and it doesn't get involved in the reclaim.
There is simply no reason to deliberately use a weaker allocation
context when a __GFP_FS | __GFP_IO can be used. The GFP_NOFS protection
might be even harmful. There is a push to fail GFP_NOFS allocations
rather than loop within allocator indefinitely with a very limited
reclaim ability. Once we start failing those requests the OOM killer
might be triggered prematurely because the page cache allocation failure
is propagated up the page fault path and end up in
pagefault_out_of_memory.
We cannot play with mapping_gfp_mask directly because that would be racy
wrt. parallel page faults and it might interfere with other users who
really rely on NOFS semantic from the stored gfp_mask. The mask is also
inode proper so it would even be a layering violation. What we can do
instead is to push the gfp_mask into struct vm_fault and allow fs layer
to overwrite it should the callback need to be called with a different
allocation context.
Initialize the default to (mapping_gfp_mask | __GFP_FS | __GFP_IO)
because this should be safe from the page fault path normally. Why do
we care about mapping_gfp_mask at all then? Because this doesn't hold
only reclaim protection flags but it also might contain zone and
movability restrictions (GFP_DMA32, __GFP_MOVABLE and others) so we have
to respect those.
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit af27d9403f5b80685b79c88425086edccecaf711 upstream.
We get a warning about some slow configurations in randconfig kernels:
mm/memory.c:83:2: error: #warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_cpupid. [-Werror=cpp]
The warning is reasonable by itself, but gets in the way of randconfig
build testing, so I'm hiding it whenever CONFIG_COMPILE_TEST is set.
The warning was added in 2013 in commit 75980e97da ("mm: fold
page->_last_nid into page->flags where possible").
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-4b8fc9f
UPSTREAM: locking: avoid passing around 'thread_info' in mutex debugging code
ANDROID: arm64: fix undeclared 'init_thread_info' error
UPSTREAM: kdb: use task_cpu() instead of task_thread_info()->cpu
Linux 4.4.82
net: account for current skb length when deciding about UFO
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
mm/mempool: avoid KASAN marking mempool poison checks as use-after-free
KVM: arm/arm64: Handle hva aging while destroying the vm
sparc64: Prevent perf from running during super critical sections
udp: consistently apply ufo or fragmentation
revert "ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output"
revert "net: account for current skb length when deciding about UFO"
packet: fix tp_reserve race in packet_set_ring
net: avoid skb_warn_bad_offload false positives on UFO
tcp: fastopen: tcp_connect() must refresh the route
net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
bpf, s390: fix jit branch offset related to ldimm64
net: fix keepalive code vs TCP_FASTOPEN_CONNECT
tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states
ANDROID: keychord: Fix for a memory leak in keychord.
ANDROID: keychord: Fix races in keychord_write.
Use %zu to print resid (size_t).
ANDROID: keychord: Fix a slab out-of-bounds read.
Linux 4.4.81
workqueue: implicit ordered attribute should be overridable
net: account for current skb length when deciding about UFO
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
mm: don't dereference struct page fields of invalid pages
signal: protect SIGNAL_UNKILLABLE from unintentional clearing.
lib/Kconfig.debug: fix frv build failure
mm, slab: make sure that KMALLOC_MAX_SIZE will fit into MAX_ORDER
ARM: 8632/1: ftrace: fix syscall name matching
virtio_blk: fix panic in initialization error path
drm/virtio: fix framebuffer sparse warning
scsi: qla2xxx: Get mutex lock before checking optrom_state
phy state machine: failsafe leave invalid RUNNING state
x86/boot: Add missing declaration of string functions
tg3: Fix race condition in tg3_get_stats64().
net: phy: dp83867: fix irq generation
sh_eth: R8A7740 supports packet shecksumming
wext: handle NULL extra data in iwe_stream_add_point better
sparc64: Measure receiver forward progress to avoid send mondo timeout
xen-netback: correctly schedule rate-limited queues
net: phy: Fix PHY unbind crash
net: phy: Correctly process PHY_HALTED in phy_stop_machine()
net/mlx5: Fix command bad flow on command entry allocation failure
sctp: fix the check for _sctp_walk_params and _sctp_walk_errors
sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}()
dccp: fix a memleak for dccp_feat_init err process
dccp: fix a memleak that dccp_ipv4 doesn't put reqsk properly
dccp: fix a memleak that dccp_ipv6 doesn't put reqsk properly
net: ethernet: nb8800: Handle all 4 RGMII modes identically
ipv6: Don't increase IPSTATS_MIB_FRAGFAILS twice in ip6_fragment()
packet: fix use-after-free in prb_retire_rx_blk_timer_expired()
openvswitch: fix potential out of bound access in parse_ct
mcs7780: Fix initialization when CONFIG_VMAP_STACK is enabled
rtnetlink: allocate more memory for dev_set_mac_address()
ipv4: initialize fib_trie prior to register_netdev_notifier call.
ipv6: avoid overflow of offset in ip6_find_1stfragopt
net: Zero terminate ifr_name in dev_ifname().
ipv4: ipv6: initialize treq->txhash in cookie_v[46]_check()
saa7164: fix double fetch PCIe access condition
drm: rcar-du: fix backport bug
f2fs: sanity check checkpoint segno and blkoff
media: lirc: LIRC_GET_REC_RESOLUTION should return microseconds
mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries
iser-target: Avoid isert_conn->cm_id dereference in isert_login_recv_done
iscsi-target: Fix delayed logout processing greater than SECONDS_FOR_LOGOUT_COMP
iscsi-target: Fix initial login PDU asynchronous socket close OOPs
iscsi-target: Fix early sk_data_ready LOGIN_FLAGS_READY race
iscsi-target: Always wait for kthread_should_stop() before kthread exit
target: Avoid mappedlun symlink creation during lun shutdown
media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl
ARM: dts: armada-38x: Fix irq type for pca955
ext4: fix overflow caused by missing cast in ext4_resize_fs()
ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize
mm/page_alloc: Remove kernel address exposure in free_reserved_area()
KVM: async_pf: make rcu irq exit if not triggered from idle task
ASoC: do not close shared backend dailink
ALSA: hda - Fix speaker output from VAIO VPCL14M1R
workqueue: restore WQ_UNBOUND/max_active==1 to be ordered
libata: array underflow in ata_find_dev()
ANDROID: binder: don't queue async transactions to thread.
ANDROID: binder: don't enqueue death notifications to thread todo.
ANDROID: binder: call poll_wait() unconditionally.
android: configs: move quota-related configs to recommended
BACKPORT: arm64: split thread_info from task stack
UPSTREAM: arm64: assembler: introduce ldr_this_cpu
UPSTREAM: arm64: make cpu number a percpu variable
UPSTREAM: arm64: smp: prepare for smp_processor_id() rework
BACKPORT: arm64: move sp_el0 and tpidr_el1 into cpu_suspend_ctx
UPSTREAM: arm64: prep stack walkers for THREAD_INFO_IN_TASK
UPSTREAM: arm64: unexport walk_stackframe
UPSTREAM: arm64: traps: simplify die() and __die()
UPSTREAM: arm64: factor out current_stack_pointer
BACKPORT: arm64: asm-offsets: remove unused definitions
UPSTREAM: arm64: thread_info remove stale items
UPSTREAM: thread_info: include <current.h> for THREAD_INFO_IN_TASK
UPSTREAM: thread_info: factor out restart_block
UPSTREAM: kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function
UPSTREAM: sched/core: Add try_get_task_stack() and put_task_stack()
UPSTREAM: sched/core: Allow putting thread_info into task_struct
UPSTREAM: printk: when dumping regs, show the stack, not thread_info
UPSTREAM: fix up initial thread stack pointer vs thread_info confusion
UPSTREAM: Clarify naming of thread info/stack allocators
ANDROID: sdcardfs: override credential for ioctl to lower fs
Conflicts:
android/configs/android-base.cfg
arch/arm64/Kconfig
arch/arm64/include/asm/suspend.h
arch/arm64/kernel/head.S
arch/arm64/kernel/smp.c
arch/arm64/kernel/suspend.c
arch/arm64/kernel/traps.c
arch/arm64/mm/proc.S
kernel/fork.c
sound/soc/soc-pcm.c
Change-Id: I273e216c94899a838bbd208391c6cbe20b2bf683
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
commit 3ea277194daaeaa84ce75180ec7c7a2075027a68 upstream.
Stable note for 4.4: The upstream patch patches madvise(MADV_FREE) but 4.4
does not have support for that feature. The changelog is left
as-is but the hunk related to madvise is omitted from the backport.
Nadav Amit identified a theoritical race between page reclaim and
mprotect due to TLB flushes being batched outside of the PTL being held.
He described the race as follows:
CPU0 CPU1
---- ----
user accesses memory using RW PTE
[PTE now cached in TLB]
try_to_unmap_one()
==> ptep_get_and_clear()
==> set_tlb_ubc_flush_pending()
mprotect(addr, PROT_READ)
==> change_pte_range()
==> [ PTE non-present - no flush ]
user writes using cached RW PTE
...
try_to_unmap_flush()
The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such
as munmap, mremap and madvise.
For some operations like mprotect, it's not necessarily a data integrity
issue but it is a correctness issue as there is a window where an
mprotect that limits access still allows access. For munmap, it's
potentially a data integrity issue although the race is massive as an
munmap, mmap and return to userspace must all complete between the
window when reclaim drops the PTL and flushes the TLB. However, it's
theoritically possible so handle this issue by flushing the mm if
reclaim is potentially currently batching TLB flushes.
Other instances where a flush is required for a present pte should be ok
as either the page lock is held preventing parallel reclaim or a page
reference count is elevated preventing a parallel free leading to
corruption. In the case of page_mkclean there isn't an obvious path
that userspace could take advantage of without using the operations that
are guarded by this patch. Other users such as gup as a race with
reclaim looks just at PTEs. huge page variants should be ok as they
don't race with reclaim. mincore only looks at PTEs. userfault also
should be ok as if a parallel reclaim takes place, it will either fault
the page back in or read some of the data before the flush occurs
triggering a fault.
Note that a variant of this patch was acked by Andy Lutomirski but this
was for the x86 parts on top of his PCID work which didn't make the 4.13
merge window as expected. His ack is dropped from this version and
there will be a follow-on patch on top of PCID that will include his
ack.
[akpm@linux-foundation.org: tweak comments]
[akpm@linux-foundation.org: fix spello]
Link: http://lkml.kernel.org/r/20170717155523.emckq2esjro6hf3z@suse.de
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-77ddb50:
UPSTREAM: usb: gadget: f_fs: avoid out of bounds access on comp_desc
Linux 4.4.74
mm: fix new crash in unmapped_area_topdown()
Allow stack to grow up to address space limit
mm: larger stack guard gap, between vmas
alarmtimer: Rate limit periodic intervals
MIPS: Fix bnezc/jialc return address calculation
usb: dwc3: exynos fix axius clock error path to do cleanup
alarmtimer: Prevent overflow of relative timers
genirq: Release resources in __setup_irq() error path
swap: cond_resched in swap_cgroup_prepare()
mm/memory-failure.c: use compound_head() flags for huge pages
USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
usb: r8a66597-hcd: decrease timeout
usb: r8a66597-hcd: select a different endpoint on timeout
USB: gadget: dummy_hcd: fix hub-descriptor removable fields
pvrusb2: reduce stack usage pvr2_eeprom_analyze()
usb: core: fix potential memory leak in error path during hcd creation
USB: hub: fix SS max number of ports
iio: proximity: as3935: recalibrate RCO after resume
staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data()
mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode
x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
mac80211: fix IBSS presp allocation size
mac80211: fix CSA in IBSS mode
mac80211/wpa: use constant time memory comparison for MACs
mac80211: don't look at the PM bit of BAR frames
vb2: Fix an off by one error in 'vb2_plane_vaddr'
cpufreq: conservative: Allow down_threshold to take values from 1 to 10
can: gs_usb: fix memory leak in gs_cmd_reset()
configfs: Fix race between create_link and configfs_rmdir
UPSTREAM: bpf: don't let ldimm64 leak map addresses on unprivileged
BACKPORT: ext4: fix data exposure after a crash
ANDROID: sdcardfs: remove dead function open_flags_to_access_mode()
ANDROID: android-base.cfg: split out arm64-specific configs
Linux 4.4.73
sparc64: make string buffers large enough
s390/kvm: do not rely on the ILC on kvm host protection fauls
xtensa: don't use linux IRQ #0
tipc: ignore requests when the connection state is not CONNECTED
proc: add a schedule point in proc_pid_readdir()
romfs: use different way to generate fsid for BLOCK or MTD
sctp: sctp_addr_id2transport should verify the addr before looking up assoc
r8152: avoid start_xmit to schedule napi when napi is disabled
r8152: fix rtl8152_post_reset function
r8152: re-schedule napi for tx
nfs: Fix "Don't increment lock sequence ID after NFS4ERR_MOVED"
ravb: unmap descriptors when freeing rings
drm/ast: Fixed system hanged if disable P2A
drm/nouveau: Don't enabling polling twice on runtime resume
parisc, parport_gsc: Fixes for printk continuation lines
net: adaptec: starfire: add checks for dma mapping errors
pinctrl: berlin-bg4ct: fix the value for "sd1a" of pin SCRD0_CRD_PRES
gianfar: synchronize DMA API usage by free_skb_rx_queue w/ gfar_new_page
net/mlx4_core: Avoid command timeouts during VF driver device shutdown
drm/nouveau/fence/g84-: protect against concurrent access to semaphore buffers
drm/nouveau: prevent userspace from deleting client object
ipv6: fix flow labels when the traffic class is non-0
FS-Cache: Initialise stores_lock in netfs cookie
fscache: Clear outstanding writes when disabling a cookie
fscache: Fix dead object requeue
ethtool: do not vzalloc(0) on registers dump
log2: make order_base_2() behave correctly on const input value zero
kasan: respect /proc/sys/kernel/traceoff_on_warning
jump label: pass kbuild_cflags when checking for asm goto support
PM / runtime: Avoid false-positive warnings from might_sleep_if()
ipv6: Fix IPv6 packet loss in scenarios involving roaming + snooping switches
i2c: piix4: Fix request_region size
sierra_net: Add support for IPv6 and Dual-Stack Link Sense Indications
sierra_net: Skip validating irrelevant fields for IDLE LSIs
net: hns: Fix the device being used for dma mapping during TX
NET: mkiss: Fix panic
NET: Fix /proc/net/arp for AX.25
ipv6: Inhibit IPv4-mapped src address on the wire.
ipv6: Handle IPv4-mapped src to in6addr_any dst.
net: xilinx_emaclite: fix receive buffer overflow
net: xilinx_emaclite: fix freezes due to unordered I/O
Call echo service immediately after socket reconnect
staging: rtl8192e: rtl92e_fill_tx_desc fix write to mapped out memory.
ARM: dts: imx6dl: Fix the VDD_ARM_CAP voltage for 396MHz operation
partitions/msdos: FreeBSD UFS2 file systems are not recognized
s390/vmem: fix identity mapping
usb: gadget: f_fs: Fix possibe deadlock
Conflicts:
drivers/usb/gadget/function/f_fs.c
Change-Id: I23106e9fc2c4f2d0b06acce59b781f6c36487fcc
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.
Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.
This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.
Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.
One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications. For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).
Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.
Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.
Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: backport to 4.11: adjust context]
[wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide]
[wt: backport to 4.4: adjust context ; drop ppc hugetlb_radix changes]
Signed-off-by: Willy Tarreau <w@1wt.eu>
[gkh: minor build fixes for 4.4]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* msm-4.4/tmp-510d0a3f:
Linux 4.4.11
nf_conntrack: avoid kernel pointer value leak in slab name
drm/radeon: fix DP link training issue with second 4K monitor
drm/i915/bdw: Add missing delay during L3 SQC credit programming
drm/i915: Bail out of pipe config compute loop on LPT
drm/radeon: fix PLL sharing on DCE6.1 (v2)
Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"
Input: max8997-haptic - fix NULL pointer dereference
get_rock_ridge_filename(): handle malformed NM entries
tools lib traceevent: Do not reassign parg after collapse_tree()
qla1280: Don't allocate 512kb of host tags
atomic_open(): fix the handling of create_error
regulator: axp20x: Fix axp22x ldo_io voltage ranges
regulator: s2mps11: Fix invalid selector mask and voltages for buck9
workqueue: fix rebind bound workers warning
ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
vfs: rename: check backing inode being equal
vfs: add vfs_select_inode() helper
perf/core: Disable the event on a truncated AUX record
regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case
pinctrl: at91-pio4: fix pull-up/down logic
spi: spi-ti-qspi: Handle truncated frames properly
spi: spi-ti-qspi: Fix FLEN and WLEN settings if bits_per_word is overridden
spi: pxa2xx: Do not detect number of enabled chip selects on Intel SPT
ALSA: hda - Fix broken reconfig
ALSA: hda - Fix white noise on Asus UX501VW headset
ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
ALSA: usb-audio: Yet another Phoneix Audio device quirk
ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
crypto: testmgr - Use kmalloc memory for RSA input
crypto: hash - Fix page length clamping in hash walk
crypto: qat - fix invalid pf2vf_resp_wq logic
s390/mm: fix asce_bits handling with dynamic pagetable levels
zsmalloc: fix zs_can_compact() integer overflow
ocfs2: fix posix_acl_create deadlock
ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
net/route: enforce hoplimit max value
tcp: refresh skb timestamp at retransmit time
net: thunderx: avoid exposing kernel stack
net: fix a kernel infoleak in x25 module
uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h MIME-Version: 1.0
bridge: fix igmp / mld query parsing
net: bridge: fix old ioctl unlocked net device walk
VSOCK: do not disconnect socket when peer has shutdown SEND only
net/mlx4_en: Fix endianness bug in IPV6 csum calculation
net: fix infoleak in rtnetlink
net: fix infoleak in llc
net: fec: only clear a queue's work bit if the queue was emptied
netem: Segment GSO packets on enqueue
sch_dsmark: update backlog as well
sch_htb: update backlog as well
net_sched: update hierarchical backlog too
net_sched: introduce qdisc_replace() helper
gre: do not pull header in ICMP error processing
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
samples/bpf: fix trace_output example
bpf: fix check_map_func_compatibility logic
bpf: fix refcnt overflow
bpf: fix double-fdput in replace_map_fd_with_map_ptr()
net/mlx4_en: fix spurious timestamping callbacks
ipv4/fib: don't warn when primary address is missing if in_dev is dead
net/mlx5e: Fix minimum MTU
net/mlx5e: Device's mtu field is u16 and not int
openvswitch: use flow protocol when recalculating ipv6 checksums
atl2: Disable unimplemented scatter/gather feature
vlan: pull on __vlan_insert_tag error path and fix csum correction
net: use skb_postpush_rcsum instead of own implementations
cdc_mbim: apply "NDP to end" quirk to all Huawei devices
bpf/verifier: reject invalid LD_ABS | BPF_DW instruction
net: sched: do not requeue a NULL skb
packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface
route: do not cache fib route info on local routes with oif
decnet: Do not build routes to devices without decnet private data.
parisc: Use generic extable search and sort routines
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: mm: treat memstart_addr as a signed quantity
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: consistently use p?d_set_huge
arm64: fix KASLR boot-time I-cache maintenance
arm64: hugetlb: partial revert of 66b3923a1a0f
arm64: make irq_stack_ptr more robust
arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
efi: stub: use high allocation for converted command line
efi: stub: add implementation of efi_random_alloc()
efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
arm64: kaslr: randomize the linear region
arm64: add support for kernel ASLR
arm64: add support for building vmlinux as a relocatable PIE binary
arm64: switch to relative exception tables
extable: add support for relative extables to search and sort routines
scripts/sortextable: add support for ET_DYN binaries
arm64: futex.h: Add missing PAN toggling
arm64: make asm/elf.h available to asm files
arm64: avoid dynamic relocations in early boot code
arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
arm64: add support for module PLTs
arm64: move brk immediate argument definitions to separate header
arm64: mm: use bit ops rather than arithmetic in pa/va translations
arm64: mm: only perform memstart_addr sanity check if DEBUG_VM
arm64: User die() instead of panic() in do_page_fault()
arm64: allow kernel Image to be loaded anywhere in physical memory
arm64: defer __va translation of initrd_start and initrd_end
arm64: move kernel image to base of vmalloc area
arm64: kvm: deal with kernel symbols outside of linear mapping
arm64: decouple early fixmap init from linear mapping
arm64: pgtable: implement static [pte|pmd|pud]_offset variants
arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
arm64: add support for ioremap() block mappings
arm64: prevent potential circular header dependencies in asm/bug.h
of/fdt: factor out assignment of initrd_start/initrd_end
of/fdt: make memblock minimum physical address arch configurable
arm64: Remove the get_thread_info() function
arm64: kernel: Don't toggle PAN on systems with UAO
arm64: cpufeature: Test 'matches' pointer to find the end of the list
arm64: kernel: Add support for User Access Override
arm64: add ARMv8.2 id_aa64mmfr2 boiler plate
arm64: cpufeature: Change read_cpuid() to use sysreg's mrs_s macro
arm64: use local label prefixes for __reg_num symbols
arm64: vdso: Mark vDSO code as read-only
arm64: ubsan: select ARCH_HAS_UBSAN_SANITIZE_ALL
arm64: ptdump: Indicate whether memory should be faulting
arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
arm64: Drop alloc function from create_mapping
arm64: prefetch: add missing #include for spin_lock_prefetch
arm64: lib: patch in prfm for copy_page if requested
arm64: lib: improve copy_page to deal with 128 bytes at a time
arm64: prefetch: add alternative pattern for CPUs without a prefetcher
arm64: prefetch: don't provide spin_lock_prefetch with LSE
arm64: allow vmalloc regions to be set with set_memory_*
arm64: kernel: implement ACPI parking protocol
arm64: mm: create new fine-grained mappings at boot
arm64: ensure _stext and _etext are page-aligned
arm64: mm: allow passing a pgdir to alloc_init_*
arm64: mm: allocate pagetables anywhere
arm64: mm: use fixmap when creating page tables
arm64: mm: add functions to walk tables in fixmap
arm64: mm: add __{pud,pgd}_populate
arm64: mm: avoid redundant __pa(__va(x))
arm64: mm: add functions to walk page tables by PA
arm64: mm: move pte_* macros
arm64: kasan: avoid TLB conflicts
arm64: mm: add code to safely replace TTBR1_EL1
arm64: add function to install the idmap
arm64: unmap idmap earlier
arm64: unify idmap removal
arm64: mm: place empty_zero_page in bss
arm64: mm: specialise pagetable allocators
asm-generic: Fix local variable shadow in __set_fixmap_offset
Eliminate the .eh_frame sections from the aarch64 vmlinux and kernel modules
arm64: Fix an enum typo in mm/dump.c
arm64: kasan: ensure that the KASAN zero page is mapped read-only
arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP
arm64: hide __efistub_ aliases from kallsyms
Linux 4.4.10
drm/i915/skl: Fix DMC load on Skylake J0 and K0
lib/test-string_helpers.c: fix and improve string_get_size() tests
ACPI / processor: Request native thermal interrupt handling via _OSC
drm/i915: Fake HDMI live status
drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
drm/i915: Fix eDP low vswing for Broadwell
drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
drm/radeon: make sure vertical front porch is at least 1
iio: ak8975: fix maybe-uninitialized warning
iio: ak8975: Fix NULL pointer exception on early interrupt
drm/amdgpu: set metadata pointer to NULL after freeing.
drm/amdgpu: make sure vertical front porch is at least 1
gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading
nvmem: mxs-ocotp: fix buffer overflow in read
USB: serial: cp210x: add Straizona Focusers device ids
USB: serial: cp210x: add ID for Link ECU
ata: ahci-platform: Add ports-implemented DT bindings.
libahci: save port map for forced port map
powerpc: Fix bad inline asm constraint in create_zero_mask()
ACPICA: Dispatcher: Update thread ID for recursive method calls
x86/sysfb_efi: Fix valid BAR address range check
ARC: Add missing io barriers to io{read,write}{16,32}be()
ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
propogate_mnt: Handle the first propogated copy being a slave
fs/pnode.c: treat zero mnt_group_id-s as unequal
x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
MAINTAINERS: Remove asterisk from EFI directory names
writeback: Fix performance regression in wb_over_bg_thresh()
batman-adv: Reduce refcnt of removed router when updating route
batman-adv: Fix broadcast/ogm queue limit on a removed interface
batman-adv: Check skb size before using encapsulated ETH+VLAN header
batman-adv: fix DAT candidate selection (must use vid)
mm: update min_free_kbytes from khugepaged after core initialization
proc: prevent accessing /proc/<PID>/environ until it's ready
Input: zforce_ts - fix dual touch recognition
HID: Fix boot delay for Creative SB Omni Surround 5.1 with quirk
HID: wacom: Add support for DTK-1651
xen/evtchn: fix ring resize when binding new events
xen/balloon: Fix crash when ballooning on x86 32 bit PAE
xen: Fix page <-> pfn conversion on 32 bit systems
ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
mm/zswap: provide unique zpool name
mm, cma: prevent nr_isolated_* counters from going negative
Minimal fix-up of bad hashing behavior of hash_64()
MD: make bio mergeable
tracing: Don't display trigger file for events that can't be enabled
mac80211: fix statistics leak if dev_alloc_name() fails
ath9k: ar5008_hw_cmn_spur_mitigate: add missing mask_m & mask_p initialisation
lpfc: fix misleading indentation
clk: qcom: msm8960: Fix ce3_src register offset
clk: versatile: sp810: support reentrance
clk: qcom: msm8960: fix ce3_core clk enable register
clk: meson: Fix meson_clk_register_clks() signature type mismatch
clk: rockchip: free memory in error cases when registering clock branches
soc: rockchip: power-domain: fix err handle while probing
clk-divider: make sure read-only dividers do not write to their register
CNS3xxx: Fix PCI cns3xxx_write_config()
mwifiex: fix corner case association failure
ata: ahci_xgene: dereferencing uninitialized pointer in probe
nbd: ratelimit error msgs after socket close
mfd: intel-lpss: Remove clock tree on error path
ipvs: drop first packet to redirect conntrack
ipvs: correct initial offset of Call-ID header search in SIP persistence engine
ipvs: handle ip_vs_fill_iph_skb_off failure
RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
Revert: "powerpc/tm: Check for already reclaimed tasks"
arm64: head.S: use memset to clear BSS
efi: stub: define DISABLE_BRANCH_PROFILING for all architectures
arm64: entry: remove pointless SPSR mode check
arm64: mm: move pgd_cache initialisation to pgtable_cache_init
arm64: module: avoid undefined shift behavior in reloc_data()
arm64: module: fix relocation of movz instruction with negative immediate
arm64: traps: address fallout from printk -> pr_* conversion
arm64: ftrace: fix a stack tracer's output under function graph tracer
arm64: pass a task parameter to unwind_frame()
arm64: ftrace: modify a stack frame in a safe way
arm64: remove irq_count and do_softirq_own_stack()
arm64: hugetlb: add support for PTE contiguous bit
arm64: Use PoU cache instr for I/D coherency
arm64: Defer dcache flush in __cpu_copy_user_page
arm64: reduce stack use in irq_handler
arm64: Documentation: add list of software workarounds for errata
arm64: mm: place __cpu_setup in .text
arm64: cmpxchg: Don't incldue linux/mmdebug.h
arm64: mm: fold alternatives into .init
arm64: Remove redundant padding from linker script
arm64: mm: remove pointless PAGE_MASKing
arm64: don't call C code with el0's fp register
arm64: when walking onto the task stack, check sp & fp are in current->stack
arm64: Add this_cpu_ptr() assembler macro for use in entry.S
arm64: irq: fix walking from irq stack to task stack
arm64: Add do_softirq_own_stack() and enable irq_stacks
arm64: Modify stack trace and dump for use with irq_stack
arm64: Store struct thread_info in sp_el0
arm64: Add trace_hardirqs_off annotation in ret_to_user
arm64: ftrace: fix the comments for ftrace_modify_code
arm64: ftrace: stop using kstop_machine to enable/disable tracing
arm64: spinlock: serialise spin_unlock_wait against concurrent lockers
arm64: enable HAVE_IRQ_TIME_ACCOUNTING
arm64: fix COMPAT_SHMLBA definition for large pages
arm64: add __init/__initdata section marker to some functions/variables
arm64: pgtable: implement pte_accessible()
arm64: mm: allow sections for unaligned bases
arm64: mm: detect bad __create_mapping uses
Linux 4.4.9
extcon: max77843: Use correct size for reading the interrupt register
stm class: Select CONFIG_SRCU
megaraid_sas: add missing curly braces in ioctl handler
sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
thermal: rockchip: fix a impossible condition caused by the warning
unbreak allmodconfig KCONFIG_ALLCONFIG=...
jme: Fix device PM wakeup API usage
jme: Do not enable NIC WoL functions on S0
bus: imx-weim: Take the 'status' property value into account
ARM: dts: pxa: fix dma engine node to pxa3xx-nand
ARM: dts: armada-375: use armada-370-sata for SATA
ARM: EXYNOS: select THERMAL_OF
ARM: prima2: always enable reset controller
ARM: OMAP3: Add cpuidle parameters table for omap3430
ext4: fix races of writeback with punch hole and zero range
ext4: fix races between buffered IO and collapse / insert range
ext4: move unlocked dio protection from ext4_alloc_file_blocks()
ext4: fix races between page faults and hole punching
perf stat: Document --detailed option
perf tools: handle spaces in file names obtained from /proc/pid/maps
perf hists browser: Only offer symbol scripting when a symbol is under the cursor
mtd: nand: Drop mtd.owner requirement in nand_scan
mtd: brcmnand: Fix v7.1 register offsets
mtd: spi-nor: remove micron_quad_enable()
serial: sh-sci: Remove cpufreq notifier to fix crash/deadlock
ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
x86/mm/kmmio: Fix mmiotrace for hugepages
perf evlist: Reference count the cpu and thread maps at set_maps()
drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors
rtc: max77686: Properly handle regmap_irq_get_virq() error code
rtc: rx8025: remove rv8803 id
rtc: ds1685: passing bogus values to irq_restore
rtc: vr41xx: Wire up alarm_irq_enable
rtc: hym8563: fix invalid year calculation
PM / Domains: Fix removal of a subdomain
PM / OPP: Initialize u_volt_min/max to a valid value
misc: mic/scif: fix wrap around tests
misc/bmp085: Enable building as a module
lib/mpi: Endianness fix
fbdev: da8xx-fb: fix videomodes of lcd panels
scsi_dh: force modular build if SCSI is a module
paride: make 'verbose' parameter an 'int' again
regulator: s5m8767: fix get_register() error handling
irqchip/mxs: Fix error check of of_io_request_and_map()
irqchip/sunxi-nmi: Fix error check of of_io_request_and_map()
spi/rockchip: Make sure spi clk is on in rockchip_spi_set_cs
locking/mcs: Fix mcs_spin_lock() ordering
regulator: core: Fix nested locking of supplies
regulator: core: Ensure we lock all regulators
regulator: core: fix regulator_lock_supply regression
Revert "regulator: core: Fix nested locking of supplies"
videobuf2-v4l2: Verify planes array in buffer dequeueing
videobuf2-core: Check user space planes array in dqbuf
USB: usbip: fix potential out-of-bounds write
cgroup: make sure a parent css isn't freed before its children
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
memcg: relocate charge moving from ->attach to ->post_attach
cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
slub: clean up code for kmem cgroup support to kmem_cache_free_bulk
workqueue: fix ghost PENDING flag while doing MQ IO
x86/apic: Handle zero vector gracefully in clear_vector_irq()
efi: Expose non-blocking set_variable() wrapper to efivars
efi: Fix out-of-bounds read in variable_matches()
IB/security: Restrict use of the write() interface
IB/mlx5: Expose correct max_sge_rd limit
cxl: Keep IRQ mappings on context teardown
v4l2-dv-timings.h: fix polarity for 4k formats
vb2-memops: Fix over allocation of frame vectors
ASoC: rt5640: Correct the digital interface data select
ASoC: dapm: Make sure we have a card when displaying component widgets
ASoC: ssm4567: Reset device before regcache_sync()
ASoC: s3c24xx: use const snd_soc_component_driver pointer
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
toshiba_acpi: Fix regression caused by hotkey enabling value
i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
i2c: cpm: Fix build break due to incompatible pointer types
perf intel-pt: Fix segfault tracing transactions
drm/i915: Use fw_domains_put_with_fifo() on HSW
drm/i915: Fixup the free space logic in ring_prepare
drm/amdkfd: uninitialized variable in dbgdev_wave_control_set_registers()
drm/i915: skl_update_scaler() wants a rotation bitmask instead of bit number
drm/i915: Cleanup phys status page too
pwm: brcmstb: Fix check of devm_ioremap_resource() return code
drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()
drm/dp/mst: Restore primary hub guid on resume
drm/dp/mst: Validate port in drm_dp_payload_send_msg()
drm/nouveau/gr/gf100: select a stream master to fixup tfb offset queries
drm: Loongson-3 doesn't fully support wc memory
drm/radeon: fix vertical bars appear on monitor (v2)
drm/radeon: forbid mapping of userptr bo through radeon device file
drm/radeon: fix initial connector audio value
drm/radeon: add a quirk for a XFX R9 270X
drm/amdgpu: fix regression on CIK (v2)
amdgpu/uvd: add uvd fw version for amdgpu
drm/amdgpu: bump the afmt limit for CZ, ST, Polaris
drm/amdgpu: use defines for CRTCs and AMFT blocks
drm/amdgpu: when suspending, if uvd/vce was running. need to cancel delay work.
iommu/dma: Restore scatterlist offsets correctly
iommu/amd: Fix checking of pci dma aliases
pinctrl: single: Fix pcs_parse_bits_in_pinctrl_entry to use __ffs than ffs
pinctrl: mediatek: correct debounce time unit in mtk_gpio_set_debounce
xen kconfig: don't "select INPUT_XEN_KBDDEV_FRONTEND"
Input: pmic8xxx-pwrkey - fix algorithm for converting trigger delay
Input: gtco - fix crash on detecting device without endpoints
netlink: don't send NETLINK_URELEASE for unbound sockets
nl80211: check netlink protocol in socket release notification
powerpc: Update TM user feature bits in scan_features()
powerpc: Update cpu_user_features2 in scan_features()
powerpc: scan_features() updates incorrect bits for REAL_LE
crypto: talitos - fix AEAD tcrypt tests
crypto: talitos - fix crash in talitos_cra_init()
crypto: sha1-mb - use corrcet pointer while completing jobs
crypto: ccp - Prevent information leakage on export
iwlwifi: mvm: fix memory leak in paging
iwlwifi: pcie: lower the debug level for RSA semaphore access
s390/pci: add extra padding to function measurement block
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
Revert "drm/radeon: disable runtime pm on PX laptops without dGPU power control"
drm/i915: Fix race condition in intel_dp_destroy_mst_connector()
drm/qxl: fix cursor position with non-zero hotspot
drm/nouveau/core: use vzalloc for allocating ramht
futex: Acknowledge a new waiter in counter before plist
futex: Handle unlock_pi race gracefully
asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic()
ALSA: hda - Add dock support for ThinkPad X260
ALSA: pcxhr: Fix missing mutex unlock
ALSA: hda - add PCI ID for Intel Broxton-T
ALSA: hda - Keep powering up ADCs on Cirrus codecs
ALSA: hda/realtek - Add ALC3234 headset mode for Optiplex 9020m
ALSA: hda - Don't trust the reported actual power state
x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
x86/mm/xen: Suppress hugetlbfs in PV guests
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings
sched/cgroup: Fix/cleanup cgroup teardown/init
dmaengine: pxa_dma: fix the maximum requestor line
dmaengine: hsu: correct use of channel status register
dmaengine: dw: fix master selection
debugfs: Make automount point inodes permanently empty
lib: lz4: fixed zram with lz4 on big endian machines
dm cache metadata: fix cmd_read_lock() acquiring write lock
dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros
usb: gadget: f_fs: Fix use-after-free
usb: hcd: out of bounds access in for_each_companion
xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers
usb: xhci: fix wild pointers in xhci_mem_cleanup
xhci: resume USB 3 roothub first
usb: xhci: applying XHCI_PME_STUCK_QUIRK to Intel BXT B0 host
assoc_array: don't call compare_object() on a node
ARM: OMAP2+: hwmod: Fix updating of sysconfig register
ARM: OMAP2: Fix up interconnect barrier initialization for DRA7
ARM: mvebu: Correct unit address for linksys
ARM: dts: AM43x-epos: Fix clk parent for synctimer
KVM: arm/arm64: Handle forward time correction gracefully
kvm: x86: do not leak guest xcr0 into host interrupt handlers
x86/mce: Avoid using object after free in genpool
block: loop: fix filesystem corruption in case of aio/dio
block: partition: initialize percpuref before sending out KOBJ_ADD
Conflicts:
arch/arm64/Kconfig
arch/arm64/include/asm/cputype.h
arch/arm64/include/asm/hardirq.h
arch/arm64/include/asm/irq.h
arch/arm64/include/asm/mmu_context.h
arch/arm64/kernel/cpu_errata.c
arch/arm64/kernel/cpuinfo.c
arch/arm64/kernel/setup.c
arch/arm64/kernel/smp.c
arch/arm64/kernel/stacktrace.c
arch/arm64/mm/init.c
arch/arm64/mm/mmu.c
arch/arm64/mm/pageattr.c
mm/memcontrol.c
CRs-Fixed: 1069136
Signed-off-by: Bryan Huntsman <bryanh@codeaurora.org>
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Change-Id: Ie9a16debd0578331a66947376f3b787a7bb54d65
This reverts commit 9d6fd2c3e9 ("Merge remote-tracking branch
'msm-4.4/tmp-510d0a3f' into msm-4.4"), because it breaks the
dump parsing tools due to kernel can be loaded anywhere in the memory
now and not fixed at linear mapping.
Change-Id: Id416f0a249d803442847d09ac47781147b0d0ee6
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
* msm-4.4/tmp-510d0a3f:
Linux 4.4.11
nf_conntrack: avoid kernel pointer value leak in slab name
drm/radeon: fix DP link training issue with second 4K monitor
drm/i915/bdw: Add missing delay during L3 SQC credit programming
drm/i915: Bail out of pipe config compute loop on LPT
drm/radeon: fix PLL sharing on DCE6.1 (v2)
Revert "[media] videobuf2-v4l2: Verify planes array in buffer dequeueing"
Input: max8997-haptic - fix NULL pointer dereference
get_rock_ridge_filename(): handle malformed NM entries
tools lib traceevent: Do not reassign parg after collapse_tree()
qla1280: Don't allocate 512kb of host tags
atomic_open(): fix the handling of create_error
regulator: axp20x: Fix axp22x ldo_io voltage ranges
regulator: s2mps11: Fix invalid selector mask and voltages for buck9
workqueue: fix rebind bound workers warning
ARM: dts: at91: sam9x5: Fix the memory range assigned to the PMC
vfs: rename: check backing inode being equal
vfs: add vfs_select_inode() helper
perf/core: Disable the event on a truncated AUX record
regmap: spmi: Fix regmap_spmi_ext_read in multi-byte case
pinctrl: at91-pio4: fix pull-up/down logic
spi: spi-ti-qspi: Handle truncated frames properly
spi: spi-ti-qspi: Fix FLEN and WLEN settings if bits_per_word is overridden
spi: pxa2xx: Do not detect number of enabled chip selects on Intel SPT
ALSA: hda - Fix broken reconfig
ALSA: hda - Fix white noise on Asus UX501VW headset
ALSA: hda - Fix subwoofer pin on ASUS N751 and N551
ALSA: usb-audio: Yet another Phoneix Audio device quirk
ALSA: usb-audio: Quirk for yet another Phoenix Audio devices (v2)
crypto: testmgr - Use kmalloc memory for RSA input
crypto: hash - Fix page length clamping in hash walk
crypto: qat - fix invalid pf2vf_resp_wq logic
s390/mm: fix asce_bits handling with dynamic pagetable levels
zsmalloc: fix zs_can_compact() integer overflow
ocfs2: fix posix_acl_create deadlock
ocfs2: revert using ocfs2_acl_chmod to avoid inode cluster lock hang
net/route: enforce hoplimit max value
tcp: refresh skb timestamp at retransmit time
net: thunderx: avoid exposing kernel stack
net: fix a kernel infoleak in x25 module
uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h MIME-Version: 1.0
bridge: fix igmp / mld query parsing
net: bridge: fix old ioctl unlocked net device walk
VSOCK: do not disconnect socket when peer has shutdown SEND only
net/mlx4_en: Fix endianness bug in IPV6 csum calculation
net: fix infoleak in rtnetlink
net: fix infoleak in llc
net: fec: only clear a queue's work bit if the queue was emptied
netem: Segment GSO packets on enqueue
sch_dsmark: update backlog as well
sch_htb: update backlog as well
net_sched: update hierarchical backlog too
net_sched: introduce qdisc_replace() helper
gre: do not pull header in ICMP error processing
net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case
samples/bpf: fix trace_output example
bpf: fix check_map_func_compatibility logic
bpf: fix refcnt overflow
bpf: fix double-fdput in replace_map_fd_with_map_ptr()
net/mlx4_en: fix spurious timestamping callbacks
ipv4/fib: don't warn when primary address is missing if in_dev is dead
net/mlx5e: Fix minimum MTU
net/mlx5e: Device's mtu field is u16 and not int
openvswitch: use flow protocol when recalculating ipv6 checksums
atl2: Disable unimplemented scatter/gather feature
vlan: pull on __vlan_insert_tag error path and fix csum correction
net: use skb_postpush_rcsum instead of own implementations
cdc_mbim: apply "NDP to end" quirk to all Huawei devices
bpf/verifier: reject invalid LD_ABS | BPF_DW instruction
net: sched: do not requeue a NULL skb
packet: fix heap info leak in PACKET_DIAG_MCLIST sock_diag interface
route: do not cache fib route info on local routes with oif
decnet: Do not build routes to devices without decnet private data.
parisc: Use generic extable search and sort routines
arm64: kasan: Use actual memory node when populating the kernel image shadow
arm64: mm: treat memstart_addr as a signed quantity
arm64: lse: deal with clobbered IP registers after branch via PLT
arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
arm64: consistently use p?d_set_huge
arm64: fix KASLR boot-time I-cache maintenance
arm64: hugetlb: partial revert of 66b3923a1a0f
arm64: make irq_stack_ptr more robust
arm64: efi: invoke EFI_RNG_PROTOCOL to supply KASLR randomness
efi: stub: use high allocation for converted command line
efi: stub: add implementation of efi_random_alloc()
efi: stub: implement efi_get_random_bytes() based on EFI_RNG_PROTOCOL
arm64: kaslr: randomize the linear region
arm64: add support for kernel ASLR
arm64: add support for building vmlinux as a relocatable PIE binary
arm64: switch to relative exception tables
extable: add support for relative extables to search and sort routines
scripts/sortextable: add support for ET_DYN binaries
arm64: futex.h: Add missing PAN toggling
arm64: make asm/elf.h available to asm files
arm64: avoid dynamic relocations in early boot code
arm64: avoid R_AARCH64_ABS64 relocations for Image header fields
arm64: add support for module PLTs
arm64: move brk immediate argument definitions to separate header
arm64: mm: use bit ops rather than arithmetic in pa/va translations
arm64: mm: only perform memstart_addr sanity check if DEBUG_VM
arm64: User die() instead of panic() in do_page_fault()
arm64: allow kernel Image to be loaded anywhere in physical memory
arm64: defer __va translation of initrd_start and initrd_end
arm64: move kernel image to base of vmalloc area
arm64: kvm: deal with kernel symbols outside of linear mapping
arm64: decouple early fixmap init from linear mapping
arm64: pgtable: implement static [pte|pmd|pud]_offset variants
arm64: introduce KIMAGE_VADDR as the virtual base of the kernel region
arm64: add support for ioremap() block mappings
arm64: prevent potential circular header dependencies in asm/bug.h
of/fdt: factor out assignment of initrd_start/initrd_end
of/fdt: make memblock minimum physical address arch configurable
arm64: Remove the get_thread_info() function
arm64: kernel: Don't toggle PAN on systems with UAO
arm64: cpufeature: Test 'matches' pointer to find the end of the list
arm64: kernel: Add support for User Access Override
arm64: add ARMv8.2 id_aa64mmfr2 boiler plate
arm64: cpufeature: Change read_cpuid() to use sysreg's mrs_s macro
arm64: use local label prefixes for __reg_num symbols
arm64: vdso: Mark vDSO code as read-only
arm64: ubsan: select ARCH_HAS_UBSAN_SANITIZE_ALL
arm64: ptdump: Indicate whether memory should be faulting
arm64: Add support for ARCH_SUPPORTS_DEBUG_PAGEALLOC
arm64: Drop alloc function from create_mapping
arm64: prefetch: add missing #include for spin_lock_prefetch
arm64: lib: patch in prfm for copy_page if requested
arm64: lib: improve copy_page to deal with 128 bytes at a time
arm64: prefetch: add alternative pattern for CPUs without a prefetcher
arm64: prefetch: don't provide spin_lock_prefetch with LSE
arm64: allow vmalloc regions to be set with set_memory_*
arm64: kernel: implement ACPI parking protocol
arm64: mm: create new fine-grained mappings at boot
arm64: ensure _stext and _etext are page-aligned
arm64: mm: allow passing a pgdir to alloc_init_*
arm64: mm: allocate pagetables anywhere
arm64: mm: use fixmap when creating page tables
arm64: mm: add functions to walk tables in fixmap
arm64: mm: add __{pud,pgd}_populate
arm64: mm: avoid redundant __pa(__va(x))
arm64: mm: add functions to walk page tables by PA
arm64: mm: move pte_* macros
arm64: kasan: avoid TLB conflicts
arm64: mm: add code to safely replace TTBR1_EL1
arm64: add function to install the idmap
arm64: unmap idmap earlier
arm64: unify idmap removal
arm64: mm: place empty_zero_page in bss
arm64: mm: specialise pagetable allocators
asm-generic: Fix local variable shadow in __set_fixmap_offset
Eliminate the .eh_frame sections from the aarch64 vmlinux and kernel modules
arm64: Fix an enum typo in mm/dump.c
arm64: kasan: ensure that the KASAN zero page is mapped read-only
arch/arm64/include/asm/pgtable.h: add pmd_mkclean for THP
arm64: hide __efistub_ aliases from kallsyms
Linux 4.4.10
drm/i915/skl: Fix DMC load on Skylake J0 and K0
lib/test-string_helpers.c: fix and improve string_get_size() tests
ACPI / processor: Request native thermal interrupt handling via _OSC
drm/i915: Fake HDMI live status
drm/i915: Make RPS EI/thresholds multiple of 25 on SNB-BDW
drm/i915: Fix eDP low vswing for Broadwell
drm/i915/ddi: Fix eDP VDD handling during booting and suspend/resume
drm/radeon: make sure vertical front porch is at least 1
iio: ak8975: fix maybe-uninitialized warning
iio: ak8975: Fix NULL pointer exception on early interrupt
drm/amdgpu: set metadata pointer to NULL after freeing.
drm/amdgpu: make sure vertical front porch is at least 1
gpu: ipu-v3: Fix imx-ipuv3-crtc module autoloading
nvmem: mxs-ocotp: fix buffer overflow in read
USB: serial: cp210x: add Straizona Focusers device ids
USB: serial: cp210x: add ID for Link ECU
ata: ahci-platform: Add ports-implemented DT bindings.
libahci: save port map for forced port map
powerpc: Fix bad inline asm constraint in create_zero_mask()
ACPICA: Dispatcher: Update thread ID for recursive method calls
x86/sysfb_efi: Fix valid BAR address range check
ARC: Add missing io barriers to io{read,write}{16,32}be()
ARM: cpuidle: Pass on arm_cpuidle_suspend()'s return value
propogate_mnt: Handle the first propogated copy being a slave
fs/pnode.c: treat zero mnt_group_id-s as unequal
x86/tsc: Read all ratio bits from MSR_PLATFORM_INFO
MAINTAINERS: Remove asterisk from EFI directory names
writeback: Fix performance regression in wb_over_bg_thresh()
batman-adv: Reduce refcnt of removed router when updating route
batman-adv: Fix broadcast/ogm queue limit on a removed interface
batman-adv: Check skb size before using encapsulated ETH+VLAN header
batman-adv: fix DAT candidate selection (must use vid)
mm: update min_free_kbytes from khugepaged after core initialization
proc: prevent accessing /proc/<PID>/environ until it's ready
Input: zforce_ts - fix dual touch recognition
HID: Fix boot delay for Creative SB Omni Surround 5.1 with quirk
HID: wacom: Add support for DTK-1651
xen/evtchn: fix ring resize when binding new events
xen/balloon: Fix crash when ballooning on x86 32 bit PAE
xen: Fix page <-> pfn conversion on 32 bit systems
ARM: SoCFPGA: Fix secondary CPU startup in thumb2 kernel
ARM: EXYNOS: Properly skip unitialized parent clock in power domain on
mm/zswap: provide unique zpool name
mm, cma: prevent nr_isolated_* counters from going negative
Minimal fix-up of bad hashing behavior of hash_64()
MD: make bio mergeable
tracing: Don't display trigger file for events that can't be enabled
mac80211: fix statistics leak if dev_alloc_name() fails
ath9k: ar5008_hw_cmn_spur_mitigate: add missing mask_m & mask_p initialisation
lpfc: fix misleading indentation
clk: qcom: msm8960: Fix ce3_src register offset
clk: versatile: sp810: support reentrance
clk: qcom: msm8960: fix ce3_core clk enable register
clk: meson: Fix meson_clk_register_clks() signature type mismatch
clk: rockchip: free memory in error cases when registering clock branches
soc: rockchip: power-domain: fix err handle while probing
clk-divider: make sure read-only dividers do not write to their register
CNS3xxx: Fix PCI cns3xxx_write_config()
mwifiex: fix corner case association failure
ata: ahci_xgene: dereferencing uninitialized pointer in probe
nbd: ratelimit error msgs after socket close
mfd: intel-lpss: Remove clock tree on error path
ipvs: drop first packet to redirect conntrack
ipvs: correct initial offset of Call-ID header search in SIP persistence engine
ipvs: handle ip_vs_fill_iph_skb_off failure
RDMA/iw_cxgb4: Fix bar2 virt addr calculation for T4 chips
Revert: "powerpc/tm: Check for already reclaimed tasks"
arm64: head.S: use memset to clear BSS
efi: stub: define DISABLE_BRANCH_PROFILING for all architectures
arm64: entry: remove pointless SPSR mode check
arm64: mm: move pgd_cache initialisation to pgtable_cache_init
arm64: module: avoid undefined shift behavior in reloc_data()
arm64: module: fix relocation of movz instruction with negative immediate
arm64: traps: address fallout from printk -> pr_* conversion
arm64: ftrace: fix a stack tracer's output under function graph tracer
arm64: pass a task parameter to unwind_frame()
arm64: ftrace: modify a stack frame in a safe way
arm64: remove irq_count and do_softirq_own_stack()
arm64: hugetlb: add support for PTE contiguous bit
arm64: Use PoU cache instr for I/D coherency
arm64: Defer dcache flush in __cpu_copy_user_page
arm64: reduce stack use in irq_handler
arm64: Documentation: add list of software workarounds for errata
arm64: mm: place __cpu_setup in .text
arm64: cmpxchg: Don't incldue linux/mmdebug.h
arm64: mm: fold alternatives into .init
arm64: Remove redundant padding from linker script
arm64: mm: remove pointless PAGE_MASKing
arm64: don't call C code with el0's fp register
arm64: when walking onto the task stack, check sp & fp are in current->stack
arm64: Add this_cpu_ptr() assembler macro for use in entry.S
arm64: irq: fix walking from irq stack to task stack
arm64: Add do_softirq_own_stack() and enable irq_stacks
arm64: Modify stack trace and dump for use with irq_stack
arm64: Store struct thread_info in sp_el0
arm64: Add trace_hardirqs_off annotation in ret_to_user
arm64: ftrace: fix the comments for ftrace_modify_code
arm64: ftrace: stop using kstop_machine to enable/disable tracing
arm64: spinlock: serialise spin_unlock_wait against concurrent lockers
arm64: enable HAVE_IRQ_TIME_ACCOUNTING
arm64: fix COMPAT_SHMLBA definition for large pages
arm64: add __init/__initdata section marker to some functions/variables
arm64: pgtable: implement pte_accessible()
arm64: mm: allow sections for unaligned bases
arm64: mm: detect bad __create_mapping uses
Linux 4.4.9
extcon: max77843: Use correct size for reading the interrupt register
stm class: Select CONFIG_SRCU
megaraid_sas: add missing curly braces in ioctl handler
sunrpc/cache: drop reference when sunrpc_cache_pipe_upcall() detects a race
thermal: rockchip: fix a impossible condition caused by the warning
unbreak allmodconfig KCONFIG_ALLCONFIG=...
jme: Fix device PM wakeup API usage
jme: Do not enable NIC WoL functions on S0
bus: imx-weim: Take the 'status' property value into account
ARM: dts: pxa: fix dma engine node to pxa3xx-nand
ARM: dts: armada-375: use armada-370-sata for SATA
ARM: EXYNOS: select THERMAL_OF
ARM: prima2: always enable reset controller
ARM: OMAP3: Add cpuidle parameters table for omap3430
ext4: fix races of writeback with punch hole and zero range
ext4: fix races between buffered IO and collapse / insert range
ext4: move unlocked dio protection from ext4_alloc_file_blocks()
ext4: fix races between page faults and hole punching
perf stat: Document --detailed option
perf tools: handle spaces in file names obtained from /proc/pid/maps
perf hists browser: Only offer symbol scripting when a symbol is under the cursor
mtd: nand: Drop mtd.owner requirement in nand_scan
mtd: brcmnand: Fix v7.1 register offsets
mtd: spi-nor: remove micron_quad_enable()
serial: sh-sci: Remove cpufreq notifier to fix crash/deadlock
ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
x86/mm/kmmio: Fix mmiotrace for hugepages
perf evlist: Reference count the cpu and thread maps at set_maps()
drivers/misc/ad525x_dpot: AD5274 fix RDAC read back errors
rtc: max77686: Properly handle regmap_irq_get_virq() error code
rtc: rx8025: remove rv8803 id
rtc: ds1685: passing bogus values to irq_restore
rtc: vr41xx: Wire up alarm_irq_enable
rtc: hym8563: fix invalid year calculation
PM / Domains: Fix removal of a subdomain
PM / OPP: Initialize u_volt_min/max to a valid value
misc: mic/scif: fix wrap around tests
misc/bmp085: Enable building as a module
lib/mpi: Endianness fix
fbdev: da8xx-fb: fix videomodes of lcd panels
scsi_dh: force modular build if SCSI is a module
paride: make 'verbose' parameter an 'int' again
regulator: s5m8767: fix get_register() error handling
irqchip/mxs: Fix error check of of_io_request_and_map()
irqchip/sunxi-nmi: Fix error check of of_io_request_and_map()
spi/rockchip: Make sure spi clk is on in rockchip_spi_set_cs
locking/mcs: Fix mcs_spin_lock() ordering
regulator: core: Fix nested locking of supplies
regulator: core: Ensure we lock all regulators
regulator: core: fix regulator_lock_supply regression
Revert "regulator: core: Fix nested locking of supplies"
videobuf2-v4l2: Verify planes array in buffer dequeueing
videobuf2-core: Check user space planes array in dqbuf
USB: usbip: fix potential out-of-bounds write
cgroup: make sure a parent css isn't freed before its children
mm/hwpoison: fix wrong num_poisoned_pages accounting
mm: vmscan: reclaim highmem zone if buffer_heads is over limit
numa: fix /proc/<pid>/numa_maps for THP
mm/huge_memory: replace VM_NO_THP VM_BUG_ON with actual VMA check
memcg: relocate charge moving from ->attach to ->post_attach
cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback
slub: clean up code for kmem cgroup support to kmem_cache_free_bulk
workqueue: fix ghost PENDING flag while doing MQ IO
x86/apic: Handle zero vector gracefully in clear_vector_irq()
efi: Expose non-blocking set_variable() wrapper to efivars
efi: Fix out-of-bounds read in variable_matches()
IB/security: Restrict use of the write() interface
IB/mlx5: Expose correct max_sge_rd limit
cxl: Keep IRQ mappings on context teardown
v4l2-dv-timings.h: fix polarity for 4k formats
vb2-memops: Fix over allocation of frame vectors
ASoC: rt5640: Correct the digital interface data select
ASoC: dapm: Make sure we have a card when displaying component widgets
ASoC: ssm4567: Reset device before regcache_sync()
ASoC: s3c24xx: use const snd_soc_component_driver pointer
EDAC: i7core, sb_edac: Don't return NOTIFY_BAD from mce_decoder callback
toshiba_acpi: Fix regression caused by hotkey enabling value
i2c: exynos5: Fix possible ABBA deadlock by keeping I2C clock prepared
i2c: cpm: Fix build break due to incompatible pointer types
perf intel-pt: Fix segfault tracing transactions
drm/i915: Use fw_domains_put_with_fifo() on HSW
drm/i915: Fixup the free space logic in ring_prepare
drm/amdkfd: uninitialized variable in dbgdev_wave_control_set_registers()
drm/i915: skl_update_scaler() wants a rotation bitmask instead of bit number
drm/i915: Cleanup phys status page too
pwm: brcmstb: Fix check of devm_ioremap_resource() return code
drm/dp/mst: Get validated port ref in drm_dp_update_payload_part1()
drm/dp/mst: Restore primary hub guid on resume
drm/dp/mst: Validate port in drm_dp_payload_send_msg()
drm/nouveau/gr/gf100: select a stream master to fixup tfb offset queries
drm: Loongson-3 doesn't fully support wc memory
drm/radeon: fix vertical bars appear on monitor (v2)
drm/radeon: forbid mapping of userptr bo through radeon device file
drm/radeon: fix initial connector audio value
drm/radeon: add a quirk for a XFX R9 270X
drm/amdgpu: fix regression on CIK (v2)
amdgpu/uvd: add uvd fw version for amdgpu
drm/amdgpu: bump the afmt limit for CZ, ST, Polaris
drm/amdgpu: use defines for CRTCs and AMFT blocks
drm/amdgpu: when suspending, if uvd/vce was running. need to cancel delay work.
iommu/dma: Restore scatterlist offsets correctly
iommu/amd: Fix checking of pci dma aliases
pinctrl: single: Fix pcs_parse_bits_in_pinctrl_entry to use __ffs than ffs
pinctrl: mediatek: correct debounce time unit in mtk_gpio_set_debounce
xen kconfig: don't "select INPUT_XEN_KBDDEV_FRONTEND"
Input: pmic8xxx-pwrkey - fix algorithm for converting trigger delay
Input: gtco - fix crash on detecting device without endpoints
netlink: don't send NETLINK_URELEASE for unbound sockets
nl80211: check netlink protocol in socket release notification
powerpc: Update TM user feature bits in scan_features()
powerpc: Update cpu_user_features2 in scan_features()
powerpc: scan_features() updates incorrect bits for REAL_LE
crypto: talitos - fix AEAD tcrypt tests
crypto: talitos - fix crash in talitos_cra_init()
crypto: sha1-mb - use corrcet pointer while completing jobs
crypto: ccp - Prevent information leakage on export
iwlwifi: mvm: fix memory leak in paging
iwlwifi: pcie: lower the debug level for RSA semaphore access
s390/pci: add extra padding to function measurement block
cpufreq: intel_pstate: Fix processing for turbo activation ratio
Revert "drm/amdgpu: disable runtime pm on PX laptops without dGPU power control"
Revert "drm/radeon: disable runtime pm on PX laptops without dGPU power control"
drm/i915: Fix race condition in intel_dp_destroy_mst_connector()
drm/qxl: fix cursor position with non-zero hotspot
drm/nouveau/core: use vzalloc for allocating ramht
futex: Acknowledge a new waiter in counter before plist
futex: Handle unlock_pi race gracefully
asm-generic/futex: Re-enable preemption in futex_atomic_cmpxchg_inatomic()
ALSA: hda - Add dock support for ThinkPad X260
ALSA: pcxhr: Fix missing mutex unlock
ALSA: hda - add PCI ID for Intel Broxton-T
ALSA: hda - Keep powering up ADCs on Cirrus codecs
ALSA: hda/realtek - Add ALC3234 headset mode for Optiplex 9020m
ALSA: hda - Don't trust the reported actual power state
x86 EDAC, sb_edac.c: Repair damage introduced when "fixing" channel address
x86/mm/xen: Suppress hugetlbfs in PV guests
arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
arm64: Honour !PTE_WRITE in set_pte_at() for kernel mappings
sched/cgroup: Fix/cleanup cgroup teardown/init
dmaengine: pxa_dma: fix the maximum requestor line
dmaengine: hsu: correct use of channel status register
dmaengine: dw: fix master selection
debugfs: Make automount point inodes permanently empty
lib: lz4: fixed zram with lz4 on big endian machines
dm cache metadata: fix cmd_read_lock() acquiring write lock
dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros
usb: gadget: f_fs: Fix use-after-free
usb: hcd: out of bounds access in for_each_companion
xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers
usb: xhci: fix wild pointers in xhci_mem_cleanup
xhci: resume USB 3 roothub first
usb: xhci: applying XHCI_PME_STUCK_QUIRK to Intel BXT B0 host
assoc_array: don't call compare_object() on a node
ARM: OMAP2+: hwmod: Fix updating of sysconfig register
ARM: OMAP2: Fix up interconnect barrier initialization for DRA7
ARM: mvebu: Correct unit address for linksys
ARM: dts: AM43x-epos: Fix clk parent for synctimer
KVM: arm/arm64: Handle forward time correction gracefully
kvm: x86: do not leak guest xcr0 into host interrupt handlers
x86/mce: Avoid using object after free in genpool
block: loop: fix filesystem corruption in case of aio/dio
block: partition: initialize percpuref before sending out KOBJ_ADD
Conflicts:
arch/arm64/Kconfig
arch/arm64/include/asm/cputype.h
arch/arm64/include/asm/hardirq.h
arch/arm64/include/asm/irq.h
arch/arm64/kernel/cpu_errata.c
arch/arm64/kernel/cpuinfo.c
arch/arm64/kernel/setup.c
arch/arm64/kernel/smp.c
arch/arm64/kernel/stacktrace.c
arch/arm64/mm/init.c
arch/arm64/mm/mmu.c
arch/arm64/mm/pageattr.c
mm/memcontrol.c
CRs-Fixed: 1054234
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
Change-Id: I2a7a34631ffee36ce18b9171f16d023be777392f
* tmp-917a9:
ARM/vdso: Mark the vDSO code read-only after init
x86/vdso: Mark the vDSO code read-only after init
lkdtm: Verify that '__ro_after_init' works correctly
arch: Introduce post-init read-only memory
x86/mm: Always enable CONFIG_DEBUG_RODATA and remove the Kconfig option
mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel mappings
asm-generic: Consolidate mark_rodata_ro()
Linux 4.4.6
ld-version: Fix awk regex compile failure
target: Drop incorrect ABORT_TASK put for completed commands
block: don't optimize for non-cloned bio in bio_get_last_bvec()
MIPS: smp.c: Fix uninitialised temp_foreign_map
MIPS: Fix build error when SMP is used without GIC
ovl: fix getcwd() failure after unsuccessful rmdir
ovl: copy new uid/gid into overlayfs runtime inode
userfaultfd: don't block on the last VM updates at exit time
powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages
powerpc/powernv: Add a kmsg_dumper that flushes console output on panic
powerpc: Fix dedotify for binutils >= 2.26
Revert "drm/radeon/pm: adjust display configuration after powerstate"
drm/radeon: Fix error handling in radeon_flip_work_func.
drm/amdgpu: Fix error handling in amdgpu_flip_work_func.
Revert "drm/radeon: call hpd_irq_event on resume"
x86/mm: Fix slow_virt_to_phys() for X86_PAE again
gpu: ipu-v3: Do not bail out on missing optional port nodes
mac80211: Fix Public Action frame RX in AP mode
mac80211: check PN correctly for GCMP-encrypted fragmented MPDUs
mac80211: minstrel_ht: fix a logic error in RTS/CTS handling
mac80211: minstrel_ht: set default tx aggregation timeout to 0
mac80211: fix use of uninitialised values in RX aggregation
mac80211: minstrel: Change expected throughput unit back to Kbps
iwlwifi: mvm: inc pending frames counter also when txing non-sta
can: gs_usb: fixed disconnect bug by removing erroneous use of kfree()
cfg80211/wext: fix message ordering
wext: fix message delay/ordering
ovl: fix working on distributed fs as lower layer
ovl: ignore lower entries when checking purity of non-directory entries
ASoC: wm8958: Fix enum ctl accesses in a wrong type
ASoC: wm8994: Fix enum ctl accesses in a wrong type
ASoC: samsung: Use IRQ safe spin lock calls
ASoC: dapm: Fix ctl value accesses in a wrong type
ncpfs: fix a braino in OOM handling in ncp_fill_cache()
jffs2: reduce the breakage on recovery from halfway failed rename()
dmaengine: at_xdmac: fix residue computation
tracing: Fix check for cpu online when event is disabled
s390/dasd: fix diag 0x250 inline assembly
s390/mm: four page table levels vs. fork
KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0
KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo
KVM: PPC: Book3S HV: Sanitize special-purpose register values on guest exit
KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS
KVM: VMX: disable PEBS before a guest entry
kvm: cap halt polling at exactly halt_poll_ns
PCI: Allow a NULL "parent" pointer in pci_bus_assign_domain_nr()
ARM: OMAP2+: hwmod: Introduce ti,no-idle dt property
ARM: dts: dra7: do not gate cpsw clock due to errata i877
ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window
arm64: account for sparsemem section alignment when choosing vmemmap offset
Linux 4.4.5
drm/amdgpu: fix topaz/tonga gmc assignment in 4.4 stable
modules: fix longstanding /proc/kallsyms vs module insertion race.
drm/i915: refine qemu south bridge detection
drm/i915: more virtual south bridge detection
block: get the 1st and last bvec via helpers
block: check virt boundary in bio_will_gap()
drm/amdgpu: Use drm_calloc_large for VM page_tables array
thermal: cpu_cooling: fix out of bounds access in time_in_idle
i2c: brcmstb: allocate correct amount of memory for regmap
ubi: Fix out of bounds write in volume update code
cxl: Fix PSL timebase synchronization detection
MIPS: traps: Fix SIGFPE information leak from `do_ov' and `do_trap_or_bp'
MIPS: scache: Fix scache init with invalid line size.
USB: serial: option: add support for Quectel UC20
USB: serial: option: add support for Telit LE922 PID 0x1045
USB: qcserial: add Sierra Wireless EM74xx device ID
USB: qcserial: add Dell Wireless 5809e Gobi 4G HSPA+ (rev3)
USB: cp210x: Add ID for Parrot NMEA GPS Flight Recorder
usb: chipidea: otg: change workqueue ci_otg as freezable
ALSA: timer: Fix broken compat timer user status ioctl
ALSA: hdspm: Fix zero-division
ALSA: hdsp: Fix wrong boolean ctl value accesses
ALSA: hdspm: Fix wrong boolean ctl value accesses
ALSA: seq: oss: Don't drain at closing a client
ALSA: pcm: Fix ioctls for X32 ABI
ALSA: timer: Fix ioctls for X32 ABI
ALSA: rawmidi: Fix ioctls X32 ABI
ALSA: hda - Fix mic issues on Acer Aspire E1-472
ALSA: ctl: Fix ioctls for X32 ABI
ALSA: usb-audio: Add a quirk for Plantronics DA45
adv7604: fix tx 5v detect regression
dmaengine: pxa_dma: fix cyclic transfers
Fix directory hardlinks from deleted directories
jffs2: Fix page lock / f->sem deadlock
Revert "jffs2: Fix lock acquisition order bug in jffs2_write_begin"
Btrfs: fix loading of orphan roots leading to BUG_ON
pata-rb532-cf: get rid of the irq_to_gpio() call
tracing: Do not have 'comm' filter override event 'comm' field
ata: ahci: don't mark HotPlugCapable Ports as external/removable
PM / sleep / x86: Fix crash on graph trace through x86 suspend
arm64: vmemmap: use virtual projection of linear region
Adding Intel Lewisburg device IDs for SATA
writeback: flush inode cgroup wb switches instead of pinning super_block
block: bio: introduce helpers to get the 1st and last bvec
libata: Align ata_device's id on a cacheline
libata: fix HDIO_GET_32BIT ioctl
drm/amdgpu: return from atombios_dp_get_dpcd only when error
drm/amdgpu/gfx8: specify which engine to wait before vm flush
drm/amdgpu: apply gfx_v8 fixes to gfx_v7 as well
drm/amdgpu/pm: update current crtc info after setting the powerstate
drm/radeon/pm: update current crtc info after setting the powerstate
drm/ast: Fix incorrect register check for DRAM width
target: Fix WRITE_SAME/DISCARD conversion to linux 512b sectors
iommu/vt-d: Use BUS_NOTIFY_REMOVED_DEVICE in hotplug path
iommu/amd: Fix boot warning when device 00:00.0 is not iommu covered
iommu/amd: Apply workaround for ATS write permission check
arm/arm64: KVM: Fix ioctl error handling
KVM: x86: fix root cause for missed hardware breakpoints
vfio: fix ioctl error handling
Fix cifs_uniqueid_to_ino_t() function for s390x
CIFS: Fix SMB2+ interim response processing for read requests
cifs: fix out-of-bounds access in lease parsing
fbcon: set a default value to blink interval
kvm: x86: Update tsc multiplier on change.
mips/kvm: fix ioctl error handling
parisc: Fix ptrace syscall number and return value modification
PCI: keystone: Fix MSI code that retrieves struct pcie_port pointer
block: Initialize max_dev_sectors to 0
drm/amdgpu: mask out WC from BO on unsupported arches
btrfs: async-thread: Fix a use-after-free error for trace
btrfs: Fix no_space in write and rm loop
Btrfs: fix deadlock running delayed iputs at transaction commit time
drivers: sh: Restore legacy clock domain on SuperH platforms
use ->d_seq to get coherency between ->d_inode and ->d_flags
Linux 4.4.4
iwlwifi: mvm: don't allow sched scans without matches to be started
iwlwifi: update and fix 7265 series PCI IDs
iwlwifi: pcie: properly configure the debug buffer size for 8000
iwlwifi: dvm: fix WoWLAN
security: let security modules use PTRACE_MODE_* with bitmasks
IB/cma: Fix RDMA port validation for iWarp
x86/irq: Plug vector cleanup race
x86/irq: Call irq_force_move_complete with irq descriptor
x86/irq: Remove outgoing CPU from vector cleanup mask
x86/irq: Remove the cpumask allocation from send_cleanup_vector()
x86/irq: Clear move_in_progress before sending cleanup IPI
x86/irq: Remove offline cpus from vector cleanup
x86/irq: Get rid of code duplication
x86/irq: Copy vectormask instead of an AND operation
x86/irq: Check vector allocation early
x86/irq: Reorganize the search in assign_irq_vector
x86/irq: Reorganize the return path in assign_irq_vector
x86/irq: Do not use apic_chip_data.old_domain as temporary buffer
x86/irq: Validate that irq descriptor is still active
x86/irq: Fix a race in x86_vector_free_irqs()
x86/irq: Call chip->irq_set_affinity in proper context
x86/entry/compat: Add missing CLAC to entry_INT80_32
x86/mpx: Fix off-by-one comparison with nr_registers
hpfs: don't truncate the file when delete fails
do_last(): ELOOP failure exit should be done after leaving RCU mode
should_follow_link(): validate ->d_seq after having decided to follow
xen/pcifront: Fix mysterious crashes when NUMA locality information was extracted.
xen/pciback: Save the number of MSI-X entries to be copied later.
xen/pciback: Check PF instead of VF for PCI_COMMAND_MEMORY
xen/scsiback: correct frontend counting
xen/arm: correctly handle DMA mapping of compound pages
ARM: at91/dt: fix typo in sama5d2 pinmux descriptions
ARM: OMAP2+: Fix onenand initialization to avoid filesystem corruption
do_last(): don't let a bogus return value from ->open() et.al. to confuse us
kernel/resource.c: fix muxed resource handling in __request_region()
sunrpc/cache: fix off-by-one in qword_get()
tracing: Fix showing function event in available_events
powerpc/eeh: Fix partial hotplug criterion
KVM: x86: MMU: fix ubsan index-out-of-range warning
KVM: x86: fix conversion of addresses to linear in 32-bit protected mode
KVM: x86: fix missed hardware breakpoints
KVM: arm/arm64: vgic: Ensure bitmaps are long enough
KVM: async_pf: do not warn on page allocation failures
of/irq: Fix msi-map calculation for nonzero rid-base
NFSv4: Fix a dentry leak on alias use
nfs: fix nfs_size_to_loff_t
block: fix use-after-free in dio_bio_complete
bio: return EINTR if copying to user space got interrupted
i2c: i801: Adding Intel Lewisburg support for iTCO
phy: core: fix wrong err handle for phy_power_on
writeback: keep superblock pinned during cgroup writeback association switches
cgroup: make sure a parent css isn't offlined before its children
cpuset: make mm migration asynchronous
PCI/AER: Flush workqueue on device remove to avoid use-after-free
ARCv2: SMP: Emulate IPI to self using software triggered interrupt
ARCv2: STAR 9000950267: Handle return from intr to Delay Slot #2
libata: fix sff host state machine locking while polling
qla2xxx: Fix stale pointer access.
spi: atmel: fix gpio chip-select in case of non-DT platform
target: Fix race with SCF_SEND_DELAYED_TAS handling
target: Fix remote-port TMR ABORT + se_cmd fabric stop
target: Fix TAS handling for multi-session se_node_acls
target: Fix LUN_RESET active TMR descriptor handling
target: Fix LUN_RESET active I/O handling for ACK_KREF
ALSA: hda - Fixing background noise on Dell Inspiron 3162
ALSA: hda - Apply clock gate workaround to Skylake, too
Revert "workqueue: make sure delayed work run in local cpu"
workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
mac80211: Requeue work after scan complete for all VIF types.
rfkill: fix rfkill_fop_read wait_event usage
tick/nohz: Set the correct expiry when switching to nohz/lowres mode
perf stat: Do not clean event's private stats
cdc-acm:exclude Samsung phone 04e8:685d
Revert "Staging: panel: usleep_range is preferred over udelay"
Staging: speakup: Fix getting port information
sd: Optimal I/O size is in bytes, not sectors
libceph: don't spam dmesg with stray reply warnings
libceph: use the right footer size when skipping a message
libceph: don't bail early from try_read() when skipping a message
libceph: fix ceph_msg_revoke()
seccomp: always propagate NO_NEW_PRIVS on tsync
cpufreq: Fix NULL reference crash while accessing policy->governor_data
cpufreq: pxa2xx: fix pxa_cpufreq_change_voltage prototype
hwmon: (ads1015) Handle negative conversion values correctly
hwmon: (gpio-fan) Remove un-necessary speed_index lookup for thermal hook
hwmon: (dell-smm) Blacklist Dell Studio XPS 8000
Thermal: do thermal zone update after a cooling device registered
Thermal: handle thermal zone device properly during system sleep
Thermal: initialize thermal zone device correctly
IB/mlx5: Expose correct maximum number of CQE capacity
IB/qib: Support creating qps with GFP_NOIO flag
IB/qib: fix mcast detach when qp not attached
IB/cm: Fix a recently introduced deadlock
dmaengine: dw: disable BLOCK IRQs for non-cyclic xfer
dmaengine: at_xdmac: fix resume for cyclic transfers
dmaengine: dw: fix cyclic transfer callbacks
dmaengine: dw: fix cyclic transfer setup
nfit: fix multi-interface dimm handling, acpi6.1 compatibility
ACPI / PCI / hotplug: unlock in error path in acpiphp_enable_slot()
ACPI: Revert "ACPI / video: Add Dell Inspiron 5737 to the blacklist"
ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Satellite R830
ACPI / video: Add disable_backlight_sysfs_if quirk for the Toshiba Portege R700
lib: sw842: select crc32
uapi: update install list after nvme.h rename
ideapad-laptop: Add Lenovo Yoga 700 to no_hw_rfkill dmi list
ideapad-laptop: Add Lenovo ideapad Y700-17ISK to no_hw_rfkill dmi list
toshiba_acpi: Fix blank screen at boot if transflective backlight is supported
make sure that freeing shmem fast symlinks is RCU-delayed
drm/radeon/pm: adjust display configuration after powerstate
drm/radeon: Don't hang in radeon_flip_work_func on disabled crtc. (v2)
drm: Fix treatment of drm_vblank_offdelay in drm_vblank_on() (v2)
drm: Fix drm_vblank_pre/post_modeset regression from Linux 4.4
drm: Prevent vblank counter bumps > 1 with active vblank clients. (v2)
drm: No-Op redundant calls to drm_vblank_off() (v2)
drm/radeon: use post-decrement in error handling
drm/qxl: use kmalloc_array to alloc reloc_info in qxl_process_single_command
drm/i915: fix error path in intel_setup_gmbus()
drm/i915/dsi: don't pass arbitrary data to sideband
drm/i915/dsi: defend gpio table against out of bounds access
drm/i915/skl: Don't skip mst encoders in skl_ddi_pll_select()
drm/i915: Don't reject primary plane windowing with color keying enabled on SKL+
drm/i915/dp: fall back to 18 bpp when sink capability is unknown
drm/i915: Make sure DC writes are coherent on flush.
drm/i915: Init power domains early in driver load
drm/i915: intel_hpd_init(): Fix suspend/resume reprobing
drm/i915: Restore inhibiting the load of the default context
drm: fix missing reference counting decrease
drm/radeon: hold reference to fences in radeon_sa_bo_new
drm/radeon: mask out WC from BO on unsupported arches
drm: add helper to check for wc memory support
drm/radeon: fix DP audio support for APU with DCE4.1 display engine
drm/radeon: Add a common function for DFS handling
drm/radeon: cleaned up VCO output settings for DP audio
drm/radeon: properly byte swap vce firmware setup
drm/radeon: clean up fujitsu quirks
drm/radeon: Fix "slow" audio over DP on DCE8+
drm/radeon: call hpd_irq_event on resume
drm/radeon: Fix off-by-one errors in radeon_vm_bo_set_addr
drm/dp/mst: deallocate payload on port destruction
drm/dp/mst: Reverse order of MST enable and clearing VC payload table.
drm/dp/mst: move GUID storage from mgr, port to only mst branch
drm/dp/mst: Calculate MST PBN with 31.32 fixed point
drm: Add drm_fixp_from_fraction and drm_fixp2int_ceil
drm/dp/mst: fix in RAD element access
drm/dp/mst: fix in MSTB RAD initialization
drm/dp/mst: always send reply for UP request
drm/dp/mst: process broadcast messages correctly
drm/nouveau: platform: Fix deferred probe
drm/nouveau/disp/dp: ensure sink is powered up before attempting link training
drm/nouveau/display: Enable vblank irqs after display engine is on again.
drm/nouveau/kms: take mode_config mutex in connector hotplug path
drm/amdgpu/pm: adjust display configuration after powerstate
drm/amdgpu: Don't hang in amdgpu_flip_work_func on disabled crtc.
drm/amdgpu: use post-decrement in error handling
drm/amdgpu: fix issue with overlapping userptrs
drm/amdgpu: hold reference to fences in amdgpu_sa_bo_new (v2)
drm/amdgpu: remove unnecessary forward declaration
drm/amdgpu: fix s4 resume
drm/amdgpu: remove exp hardware support from iceland
drm/amdgpu: don't load MEC2 on topaz
drm/amdgpu: drop topaz support from gmc8 module
drm/amdgpu: pull topaz gmc bits into gmc_v7
drm/amdgpu: The VI specific EXE bit should only apply to GMC v8.0 above
drm/amdgpu: iceland use CI based MC IP
drm/amdgpu: move gmc7 support out of CIK dependency
drm/amdgpu: no need to load MC firmware on fiji
drm/amdgpu: fix amdgpu_bo_pin_restricted VRAM placing v2
drm/amdgpu: fix tonga smu resume
drm/amdgpu: fix lost sync_to if scheduler is enabled.
drm/amdgpu: call hpd_irq_event on resume
drm/amdgpu: Fix off-by-one errors in amdgpu_vm_bo_map
drm/vmwgfx: respect 'nomodeset'
drm/vmwgfx: Fix a width / pitch mismatch on framebuffer updates
drm/vmwgfx: Fix an incorrect lock check
virtio_pci: fix use after free on release
virtio_balloon: fix race between migration and ballooning
virtio_balloon: fix race by fill and leak
regulator: mt6311: MT6311_REGULATOR needs to select REGMAP_I2C
regulator: axp20x: Fix GPIO LDO enable value for AXP22x
clk: exynos: use irqsave version of spin_lock to avoid deadlock with irqs
cxl: use correct operator when writing pcie config space values
sparc64: fix incorrect sign extension in sys_sparc64_personality
EDAC, mc_sysfs: Fix freeing bus' name
EDAC: Robustify workqueues destruction
MIPS: Fix buffer overflow in syscall_get_arguments()
MIPS: Fix some missing CONFIG_CPU_MIPSR6 #ifdefs
MIPS: hpet: Choose a safe value for the ETIME check
MIPS: Loongson-3: Fix SMP_ASK_C0COUNT IPI handler
Revert "MIPS: Fix PAGE_MASK definition"
cputime: Prevent 32bit overflow in time[val|spec]_to_cputime()
time: Avoid signed overflow in timekeeping_get_ns()
Bluetooth: 6lowpan: Fix handling of uncompressed IPv6 packets
Bluetooth: 6lowpan: Fix kernel NULL pointer dereferences
Bluetooth: Fix incorrect removing of IRKs
Bluetooth: Add support of Toshiba Broadcom based devices
Bluetooth: Use continuous scanning when creating LE connections
Drivers: hv: vmbus: Fix a Host signaling bug
tools: hv: vss: fix the write()'s argument: error -> vss_msg
mmc: sdhci: Allow override of get_cd() called from sdhci_request()
mmc: sdhci: Allow override of mmc host operations
mmc: sdhci-pci: Fix card detect race for Intel BXT/APL
mmc: pxamci: fix again read-only gpio detection polarity
mmc: sdhci-acpi: Fix card detect race for Intel BXT/APL
mmc: mmci: fix an ages old detection error
mmc: core: Enable tuning according to the actual timing
mmc: sdhci: Fix sdhci_runtime_pm_bus_on/off()
mmc: mmc: Fix incorrect use of driver strength switching HS200 and HS400
mmc: sdio: Fix invalid vdd in voltage switch power cycle
mmc: sdhci: Fix DMA descriptor with zero data length
mmc: sdhci-pci: Do not default to 33 Ohm driver strength for Intel SPT
mmc: usdhi6rol0: handle NULL data in timeout
clockevents/tcb_clksrc: Prevent disabling an already disabled clock
posix-clock: Fix return code on the poll method's error path
irqchip/gic-v3-its: Fix double ICC_EOIR write for LPI in EOImode==1
irqchip/atmel-aic: Fix wrong bit operation for IRQ priority
irqchip/mxs: Add missing set_handle_irq()
irqchip/omap-intc: Add support for spurious irq handling
coresight: checking for NULL string in coresight_name_match()
dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq paths
dm snapshot: fix hung bios when copy error occurs
dm space map metadata: remove unused variable in brb_pop()
tda1004x: only update the frontend properties if locked
vb2: fix a regression in poll() behavior for output,streams
gspca: ov534/topro: prevent a division by 0
si2157: return -EINVAL if firmware blob is too big
media: dvb-core: Don't force CAN_INVERSION_AUTO in oneshot mode
rc: sunxi-cir: Initialize the spinlock properly
namei: ->d_inode of a pinned dentry is stable only for positives
mei: validate request value in client notify request ioctl
mei: fix fasync return value on error
rtlwifi: rtl8723be: Fix module parameter initialization
rtlwifi: rtl8188ee: Fix module parameter initialization
rtlwifi: rtl8192se: Fix module parameter initialization
rtlwifi: rtl8723ae: Fix initialization of module parameters
rtlwifi: rtl8192de: Fix incorrect module parameter descriptions
rtlwifi: rtl8192ce: Fix handling of module parameters
rtlwifi: rtl8192cu: Add missing parameter setup
rtlwifi: rtl_pci: Fix kernel panic
locks: fix unlock when fcntl_setlk races with a close
um: link with -lpthread
uml: fix hostfs mknod()
uml: flush stdout before forking
s390/fpu: signals vs. floating point control register
s390/compat: correct restore of high gprs on signal return
s390/dasd: fix performance drop
s390/dasd: fix refcount for PAV reassignment
s390/dasd: prevent incorrect length error under z/VM after PAV changes
s390: fix normalization bug in exception table sorting
btrfs: initialize the seq counter in struct btrfs_device
Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots
Btrfs: fix transaction handle leak on failure to create hard link
Btrfs: fix number of transaction units required to create symlink
Btrfs: send, don't BUG_ON() when an empty symlink is found
btrfs: statfs: report zero available if metadata are exhausted
Btrfs: igrab inode in writepage
Btrfs: add missing brelse when superblock checksum fails
KVM: s390: fix memory overwrites when vx is disabled
s390/kvm: remove dependency on struct save_area definition
clocksource/drivers/vt8500: Increase the minimum delta
genirq: Validate action before dereferencing it in handle_irq_event_percpu()
mm: numa: quickly fail allocations for NUMA balancing on full nodes
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED
ocfs2: unlock inode if deleting inode from orphan fails
drm/i915: shut up gen8+ SDE irq dmesg noise
iw_cxgb3: Fix incorrectly returning error on success
spi: omap2-mcspi: Prevent duplicate gpio_request
drivers: android: correct the size of struct binder_uintptr_t for BC_DEAD_BINDER_DONE
USB: option: add "4G LTE usb-modem U901"
USB: option: add support for SIM7100E
USB: cp210x: add IDs for GE B650V3 and B850V3 boards
usb: dwc3: Fix assignment of EP transfer resources
can: ems_usb: Fix possible tx overflow
dm thin: fix race condition when destroying thin pool workqueue
bcache: Change refill_dirty() to always scan entire disk if necessary
bcache: prevent crash on changing writeback_running
bcache: allows use of register in udev to avoid "device_busy" error.
bcache: unregister reboot notifier if bcache fails to unregister device
bcache: fix a leak in bch_cached_dev_run()
bcache: clear BCACHE_DEV_UNLINK_DONE flag when attaching a backing device
bcache: Add a cond_resched() call to gc
bcache: fix a livelock when we cause a huge number of cache misses
lib/ucs2_string: Correct ucs2 -> utf8 conversion
efi: Add pstore variables to the deletion whitelist
efi: Make efivarfs entries immutable by default
efi: Make our variable validation list include the guid
efi: Do variable name validation tests in utf8
efi: Use ucs2_as_utf8 in efivarfs instead of open coding a bad version
lib/ucs2_string: Add ucs2 -> utf8 helper functions
ARM: 8457/1: psci-smp is built only for SMP
drm/gma500: Use correct unref in the gem bo create function
devm_memremap: Fix error value when memremap failed
KVM: s390: fix guest fprs memory leak
arm64: errata: Add -mpc-relative-literal-loads to build flags
ARM: debug-ll: fix BCM63xx entry for multiplatform
ext4: fix bh->b_state corruption
sctp: Fix port hash table size computation
unix_diag: fix incorrect sign extension in unix_lookup_by_ino
tipc: unlock in error path
rtnl: RTM_GETNETCONF: fix wrong return value
IFF_NO_QUEUE: Fix for drivers not calling ether_setup()
tcp/dccp: fix another race at listener dismantle
route: check and remove route cache when we get route
net_sched fix: reclassification needs to consider ether protocol changes
pppoe: fix reference counting in PPPoE proxy
l2tp: Fix error creating L2TP tunnels
net/mlx4_en: Avoid changing dev->features directly in run-time
net/mlx4_en: Choose time-stamping shift value according to HW frequency
net/mlx4_en: Count HW buffer overrun only once
qmi_wwan: add "4G LTE usb-modem U901"
tcp: md5: release request socket instead of listener
tipc: fix premature addition of node to lookup table
af_unix: Guard against other == sk in unix_dgram_sendmsg
af_unix: Don't set err in unix_stream_read_generic unless there was an error
ipv4: fix memory leaks in ip_cmsg_send() callers
bonding: Fix ARP monitor validation
bpf: fix branch offset adjustment on backjumps after patching ctx expansion
flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
net: Copy inner L3 and L4 headers as unaligned on GRE TEB
sctp: translate network order to host order when users get a hmacid
enic: increment devcmd2 result ring in case of timeout
tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
net:Add sysctl_max_skb_frags
tcp: do not drop syn_recv on all icmp reports
unix: correctly track in-flight fds in sending process user_struct
ipv6: fix a lockdep splat
ipv6: addrconf: Fix recursive spin lock call
ipv6/udp: use sticky pktinfo egress ifindex on connect()
ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
tcp: beware of alignments in tcp_get_info()
switchdev: Require RTNL mutex to be held when sending FDB notifications
inet: frag: Always orphan skbs inside ip_defrag()
tipc: fix connection abort during subscription cancel
net: dsa: fix mv88e6xxx switches
sctp: allow setting SCTP_SACK_IMMEDIATELY by the application
pptp: fix illegal memory access caused by multiple bind()s
af_unix: fix struct pid memory leak
tcp: fix NULL deref in tcp_v4_send_ack()
lwt: fix rx checksum setting for lwt devices tunneling over ipv6
tunnels: Allow IPv6 UDP checksums to be correctly controlled.
net: dp83640: Fix tx timestamp overflow handling.
gro: Make GRO aware of lightweight tunnels.
af_iucv: Validate socket address length in iucv_sock_bind()
Conflicts:
arch/arm64/Makefile
arch/arm64/include/asm/cacheflush.h
drivers/mmc/host/sdhci.c
drivers/usb/dwc3/ep0.c
drivers/usb/dwc3/gadget.c
kernel/module.c
sound/core/pcm_compat.c
CRs-Fixed: 1010239
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Change-Id: I41a28636fc9ad91f9d979b191784609476294cdf
commit 28093f9f34cedeaea0f481c58446d9dac6dd620f upstream.
In gather_pte_stats() a THP pmd is cast into a pte, which is wrong
because the layouts may differ depending on the architecture. On s390
this will lead to inaccurate numa_maps accounting in /proc because of
misguided pte_present() and pte_dirty() checks on the fake pte.
On other architectures pte_present() and pte_dirty() may work by chance,
but there may be an issue with direct-access (dax) mappings w/o
underlying struct pages when HAVE_PTE_SPECIAL is set and THP is
available. In vm_normal_page() the fake pte will be checked with
pte_special() and because there is no "special" bit in a pmd, this will
always return false and the VM_PFNMAP | VM_MIXEDMAP checking will be
skipped. On dax mappings w/o struct pages, an invalid struct page
pointer would then be returned that can crash the kernel.
This patch fixes the numa_maps THP handling by introducing new "_pmd"
variants of the can_gather_numa_stats() and vm_normal_page() functions.
Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
mapping multiple pages on a fault result in page_check_references
hitting more number of referenced inactive pages and this results
in increased pressure on reclaim. Reduce it to the lowest possible
value. Reduced kswapd wakeups are observed with this change.
Change-Id: I03c6cac9f28fa328abab7b40f5f01144084a147c
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
There are couple of issues with swapcache usage when ZRAM is used
as swap device.
1) Kernel does a swap readahead which can be around 6 to 8 pages
depending on total ram, which is not required for zram since
accesses are fast.
2) Kernel delays the freeing up of swapcache expecting a later hit,
which again is useless in the case of zram.
3) This is not related to swapcache, but zram usage itself.
As mentioned in (2) kernel delays freeing of swapcache, but along with
that it delays zram compressed page free also. i.e. there can be 2 copies,
though one is compressed.
This patch addresses these issues using two new flags
QUEUE_FLAG_FAST and SWP_FAST, to indicate that accesses to the device
will be fast and cheap, and instructs the swap layer to free up
swap space agressively, and not to do read ahead.
Change-Id: I5d2d5176a5f9420300bb2f843f6ecbdb25ea80e4
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
commit ad33bb04b2a6cee6c1f99fabb15cddbf93ff0433 upstream.
pmd_trans_unstable()/pmd_none_or_trans_huge_or_clear_bad() were
introduced to locklessy (but atomically) detect when a pmd is a regular
(stable) pmd or when the pmd is unstable and can infinitely transition
from pmd_none() and pmd_trans_huge() from under us, while only holding
the mmap_sem for reading (for writing not).
While holding the mmap_sem only for reading, MADV_DONTNEED can run from
under us and so before we can assume the pmd to be a regular stable pmd
we need to compare it against pmd_none() and pmd_trans_huge() in an
atomic way, with pmd_trans_unstable(). The old pmd_trans_huge() left a
tiny window for a race.
Useful applications are unlikely to notice the difference as doing
MADV_DONTNEED concurrently with a page fault would lead to undefined
behavior.
[akpm@linux-foundation.org: tidy up comment grammar/layout]
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
DAX handling of COW faults has wrong locking sequence:
dax_fault does i_mmap_lock_read
do_cow_fault does i_mmap_unlock_write
Ross's commit[1] missed a fix[2] that Kirill added to Matthew's
commit[3].
Original COW locking logic was introduced by Matthew here[4].
This should be applied to v4.3 as well.
[1] 0f90cc6609 mm, dax: fix DAX deadlocks
[2] 52a2b53ffd mm, dax: use i_mmap_unlock_write() in do_cow_fault()
[3] 843172978b dax: fix race between simultaneous faults
[4] 2e4cdab058 mm: allow page fault handlers to perform the COW
Cc: <stable@vger.kernel.org>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Yigal Korman <yigal@plexistor.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
The following two locking commits in the DAX code:
commit 843172978b ("dax: fix race between simultaneous faults")
commit 46c043ede4 ("mm: take i_mmap_lock in unmap_mapping_range() for DAX")
introduced a number of deadlocks and other issues which need to be fixed
for the v4.3 kernel. The list of issues in DAX after these commits
(some newly introduced by the commits, some preexisting) can be found
here:
https://lkml.org/lkml/2015/9/25/602 (Subject: "Re: [PATCH] dax: fix deadlock in __dax_fault").
This undoes most of the changes introduced by those two commits,
essentially returning us to the DAX locking scheme that was used in
v4.2.
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.williams@intel.com>
Tested-by: Dave Chinner <dchinner@redhat.com>
Cc: Jan Kara <jack@suse.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Let's use helper rather than direct check of vma->vm_ops to distinguish
anonymous VMA.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
__dax_fault() takes i_mmap_lock for write. Let's pair it with write
unlock on do_cow_fault() side.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.
__dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.
Re-aquiring the lock should be fine since we check i_size after the
point.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If two threads write-fault on the same hole at the same time, the winner
of the race will return to userspace and complete their store, only to
have the loser overwrite their store with zeroes. Fix this for now by
taking the i_mmap_sem for write instead of read, and do so outside the
call to get_block(). Now the loser of the race will see the block has
already been zeroed, and will not zero it again.
This severely limits our scalability. I have ideas for improving it, but
those can wait for a later patch.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Allow non-anonymous VMAs to provide huge pages in response to a page fault.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
special_mapping_fault() is absolutely broken. It seems it was always
wrong, but this didn't matter until vdso/vvar started to use more than
one page.
And after this change vma_is_anonymous() becomes really trivial, it
simply checks vm_ops == NULL. However, I do think the helper makes
sense. There are a lot of ->vm_ops != NULL checks, the helper makes the
caller's code more understandable (self-documented) and this is more
grep-friendly.
This patch (of 3):
Preparation. Add the new simple helper, vma_is_anonymous(vma), and change
handle_pte_fault() to use it. It will have more users.
The name is not accurate, say a hpet_mmap()'ed vma is not anonymous.
Perhaps it should be named vma_has_fault() instead. But it matches the
logic in mmap.c/memory.c (see next changes). "True" just means that a
page fault will use do_anonymous_page().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This makes the tlb_next_batch() bool due to this particular function only
ever returning either one or zero as its return value.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is where the page faults must be modified to call
handle_userfault() if userfaultfd_missing() is true (so if the
vma->vm_flags had VM_UFFD_MISSING set).
handle_userfault() then takes care of blocking the page fault and
delivering it to userland.
The fault flags must also be passed as parameter so the "read|write"
kind of fault can be passed to userland.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Reading page fault handler code I've noticed that under right
circumstances kernel would map anonymous pages into file mappings: if
the VMA doesn't have vm_ops->fault() and the VMA wasn't fully populated
on ->mmap(), kernel would handle page fault to not populated pte with
do_anonymous_page().
Let's change page fault handler to use do_anonymous_page() only on
anonymous VMA (->vm_ops == NULL) and make sure that the VMA is not
shared.
For file mappings without vm_ops->fault() or shred VMA without vm_ops,
page fault on pte_none() entry would lead to SIGBUS.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"
[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
Historically memcg overhead was high even if memcg was unused. This has
improved a lot but it still showed up in a profile summary as being a
problem.
/usr/src/linux-4.0-vanilla/mm/memcontrol.c 6.6441 395842
mem_cgroup_try_charge 2.950% 175781
__mem_cgroup_count_vm_event 1.431% 85239
mem_cgroup_page_lruvec 0.456% 27156
mem_cgroup_commit_charge 0.392% 23342
uncharge_list 0.323% 19256
mem_cgroup_update_lru_size 0.278% 16538
memcg_check_events 0.216% 12858
mem_cgroup_charge_statistics.isra.22 0.188% 11172
try_charge 0.150% 8928
commit_charge 0.141% 8388
get_mem_cgroup_from_mm 0.121% 7184
That is showing that 6.64% of system CPU cycles were in memcontrol.c and
dominated by mem_cgroup_try_charge. The annotation shows that the bulk
of the cost was checking PageSwapCache which is expected to be cache hot
but is very expensive. The problem appears to be that __SetPageUptodate
is called just before the check which is a write barrier. It is
required to make sure struct page and page data is written before the
PTE is updated and the data visible to userspace. memcg charging does
not require or need the barrier but gets unfairly hit with the cost so
this patch attempts the charging before the barrier. Aside from the
accidental cost to memcg there is the added benefit that the barrier is
avoided if the page cannot be charged. When applied the relevant
profile summary is as follows.
/usr/src/linux-4.0-chargefirst-v2r1/mm/memcontrol.c 3.7907 223277
__mem_cgroup_count_vm_event 1.143% 67312
mem_cgroup_page_lruvec 0.465% 27403
mem_cgroup_commit_charge 0.381% 22452
uncharge_list 0.332% 19543
mem_cgroup_update_lru_size 0.284% 16704
get_mem_cgroup_from_mm 0.271% 15952
mem_cgroup_try_charge 0.237% 13982
memcg_check_events 0.222% 13058
mem_cgroup_charge_statistics.isra.22 0.185% 10920
commit_charge 0.140% 8235
try_charge 0.131% 7716
That brings the overhead down to 3.79% and leaves the memcg fault
accounting to the root cgroup but it's an improvement. The difference
in headline performance of the page fault microbench is marginal as
memcg is such a small component of it.
pft faults
4.0.0 4.0.0
vanilla chargefirst
Hmean faults/cpu-1 1443258.1051 ( 0.00%) 1509075.7561 ( 4.56%)
Hmean faults/cpu-3 1340385.9270 ( 0.00%) 1339160.7113 ( -0.09%)
Hmean faults/cpu-5 875599.0222 ( 0.00%) 874174.1255 ( -0.16%)
Hmean faults/cpu-7 601146.6726 ( 0.00%) 601370.9977 ( 0.04%)
Hmean faults/cpu-8 510728.2754 ( 0.00%) 510598.8214 ( -0.03%)
Hmean faults/sec-1 1432084.7845 ( 0.00%) 1497935.5274 ( 4.60%)
Hmean faults/sec-3 3943818.1437 ( 0.00%) 3941920.1520 ( -0.05%)
Hmean faults/sec-5 3877573.5867 ( 0.00%) 3869385.7553 ( -0.21%)
Hmean faults/sec-7 3991832.0418 ( 0.00%) 3992181.4189 ( 0.01%)
Hmean faults/sec-8 3987189.8167 ( 0.00%) 3986452.2204 ( -0.02%)
It's only visible at single threaded. The overhead is there for higher
threads but other factors dominate.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 662bbcb274 ("mm, sched: Allow uaccess in atomic with
pagefault_disable()") removed might_sleep() checks for all user access
code (that uses might_fault()).
The reason was to disable wrong "sleep in atomic" warnings in the
following scenario:
pagefault_disable()
rc = copy_to_user(...)
pagefault_enable()
Which is valid, as pagefault_disable() increments the preempt counter
and therefore disables the pagefault handler. copy_to_user() will not
sleep and return an error code if a page is not available.
However, as all might_sleep() checks are removed,
CONFIG_DEBUG_ATOMIC_SLEEP would no longer detect the following scenario:
spin_lock(&lock);
rc = copy_to_user(...)
spin_unlock(&lock)
If the kernel is compiled with preemption turned on, preempt_disable()
will make in_atomic() detect disabled preemption. The fault handler would
correctly never sleep on user access.
However, with preemption turned off, preempt_disable() is usually a NOP
(with !CONFIG_PREEMPT_COUNT), therefore in_atomic() will not be able to
detect disabled preemption nor disabled pagefaults. The fault handler
could sleep.
We really want to enable CONFIG_DEBUG_ATOMIC_SLEEP checks for user access
functions again, otherwise we can end up with horrible deadlocks.
Root of all evil is that pagefault_disable() acts almost as
preempt_disable(), depending on preemption being turned on/off.
As we now have pagefault_disabled(), we can use it to distinguish
whether user acces functions might sleep.
Convert might_fault() into a makro that calls __might_fault(), to
allow proper file + line messages in case of a might_sleep() warning.
Reviewed-and-tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: David.Laight@ACULAB.COM
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: airlied@linux.ie
Cc: akpm@linux-foundation.org
Cc: benh@kernel.crashing.org
Cc: bigeasy@linutronix.de
Cc: borntraeger@de.ibm.com
Cc: daniel.vetter@intel.com
Cc: heiko.carstens@de.ibm.com
Cc: herbert@gondor.apana.org.au
Cc: hocko@suse.cz
Cc: hughd@google.com
Cc: mst@redhat.com
Cc: paulus@samba.org
Cc: ralf@linux-mips.org
Cc: schwidefsky@de.ibm.com
Cc: yang.shi@windriver.com
Link: http://lkml.kernel.org/r/1431359540-32227-3-git-send-email-dahi@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This will allow FS that uses VM_PFNMAP | VM_MIXEDMAP (no page structs) to
get notified when access is a write to a read-only PFN.
This can happen if we mmap() a file then first mmap-read from it to
page-in a read-only PFN, than we mmap-write to the same page.
We need this functionality to fix a DAX bug, where in the scenario above
we fail to set ctime/mtime though we modified the file. An xfstest is
attached to this patchset that shows the failure and the fix. (A DAX
patch will follow)
This functionality is extra important for us, because upon dirtying of a
pmem page we also want to RDMA the page to a remote cluster node.
We define a new pfn_mkwrite and do not reuse page_mkwrite because
1 - The name ;-)
2 - But mainly because it would take a very long and tedious
audit of all page_mkwrite functions of VM_MIXEDMAP/VM_PFNMAP
users. To make sure they do not now CRASH. For example current
DAX code (which this is for) would crash.
If we would want to reuse page_mkwrite, We will need to first
patch all users, so to not-crash-on-no-page. Then enable this
patch. But even if I did that I would not sleep so well at night.
Adding a new vector is the safest thing to do, and is not that
expensive. an extra pointer at a static function vector per driver.
Also the new vector is better for performance, because else we
Will call all current Kernel vectors, so to:
check-ha-no-page-do-nothing and return.
No need to call it from do_shared_fault because do_wp_page is called to
change pte permissions anyway.
Signed-off-by: Yigal Korman <yigal@plexistor.com>
Signed-off-by: Boaz Harrosh <boaz@plexistor.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
A lot of filesystems use generic_file_mmap() and filemap_fault(),
f_op->mmap and vm_ops->fault aren't enough to identify filesystem.
This prints file name, vm_ops->fault, f_op->mmap and a_ops->readpage
(which is almost always implemented and filesystem-specific).
Example:
[ 23.676410] BUG: Bad page map in process sh pte:1b7e6025 pmd:19bbd067
[ 23.676887] page:ffffea00006df980 count:4 mapcount:1 mapping:ffff8800196426c0 index:0x97
[ 23.677481] flags: 0x10000000000000c(referenced|uptodate)
[ 23.677896] page dumped because: bad pte
[ 23.678205] addr:00007f52fcb17000 vm_flags:00000075 anon_vma: (null) mapping:ffff8800196426c0 index:97
[ 23.678922] file:libc-2.19.so fault:filemap_fault mmap:generic_file_readonly_mmap readpage:v9fs_vfs_readpage
[akpm@linux-foundation.org: use pr_alert, per Kirill]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Sasha Levin <sasha.levin@oracle.com>
Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We converted some of the usages of ACCESS_ONCE to READ_ONCE in the mm/
tree since it doesn't work reliably on non-scalar types.
This patch removes the rest of the usages of ACCESS_ONCE, and use the new
READ_ONCE API for the read accesses. This makes things cleaner, instead
of using separate/multiple sets of APIs.
Signed-off-by: Jason Low <jason.low2@hp.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The do_wp_page function is extremely long. Extract the logic for
handling a page belonging to a shared vma into a function of its own.
This helps the readability of the code, without doing any functional
change in it.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Haggai Eran <haggaie@mellanox.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Michel Lespinasse <walken@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In some cases, do_wp_page had to copy the page suffering a write fault
to a new location. If the function logic decided that to do this, it
was done by jumping with a "goto" operation to the relevant code block.
This made the code really hard to understand. It is also against the
kernel coding style guidelines.
This patch extracts the page copy and page table update logic to a
separate function. It also clean up the naming, from "gotten" to
"wp_page_copy", and adds few comments.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Haggai Eran <haggaie@mellanox.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Michel Lespinasse <walken@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When do_wp_page is ending, in several cases it needs to unlock the pages
and ptls it was accessing.
Currently, this logic was "called" by using a goto jump. This makes
following the control flow of the function harder. Readability was
further hampered by the unlock case containing large amount of logic
needed only in one of the 3 cases.
Using goto for cleanup is generally allowed. However, moving the
trivial unlocking flows to the relevant call sites allow deeper
refactoring in the next patch.
Signed-off-by: Shachar Raindel <raindel@mellanox.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Haggai Eran <haggaie@mellanox.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Michel Lespinasse <walken@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently do_wp_page contains 265 code lines. It also contains 9 goto
statements, of which 5 are targeting labels which are not cleanup
related. This makes the function extremely difficult to understand.
The following patches are an attempt at breaking the function to its
basic components, and making it easier to understand.
The patches are straight forward function extractions from do_wp_page.
As we extract functions, we remove unneeded parameters and simplify the
code as much as possible. However, the functionality is supposed to
remain completely unchanged. The patches also attempt to document the
functionality of each extracted function. In patch 2, we split the
unlock logic to the contain logic relevant to specific needs of each use
case, instead of having huge number of conditional decisions in a single
unlock flow.
This patch (of 4):
When do_wp_page is ending, in several cases it needs to reuse the existing
page. This is achieved by making the page table writable, and possibly
updating the page-cache state.
Currently, this logic was "called" by using a goto jump. This makes
following the control flow of the function harder. It is also against the
coding style guidelines for using goto.
As the code can easily be refactored into a specialized function, refactor
it out and simplify the code flow in do_wp_page.
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Haggai Eran <haggaie@mellanox.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: Michel Lespinasse <walken@google.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Chinner reported the following on https://lkml.org/lkml/2015/3/1/226
Across the board the 4.0-rc1 numbers are much slower, and the degradation
is far worse when using the large memory footprint configs. Perf points
straight at the cause - this is from 4.0-rc1 on the "-o bhash=101073" config:
- 56.07% 56.07% [kernel] [k] default_send_IPI_mask_sequence_phys
- default_send_IPI_mask_sequence_phys
- 99.99% physflat_send_IPI_mask
- 99.37% native_send_call_func_ipi
smp_call_function_many
- native_flush_tlb_others
- 99.85% flush_tlb_page
ptep_clear_flush
try_to_unmap_one
rmap_walk
try_to_unmap
migrate_pages
migrate_misplaced_page
- handle_mm_fault
- 99.73% __do_page_fault
trace_do_page_fault
do_async_page_fault
+ async_page_fault
0.63% native_send_call_func_single_ipi
generic_exec_single
smp_call_function_single
This is showing excessive migration activity even though excessive
migrations are meant to get throttled. Normally, the scan rate is tuned
on a per-task basis depending on the locality of faults. However, if
migrations fail for any reason then the PTE scanner may scan faster if
the faults continue to be remote. This means there is higher system CPU
overhead and fault trapping at exactly the time we know that migrations
cannot happen. This patch tracks when migration failures occur and
slows the PTE scanner.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Protecting a PTE to trap a NUMA hinting fault clears the writable bit
and further faults are needed after trapping a NUMA hinting fault to set
the writable bit again. This patch preserves the writable bit when
trapping NUMA hinting faults. The impact is obvious from the number of
minor faults trapped during the basis balancing benchmark and the system
CPU usage;
autonumabench
4.0.0-rc4 4.0.0-rc4
baseline preserve
Time System-NUMA01 107.13 ( 0.00%) 103.13 ( 3.73%)
Time System-NUMA01_THEADLOCAL 131.87 ( 0.00%) 83.30 ( 36.83%)
Time System-NUMA02 8.95 ( 0.00%) 10.72 (-19.78%)
Time System-NUMA02_SMT 4.57 ( 0.00%) 3.99 ( 12.69%)
Time Elapsed-NUMA01 515.78 ( 0.00%) 517.26 ( -0.29%)
Time Elapsed-NUMA01_THEADLOCAL 384.10 ( 0.00%) 384.31 ( -0.05%)
Time Elapsed-NUMA02 48.86 ( 0.00%) 48.78 ( 0.16%)
Time Elapsed-NUMA02_SMT 47.98 ( 0.00%) 48.12 ( -0.29%)
4.0.0-rc4 4.0.0-rc4
baseline preserve
User 44383.95 43971.89
System 252.61 201.24
Elapsed 998.68 1000.94
Minor Faults 2597249 1981230
Major Faults 365 364
There is a similar drop in system CPU usage using Dave Chinner's xfsrepair
workload
4.0.0-rc4 4.0.0-rc4
baseline preserve
Amean real-xfsrepair 454.14 ( 0.00%) 442.36 ( 2.60%)
Amean syst-xfsrepair 277.20 ( 0.00%) 204.68 ( 26.16%)
The patch looks hacky but the alternatives looked worse. The tidest was
to rewalk the page tables after a hinting fault but it was more complex
than this approach and the performance was worse. It's not generally
safe to just mark the page writable during the fault if it's a write
fault as it may have been read-only for COW so that approach was
discarded.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These are three follow-on patches based on the xfsrepair workload Dave
Chinner reported was problematic in 4.0-rc1 due to changes in page table
management -- https://lkml.org/lkml/2015/3/1/226.
Much of the problem was reduced by commit 53da3bc2ba ("mm: fix up numa
read-only thread grouping logic") and commit ba68bc0115 ("mm: thp:
Return the correct value for change_huge_pmd"). It was known that the
performance in 3.19 was still better even if is far less safe. This
series aims to restore the performance without compromising on safety.
For the test of this mail, I'm comparing 3.19 against 4.0-rc4 and the
three patches applied on top
autonumabench
3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4
vanilla vanilla vmwrite-v5r8 preserve-v5r8 slowscan-v5r8
Time System-NUMA01 124.00 ( 0.00%) 161.86 (-30.53%) 107.13 ( 13.60%) 103.13 ( 16.83%) 145.01 (-16.94%)
Time System-NUMA01_THEADLOCAL 115.54 ( 0.00%) 107.64 ( 6.84%) 131.87 (-14.13%) 83.30 ( 27.90%) 92.35 ( 20.07%)
Time System-NUMA02 9.35 ( 0.00%) 10.44 (-11.66%) 8.95 ( 4.28%) 10.72 (-14.65%) 8.16 ( 12.73%)
Time System-NUMA02_SMT 3.87 ( 0.00%) 4.63 (-19.64%) 4.57 (-18.09%) 3.99 ( -3.10%) 3.36 ( 13.18%)
Time Elapsed-NUMA01 570.06 ( 0.00%) 567.82 ( 0.39%) 515.78 ( 9.52%) 517.26 ( 9.26%) 543.80 ( 4.61%)
Time Elapsed-NUMA01_THEADLOCAL 393.69 ( 0.00%) 384.83 ( 2.25%) 384.10 ( 2.44%) 384.31 ( 2.38%) 380.73 ( 3.29%)
Time Elapsed-NUMA02 49.09 ( 0.00%) 49.33 ( -0.49%) 48.86 ( 0.47%) 48.78 ( 0.63%) 50.94 ( -3.77%)
Time Elapsed-NUMA02_SMT 47.51 ( 0.00%) 47.15 ( 0.76%) 47.98 ( -0.99%) 48.12 ( -1.28%) 49.56 ( -4.31%)
3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4
vanilla vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
User 46334.60 46391.94 44383.95 43971.89 44372.12
System 252.84 284.66 252.61 201.24 249.00
Elapsed 1062.14 1050.96 998.68 1000.94 1026.78
Overall the system CPU usage is comparable and the test is naturally a
bit variable. The slowing of the scanner hurts numa01 but on this
machine it is an adverse workload and patches that dramatically help it
often hurt absolutely everything else.
Due to patch 2, the fault activity is interesting
3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4
vanilla vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
Minor Faults 2097811 2656646 2597249 1981230 1636841
Major Faults 362 450 365 364 365
Note the impact preserving the write bit across protection updates and
fault reduces faults.
NUMA alloc hit 1229008 1217015 1191660 1178322 1199681
NUMA alloc miss 0 0 0 0 0
NUMA interleave hit 0 0 0 0 0
NUMA alloc local 1228514 1216317 1190871 1177448 1199021
NUMA base PTE updates 245706197 240041607 238195516 244704842 115012800
NUMA huge PMD updates 479530 468448 464868 477573 224487
NUMA page range updates 491225557 479886983 476207932 489222218 229950144
NUMA hint faults 659753 656503 641678 656926 294842
NUMA hint local faults 381604 373963 360478 337585 186249
NUMA hint local percent 57 56 56 51 63
NUMA pages migrated 5412140 6374899 6266530 5277468 5755096
AutoNUMA cost 5121% 5083% 4994% 5097% 2388%
Here the impact of slowing the PTE scanner on migratrion failures is
obvious as "NUMA base PTE updates" and "NUMA huge PMD updates" are
massively reduced even though the headline performance is very similar.
As xfsrepair was the reported workload here is the impact of the series
on it.
xfsrepair
3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4
vanilla vanilla vmwrite-v5r8 preserve-v5r8 slowscan-v5r8
Min real-fsmark 1183.29 ( 0.00%) 1165.73 ( 1.48%) 1152.78 ( 2.58%) 1153.64 ( 2.51%) 1177.62 ( 0.48%)
Min syst-fsmark 4107.85 ( 0.00%) 4027.75 ( 1.95%) 3986.74 ( 2.95%) 3979.16 ( 3.13%) 4048.76 ( 1.44%)
Min real-xfsrepair 441.51 ( 0.00%) 463.96 ( -5.08%) 449.50 ( -1.81%) 440.08 ( 0.32%) 439.87 ( 0.37%)
Min syst-xfsrepair 195.76 ( 0.00%) 278.47 (-42.25%) 262.34 (-34.01%) 203.70 ( -4.06%) 143.64 ( 26.62%)
Amean real-fsmark 1188.30 ( 0.00%) 1177.34 ( 0.92%) 1157.97 ( 2.55%) 1158.21 ( 2.53%) 1182.22 ( 0.51%)
Amean syst-fsmark 4111.37 ( 0.00%) 4055.70 ( 1.35%) 3987.19 ( 3.02%) 3998.72 ( 2.74%) 4061.69 ( 1.21%)
Amean real-xfsrepair 450.88 ( 0.00%) 468.32 ( -3.87%) 454.14 ( -0.72%) 442.36 ( 1.89%) 440.59 ( 2.28%)
Amean syst-xfsrepair 199.66 ( 0.00%) 290.60 (-45.55%) 277.20 (-38.84%) 204.68 ( -2.51%) 150.55 ( 24.60%)
Stddev real-fsmark 4.12 ( 0.00%) 10.82 (-162.29%) 4.14 ( -0.28%) 5.98 (-45.05%) 4.60 (-11.53%)
Stddev syst-fsmark 2.63 ( 0.00%) 20.32 (-671.82%) 0.37 ( 85.89%) 16.47 (-525.59%) 15.05 (-471.79%)
Stddev real-xfsrepair 6.87 ( 0.00%) 4.55 ( 33.75%) 3.46 ( 49.58%) 1.78 ( 74.12%) 0.52 ( 92.50%)
Stddev syst-xfsrepair 3.02 ( 0.00%) 10.30 (-241.37%) 13.17 (-336.37%) 0.71 ( 76.63%) 5.00 (-65.61%)
CoeffVar real-fsmark 0.35 ( 0.00%) 0.92 (-164.73%) 0.36 ( -2.91%) 0.52 (-48.82%) 0.39 (-12.10%)
CoeffVar syst-fsmark 0.06 ( 0.00%) 0.50 (-682.41%) 0.01 ( 85.45%) 0.41 (-543.22%) 0.37 (-478.78%)
CoeffVar real-xfsrepair 1.52 ( 0.00%) 0.97 ( 36.21%) 0.76 ( 49.94%) 0.40 ( 73.62%) 0.12 ( 92.33%)
CoeffVar syst-xfsrepair 1.51 ( 0.00%) 3.54 (-134.54%) 4.75 (-214.31%) 0.34 ( 77.20%) 3.32 (-119.63%)
Max real-fsmark 1193.39 ( 0.00%) 1191.77 ( 0.14%) 1162.90 ( 2.55%) 1166.66 ( 2.24%) 1188.50 ( 0.41%)
Max syst-fsmark 4114.18 ( 0.00%) 4075.45 ( 0.94%) 3987.65 ( 3.08%) 4019.45 ( 2.30%) 4082.80 ( 0.76%)
Max real-xfsrepair 457.80 ( 0.00%) 474.60 ( -3.67%) 457.82 ( -0.00%) 444.42 ( 2.92%) 441.03 ( 3.66%)
Max syst-xfsrepair 203.11 ( 0.00%) 303.65 (-49.50%) 294.35 (-44.92%) 205.33 ( -1.09%) 155.28 ( 23.55%)
The really relevant lines as syst-xfsrepair which is the system CPU
usage when running xfsrepair. Note that on my machine the overhead was
45% higher on 4.0-rc4 which may be part of what Dave is seeing. Once we
preserve the write bit across faults, it's only 2.51% higher on average.
With the full series applied, system CPU usage is 24.6% lower on
average.
Again, the impact of preserving the write bit on minor faults is obvious
and the impact of slowing scanning after migration failures is obvious
on the PTE updates. Note also that the number of pages migrated is much
reduced even though the headline performance is comparable.
3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4
vanilla vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
Minor Faults 153466827 254507978 249163829 153501373 105737890
Major Faults 610 702 690 649 724
NUMA base PTE updates 217735049 210756527 217729596 216937111 144344993
NUMA huge PMD updates 129294 85044 106921 127246 79887
NUMA pages migrated 21938995 29705270 28594162 22687324 16258075
3.19.0 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4 4.0.0-rc4
vanilla vanillavmwrite-v5r8preserve-v5r8slowscan-v5r8
Mean sdb-avgqusz 13.47 2.54 2.55 2.47 2.49
Mean sdb-avgrqsz 202.32 140.22 139.50 139.02 138.12
Mean sdb-await 25.92 5.09 5.33 5.02 5.22
Mean sdb-r_await 4.71 0.19 0.83 0.51 0.11
Mean sdb-w_await 104.13 5.21 5.38 5.05 5.32
Mean sdb-svctm 0.59 0.13 0.14 0.13 0.14
Mean sdb-rrqm 0.16 0.00 0.00 0.00 0.00
Mean sdb-wrqm 3.59 1799.43 1826.84 1812.21 1785.67
Max sdb-avgqusz 111.06 12.13 14.05 11.66 15.60
Max sdb-avgrqsz 255.60 190.34 190.01 187.33 191.78
Max sdb-await 168.24 39.28 49.22 44.64 65.62
Max sdb-r_await 660.00 52.00 280.00 76.00 12.00
Max sdb-w_await 7804.00 39.28 49.22 44.64 65.62
Max sdb-svctm 4.00 2.82 2.86 1.98 2.84
Max sdb-rrqm 8.30 0.00 0.00 0.00 0.00
Max sdb-wrqm 34.20 5372.80 5278.60 5386.60 5546.15
FWIW, I also checked SPECjbb in different configurations but it's
similar observations -- minor faults lower, PTE update activity lower
and performance is roughly comparable against 3.19.
This patch (of 3):
Threads that share writable data within pages are grouped together as
related tasks. This decision is based on whether the PTE is marked
dirty which is subject to timing races between the PTE scanner update
and when the application writes the page. If the page is file-backed,
then background flushes and sync also affect placement. This is
unpredictable behaviour which is impossible to reason about so this
patch makes grouping decisions based on the VMA flags.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Dave Chinner <david@fromorbit.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dave Chinner reported that commit 4d94246699 ("mm: convert
p[te|md]_mknonnuma and remaining page table manipulations") slowed down
his xfsrepair test enormously. In particular, it was using more system
time due to extra TLB flushing.
The ultimate reason turns out to be how the change to use the regular
page table accessor functions broke the NUMA grouping logic. The old
special mknuma/mknonnuma code accessed the page table present bit and
the magic NUMA bit directly, while the new code just changes the page
protections using PROT_NONE and the regular vma protections.
That sounds equivalent, and from a fault standpoint it really is, but a
subtle side effect is that the *other* protection bits of the page table
entries also change. And the code to decide how to group the NUMA
entries together used the writable bit to decide whether a particular
page was likely to be shared read-only or not.
And with the change to make the NUMA handling use the regular permission
setting functions, that writable bit was basically always cleared for
private mappings due to COW. So even if the page actually ends up being
written to in the end, the NUMA balancing would act as if it was always
shared RO.
This code is a heuristic anyway, so the fix - at least for now - is to
instead check whether the page is dirty rather than writable. The bit
doesn't change with protection changes.
NOTE! This also adds a FIXME comment to revisit this issue,
Not only should we probably re-visit the whole "is this a shared
read-only page" heuristic (we might want to take the vma permissions
into account and base this more on those than the per-page ones, and
also look at whether the particular access that triggers it is a write
or not), but the whole COW issue shows that we should think about the
NUMA fault handling some more.
For example, maybe we should do the early-COW thing that a regular fault
does. Or maybe we should accept that while using the same bits as
PROTNONE was a good thing (and got rid of the specual NUMA bit), we
might still want to just preseve the other protection bits across NUMA
faulting.
Those are bigger questions, left for later. This just fixes up the
heuristic so that it at least approximates working again. More analysis
and work needed.
Reported-by: Dave Chinner <david@fromorbit.com>
Tested-by: Mel Gorman <mgorman@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>,
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently COW of an XIP file is done by first bringing in a read-only
mapping, then retrying the fault and copying the page. It is much more
efficient to tell the fault handler that a COW is being attempted (by
passing in the pre-allocated page in the vm_fault structure), and allow
the handler to perform the COW operation itself.
The handler cannot insert the page itself if there is already a read-only
mapping at that address, so allow the handler to return VM_FAULT_LOCKED
and set the fault_page to be NULL. This indicates to the MM code that the
i_mmap_lock is held instead of the page lock.
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
DAX is a replacement for the variation of XIP currently supported by the
ext2 filesystem. We have three different things in the tree called 'XIP',
and the new focus is on access to data rather than executables, so a name
change was in order. DAX stands for Direct Access. The X is for
eXciting.
The new focus on data access has resulted in more careful attention to
races that exist in the current XIP code, but are not hit by the use-case
that it was designed for. XIP's architecture worked fine for ext2, but
DAX is architected to work with modern filsystems such as ext4 and XFS.
DAX is not intended for use with btrfs; the value that btrfs adds relies
on manipulating data and writing data to different locations, while DAX's
value is for write-in-place and keeping the kernel from touching the data.
DAX was developed in order to support NV-DIMMs, but it's become clear that
its usefuless extends beyond NV-DIMMs and there are several potential
customers including the tracing machinery. Other people want to place the
kernel log in an area of memory, as long as they have a BIOS that does not
clear DRAM on reboot.
Patch 1 is a bug fix, probably worth including in 3.18.
Patches 2 & 3 are infrastructure for DAX.
Patches 4-8 replace the XIP code with its DAX equivalents, transforming
ext2 to use the DAX code as we go. Note that patch 10 is the
Documentation patch.
Patches 9-15 clean up after the XIP code, removing the infrastructure
that is no longer needed and renaming various XIP things to DAX.
Most of these patches were added after Jan found things he didn't
like in an earlier version of the ext4 patch ... that had been copied
from ext2. So ext2 i being transformed to do things the same way that
ext4 will later. The ability to mount ext2 filesystems with the 'xip'
option is retained, although the 'dax' option is now preferred.
Patch 16 adds some DAX infrastructure to support ext4.
Patch 17 adds DAX support to ext4. It is broadly similar to ext2's DAX
support, but it is more efficient than ext4's due to its support for
unwritten extents.
Patch 18 is another cleanup patch renaming XIP to DAX.
My thanks to Mathieu Desnoyers for his reviews of the v11 patchset. Most
of the changes below were based on his feedback.
This patch (of 18):
Pagecache faults recheck i_size after taking the page lock to ensure that
the fault didn't race against a truncate. We don't have a page to lock in
the XIP case, so use i_mmap_lock_read() instead. It is locked in the
truncate path in unmap_mapping_range() after updating i_size. So while we
hold it in the fault path, we are guaranteed that either i_size has
already been updated in the truncate path, or that the truncate will
subsequently call zap_page_range_single() and so remove the mapping we
have just inserted.
There is a window of time in which i_size has been reduced and the thread
has a mapping to a page which will be removed from the file, but this is
harmless as the page will not be allocated to a different purpose before
the thread's access to it is revoked.
[akpm@linux-foundation.org: switch to i_mmap_lock_read(), add comment in unmap_single_vma()]
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andreas Dilger <andreas.dilger@intel.com>
Cc: Boaz Harrosh <boaz@plexistor.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
For whatever reason, generic_access_phys() only remaps one page, but
actually allows to access arbitrary size. It's quite easy to trigger
large reads, like printing out large structure with gdb, which leads to a
crash. Fix it by remapping correct size.
Fixes: 28b2ee20c7 ("access_process_vm device memory infrastructure")
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
pte_protnone_numa is only safe to use after VMA checks for PROT_NONE are
complete. Treating a real PROT_NONE PTE as a NUMA hinting fault is going
to result in strangeness so add a check for it. BUG_ON looks like
overkill but if this is hit then it's a serious bug that could result in
corruption so do not even try recovering. It would have been more
comprehensive to check VMA flags in pte_protnone_numa but it would have
made the API ugly just for a debugging check.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Dave Jones <davej@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>