* refs/heads/tmp-c71ad0f:
BACKPORT: arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
ANDROID: sdcardfs: update module info
ANDROID: sdcardfs: use d_splice_alias
ANDROID: sdcardfs: add read_iter/write_iter opeations
ANDROID: sdcardfs: fix ->llseek to update upper and lower offset
ANDROID: sdcardfs: copy lower inode attributes in ->ioctl
ANDROID: sdcardfs: remove unnecessary call to do_munmap
Merge 4.4.59 into android-4.4
UPSTREAM: ipv6 addrconf: implement RFC7559 router solicitation backoff
android: base-cfg: enable CONFIG_INET_DIAG_DESTROY
ANDROID: android-base.cfg: add CONFIG_MODULES option
ANDROID: android-base.cfg: add CONFIG_IKCONFIG option
ANDROID: android-base.cfg: properly sort the file
ANDROID: binder: add hwbinder,vndbinder to BINDER_DEVICES.
ANDROID: sort android-recommended.cfg
UPSTREAM: config/android: Remove CONFIG_IPV6_PRIVACY
UPSTREAM: config: android: set SELinux as default security mode
config: android: move device mapper options to recommended
ANDROID: ARM64: Allow to choose appended kernel image
UPSTREAM: arm64: vdso: constify vm_special_mapping used for aarch32 vectors page
UPSTREAM: arm64: vdso: add __init section marker to alloc_vectors_page
UPSTREAM: ARM: 8597/1: VDSO: put RO and RO after init objects into proper sections
UPSTREAM: arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
UPSTREAM: arm64: Refactor vDSO time functions
UPSTREAM: arm64: fix vdso-offsets.h dependency
UPSTREAM: kbuild: drop FORCE from PHONY targets
UPSTREAM: mm: add PHYS_PFN, use it in __phys_to_pfn()
UPSTREAM: ARM: 8476/1: VDSO: use PTR_ERR_OR_ZERO for vma check
Linux 4.4.58
crypto: algif_hash - avoid zero-sized array
fbcon: Fix vc attr at deinit
serial: 8250_pci: Detach low-level driver during PCI error recovery
ACPI / blacklist: Make Dell Latitude 3350 ethernet work
ACPI / blacklist: add _REV quirks for Dell Precision 5520 and 3520
uvcvideo: uvc_scan_fallback() for webcams with broken chain
s390/zcrypt: Introduce CEX6 toleration
block: allow WRITE_SAME commands with the SG_IO ioctl
vfio/spapr: Postpone allocation of userspace version of TCE table
PCI: Do any VF BAR updates before enabling the BARs
PCI: Ignore BAR updates on virtual functions
PCI: Update BARs using property bits appropriate for type
PCI: Don't update VF BARs while VF memory space is enabled
PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
PCI: Add comments about ROM BAR updating
PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
PCI: Separate VF BAR updates from standard BAR updates
x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
igb: add i211 to i210 PHY workaround
igb: Workaround for igb i210 firmware issue
xen: do not re-use pirq number cached in pci device msi msg data
xfs: clear _XBF_PAGES from buffers when readahead page
USB: usbtmc: add missing endpoint sanity check
nl80211: fix dumpit error path RTNL deadlocks
xfs: fix up xfs_swap_extent_forks inline extent handling
xfs: don't allow di_size with high bit set
libceph: don't set weight to IN when OSD is destroyed
raid10: increment write counter after bio is split
cpufreq: Restore policy min/max limits on CPU online
ARM: dts: at91: sama5d2: add dma properties to UART nodes
ARM: at91: pm: cpu_idle: switch DDR to power-down mode
iommu/vt-d: Fix NULL pointer dereference in device_to_iommu
xen/acpi: upload PM state from init-domain to Xen
mmc: sdhci: Do not disable interrupts while waiting for clock
ext4: mark inode dirty after converting inline directory
parport: fix attempt to write duplicate procfiles
iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3
iio: adc: ti_am335x_adc: fix fifo overrun recovery
mmc: ushc: fix NULL-deref at probe
uwb: hwa-rc: fix NULL-deref at probe
uwb: i1480-dfu: fix NULL-deref at probe
usb: hub: Fix crash after failure to read BOS descriptor
usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer
USB: wusbcore: fix NULL-deref at probe
USB: idmouse: fix NULL-deref at probe
USB: lvtest: fix NULL-deref at probe
USB: uss720: fix NULL-deref at probe
usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk
usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval
ACM gadget: fix endianness in notifications
USB: serial: qcserial: add Dell DW5811e
USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems
ALSA: hda - Adding a group of pin definition to fix headset problem
ALSA: ctxfi: Fix the incorrect check of dma_set_mask() call
ALSA: seq: Fix racy cell insertions during snd_seq_pool_done()
Input: sur40 - validate number of endpoints before using them
Input: kbtab - validate number of endpoints before using them
Input: cm109 - validate number of endpoints before using them
Input: yealink - validate number of endpoints before using them
Input: hanwang - validate number of endpoints before using them
Input: ims-pcu - validate number of endpoints before using them
Input: iforce - validate number of endpoints before using them
Input: i8042 - add noloop quirk for Dell Embedded Box PC 3000
Input: elan_i2c - add ASUS EeeBook X205TA special touchpad fw
tcp: initialize icsk_ack.lrcvtime at session start time
socket, bpf: fix sk_filter use after free in sk_clone_lock
ipv4: provide stronger user input validation in nl_fib_input()
net: bcmgenet: remove bcmgenet_internal_phy_setup()
net/mlx5e: Count LRO packets correctly
net/mlx5: Increase number of max QPs in default profile
net: unix: properly re-increment inflight counter of GC discarded candidates
amd-xgbe: Fix jumbo MTU processing on newer hardware
net: properly release sk_frag.page
net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled
net/openvswitch: Set the ipv6 source tunnel key address attribute correctly
Linux 4.4.57
ext4: fix fencepost in s_first_meta_bg validation
percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pages
gfs2: Avoid alignment hole in struct lm_lockname
isdn/gigaset: fix NULL-deref at probe
target: Fix VERIFY_16 handling in sbc_parse_cdb
scsi: libiscsi: add lock around task lists to fix list corruption regression
scsi: lpfc: Add shutdown method for kexec
target/pscsi: Fix TYPE_TAPE + TYPE_MEDIMUM_CHANGER export
md/raid1/10: fix potential deadlock
powerpc/boot: Fix zImage TOC alignment
cpufreq: Fix and clean up show_cpuinfo_cur_freq()
perf/core: Fix event inheritance on fork()
give up on gcc ilog2() constant optimizations
kernek/fork.c: allocate idle task for a CPU always on its local node
hv_netvsc: use skb_get_hash() instead of a homegrown implementation
tpm_tis: Use devm_free_irq not free_irq
drm/amdgpu: add missing irq.h include
s390/pci: fix use after free in dma_init
KVM: PPC: Book3S PR: Fix illegal opcode emulation
xen/qspinlock: Don't kick CPU if IRQ is not initialized
Drivers: hv: avoid vfree() on crash
Drivers: hv: balloon: don't crash when memory is added in non-sorted order
pinctrl: cherryview: Do not mask all interrupts in probe
ACPI / video: skip evaluating _DOD when it does not exist
cxlflash: Increase cmd_per_lun for better throughput
crypto: mcryptd - Fix load failure
crypto: cryptd - Assign statesize properly
crypto: ghash-clmulni - Fix load failure
USB: don't free bandwidth_mutex too early
usb: core: hub: hub_port_init lock controller instead of bus
ANDROID: sdcardfs: Fix style issues in macros
ANDROID: sdcardfs: Use seq_puts over seq_printf
ANDROID: sdcardfs: Use to kstrout
ANDROID: sdcardfs: Use pr_[...] instead of printk
ANDROID: sdcardfs: remove unneeded null check
ANDROID: sdcardfs: Fix style issues with comments
ANDROID: sdcardfs: Fix formatting
ANDROID: sdcardfs: correct order of descriptors
fix the deadlock in xt_qtaguid when enable DDEBUG
net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs.
Linux 4.4.56
futex: Add missing error handling to FUTEX_REQUEUE_PI
futex: Fix potential use-after-free in FUTEX_REQUEUE_PI
x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
fscrypto: lock inode while setting encryption policy
fscrypt: fix renaming and linking special files
net sched actions: decrement module reference count after table flush.
dccp: fix memory leak during tear-down of unsuccessful connection request
dccp/tcp: fix routing redirect race
bridge: drop netfilter fake rtable unconditionally
ipv6: avoid write to a possibly cloned skb
ipv6: make ECMP route replacement less greedy
mpls: Send route delete notifications when router module is unloaded
act_connmark: avoid crashing on malformed nlattrs with null parms
uapi: fix linux/packet_diag.h userspace compilation error
vrf: Fix use-after-free in vrf_xmit
dccp: fix use-after-free in dccp_feat_activate_values
net: fix socket refcounting in skb_complete_tx_timestamp()
net: fix socket refcounting in skb_complete_wifi_ack()
tcp: fix various issues for sockets morphing to listen state
dccp: Unlock sock before calling sk_free()
net: net_enable_timestamp() can be called from irq contexts
net: don't call strlen() on the user buffer in packet_bind_spkt()
l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv
ipv4: mask tos for input route
vti6: return GRE_KEY for vti6
vxlan: correctly validate VXLAN ID against VXLAN_N_VID
netlink: remove mmapped netlink support
ANDROID: mmc: core: export emmc revision
BACKPORT: mmc: core: Export device lifetime information through sysfs
ANDROID: android-verity: do not compile as independent module
ANDROID: sched: fix duplicate sched_group_energy const specifiers
config: disable CONFIG_USELIB and CONFIG_FHANDLE
ANDROID: power: align wakeup_sources format
ANDROID: dm: android-verity: allow disable dm-verity for Treble VTS
uid_sys_stats: change to use rt_mutex
ANDROID: vfs: user permission2 in notify_change2
ANDROID: sdcardfs: Fix gid issue
ANDROID: sdcardfs: Use tabs instead of spaces in multiuser.h
ANDROID: sdcardfs: Remove uninformative prints
ANDROID: sdcardfs: move path_put outside of spinlock
ANDROID: sdcardfs: Use case insensitive hash function
ANDROID: sdcardfs: declare MODULE_ALIAS_FS
ANDROID: sdcardfs: Get the blocksize from the lower fs
ANDROID: sdcardfs: Use d_invalidate instead of drop_recurisve
ANDROID: sdcardfs: Switch to internal case insensitive compare
ANDROID: sdcardfs: Use spin_lock_nested
ANDROID: sdcardfs: Replace get/put with d_lock
ANDROID: sdcardfs: rate limit warning print
ANDROID: sdcardfs: Fix case insensitive lookup
ANDROID: uid_sys_stats: account for fsync syscalls
ANDROID: sched: add a counter to track fsync
ANDROID: uid_sys_stats: fix negative write bytes.
ANDROID: uid_sys_stats: allow writing same state
ANDROID: uid_sys_stats: rename uid_cputime.c to uid_sys_stats.c
ANDROID: uid_cputime: add per-uid IO usage accounting
DTB: Add EAS compatible Juno Energy model to 'juno.dts'
arm64: dts: juno: Add idle-states to device tree
ANDROID: Replace spaces by '_' for some android filesystem tracepoints.
usb: gadget: f_accessory: Fix for UsbAccessory clean unbind.
android: binder: move global binder state into context struct.
android: binder: add padding to binder_fd_array_object.
binder: use group leader instead of open thread
nf: IDLETIMER: Use fullsock when querying uid
nf: IDLETIMER: Fix use after free condition during work
ANDROID: dm: android-verity: fix table_make_digest() error handling
ANDROID: usb: gadget: function: Fix commenting style
cpufreq: interactive governor drops bits in time calculation
ANDROID: sdcardfs: support direct-IO (DIO) operations
ANDROID: sdcardfs: implement vm_ops->page_mkwrite
ANDROID: sdcardfs: Don't bother deleting freelist
ANDROID: sdcardfs: Add missing path_put
ANDROID: sdcardfs: Fix incorrect hash
ANDROID: ext4 crypto: Disables zeroing on truncation when there's no key
ANDROID: ext4: add a non-reversible key derivation method
ANDROID: ext4: allow encrypting filenames using HEH algorithm
ANDROID: arm64/crypto: add ARMv8-CE optimized poly_hash algorithm
ANDROID: crypto: heh - factor out poly_hash algorithm
ANDROID: crypto: heh - Add Hash-Encrypt-Hash (HEH) algorithm
ANDROID: crypto: gf128mul - Add ble multiplication functions
ANDROID: crypto: gf128mul - Refactor gf128 overflow macros and tables
UPSTREAM: crypto: gf128mul - Zero memory when freeing multiplication table
ANDROID: crypto: shash - Add crypto_grab_shash() and crypto_spawn_shash_alg()
ANDROID: crypto: allow blkcipher walks over ablkcipher data
UPSTREAM: arm/arm64: crypto: assure that ECB modes don't require an IV
ANDROID: Refactor fs readpage/write tracepoints.
ANDROID: export security_path_chown
Squashfs: optimize reading uncompressed data
Squashfs: implement .readpages()
Squashfs: replace buffer_head with BIO
Squashfs: refactor page_actor
Squashfs: remove the FILE_CACHE option
ANDROID: android-recommended.cfg: CONFIG_CPU_SW_DOMAIN_PAN=y
FROMLIST: 9p: fix a potential acl leak
BACKPORT: posix_acl: Clear SGID bit when setting file permissions
UPSTREAM: udp: properly support MSG_PEEK with truncated buffers
UPSTREAM: arm64: Allow hw watchpoint of length 3,5,6 and 7
BACKPORT: arm64: hw_breakpoint: Handle inexact watchpoint addresses
UPSTREAM: arm64: Allow hw watchpoint at varied offset from base address
BACKPORT: hw_breakpoint: Allow watchpoint of length 3,5,6 and 7
ANDROID: sdcardfs: Switch strcasecmp for internal call
ANDROID: sdcardfs: switch to full_name_hash and qstr
ANDROID: sdcardfs: Add GID Derivation to sdcardfs
ANDROID: sdcardfs: Remove redundant operation
ANDROID: sdcardfs: add support for user permission isolation
ANDROID: sdcardfs: Refactor configfs interface
ANDROID: sdcardfs: Allow non-owners to touch
ANDROID: binder: fix format specifier for type binder_size_t
ANDROID: fs: Export vfs_rmdir2
ANDROID: fs: Export free_fs_struct and set_fs_pwd
BACKPORT: Input: xpad - validate USB endpoint count during probe
BACKPORT: Input: xpad - fix oops when attaching an unknown Xbox One gamepad
ANDROID: mnt: remount should propagate to slaves of slaves
ANDROID: sdcardfs: Switch ->d_inode to d_inode()
ANDROID: sdcardfs: Fix locking issue with permision fix up
ANDROID: sdcardfs: Change magic value
ANDROID: sdcardfs: Use per mount permissions
ANDROID: sdcardfs: Add gid and mask to private mount data
ANDROID: sdcardfs: User new permission2 functions
ANDROID: vfs: Add setattr2 for filesystems with per mount permissions
ANDROID: vfs: Add permission2 for filesystems with per mount permissions
ANDROID: vfs: Allow filesystems to access their private mount data
ANDROID: mnt: Add filesystem private data to mount points
ANDROID: sdcardfs: Move directory unlock before touch
ANDROID: sdcardfs: fix external storage exporting incorrect uid
ANDROID: sdcardfs: Added top to sdcardfs_inode_info
ANDROID: sdcardfs: Switch package list to RCU
ANDROID: sdcardfs: Fix locking for permission fix up
ANDROID: sdcardfs: Check for other cases on path lookup
ANDROID: sdcardfs: override umask on mkdir and create
arm64: kernel: Fix build warning
DEBUG: sched/fair: Fix sched_load_avg_cpu events for task_groups
DEBUG: sched/fair: Fix missing sched_load_avg_cpu events
UPSTREAM: l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()
UPSTREAM: packet: fix race condition in packet_set_ring
UPSTREAM: netlink: Fix dump skb leak/double free
UPSTREAM: net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
MIPS: Prevent "restoration" of MSA context in non-MSA kernels
net: socket: don't set sk_uid to garbage value in ->setattr()
ANDROID: configs: CONFIG_ARM64_SW_TTBR0_PAN=y
UPSTREAM: arm64: Disable PAN on uaccess_enable()
UPSTREAM: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN
UPSTREAM: arm64: xen: Enable user access before a privcmd hvc call
UPSTREAM: arm64: Handle faults caused by inadvertent user access with PAN enabled
BACKPORT: arm64: Disable TTBR0_EL1 during normal kernel execution
BACKPORT: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1
BACKPORT: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro
BACKPORT: arm64: Factor out PAN enabling/disabling into separate uaccess_* macros
UPSTREAM: arm64: alternative: add auto-nop infrastructure
UPSTREAM: arm64: barriers: introduce nops and __nops macros for NOP sequences
Revert "FROMLIST: arm64: Factor out PAN enabling/disabling into separate uaccess_* macros"
Revert "FROMLIST: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro"
Revert "FROMLIST: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1"
Revert "FROMLIST: arm64: Disable TTBR0_EL1 during normal kernel execution"
Revert "FROMLIST: arm64: Handle faults caused by inadvertent user access with PAN enabled"
Revert "FROMLIST: arm64: xen: Enable user access before a privcmd hvc call"
Revert "FROMLIST: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN"
ANDROID: sched/walt: fix build failure if FAIR_GROUP_SCHED=n
ANDROID: trace: net: use %pK for kernel pointers
ANDROID: android-base: Enable QUOTA related configs
net: ipv4: Don't crash if passing a null sk to ip_rt_update_pmtu.
net: inet: Support UID-based routing in IP protocols.
net: core: add UID to flows, rules, and routes
net: core: Add a UID field to struct sock.
Revert "net: core: Support UID-based routing."
UPSTREAM: efi/arm64: Don't apply MEMBLOCK_NOMAP to UEFI memory map mapping
UPSTREAM: arm64: mm: always take dirty state from new pte in ptep_set_access_flags
UPSTREAM: arm64: Implement pmdp_set_access_flags() for hardware AF/DBM
UPSTREAM: arm64: Fix typo in the pmdp_huge_get_and_clear() definition
UPSTREAM: arm64: enable CONFIG_DEBUG_RODATA by default
goldfish: enable CONFIG_INET_DIAG_DESTROY
sched/walt: kill {min,max}_capacity
sched: fix wrong truncation of walt_avg
build: fix build config kernel_dir
ANDROID: dm verity: add minimum prefetch size
build: add build server configs for goldfish
usb: gadget: Fix compilation problem with tx_qlen field
Conflicts:
android/configs/android-base.cfg
arch/arm64/Makefile
arch/arm64/include/asm/cpufeature.h
arch/arm64/kernel/vdso/gettimeofday.S
arch/arm64/mm/cache.S
drivers/md/Kconfig
drivers/misc/Makefile
drivers/mmc/host/sdhci.c
drivers/usb/core/hcd.c
drivers/usb/gadget/function/u_ether.c
fs/sdcardfs/derived_perm.c
fs/sdcardfs/file.c
fs/sdcardfs/inode.c
fs/sdcardfs/lookup.c
fs/sdcardfs/main.c
fs/sdcardfs/multiuser.h
fs/sdcardfs/packagelist.c
fs/sdcardfs/sdcardfs.h
fs/sdcardfs/super.c
include/linux/mmc/card.h
include/linux/mmc/mmc.h
include/trace/events/android_fs.h
include/trace/events/android_fs_template.h
drivers/android/binder.c
fs/exec.c
fs/ext4/crypto_key.c
fs/ext4/ext4.h
fs/ext4/inline.c
fs/ext4/inode.c
fs/ext4/readpage.c
fs/f2fs/data.c
fs/f2fs/inline.c
fs/mpage.c
include/linux/dcache.h
include/trace/events/sched.h
include/uapi/linux/ipv6.h
net/ipv4/tcp_ipv4.c
net/netfilter/xt_IDLETIMER.c
Change-Id: Ie345db6a14869fe0aa794aef4b71b5d0d503690b
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
refs/heads/tmp-28ec98b:
Linux 4.4.55
ext4: don't BUG when truncating encrypted inodes on the orphan list
dm: flush queued bios when process blocks to avoid deadlock
nfit, libnvdimm: fix interleave set cookie calculation
s390/kdump: Use "LINUX" ELF note name instead of "CORE"
KVM: s390: Fix guest migration for huge guests resulting in panic
mvsas: fix misleading indentation
serial: samsung: Continue to work if DMA request fails
USB: serial: io_ti: fix information leak in completion handler
USB: serial: io_ti: fix NULL-deref in interrupt callback
USB: iowarrior: fix NULL-deref in write
USB: iowarrior: fix NULL-deref at probe
USB: serial: omninet: fix reference leaks at open
USB: serial: safe_serial: fix information leak in completion handler
usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers
usb: host: xhci-dbg: HCIVERSION should be a binary number
usb: gadget: function: f_fs: pass companion descriptor along
usb: dwc3: gadget: make Set Endpoint Configuration macros safe
usb: gadget: dummy_hcd: clear usb_gadget region before registration
powerpc: Emulation support for load/store instructions on LE
tracing: Add #undef to fix compile error
MIPS: Netlogic: Fix CP0_EBASE redefinition warnings
MIPS: DEC: Avoid la pseudo-instruction in delay slots
mm: memcontrol: avoid unused function warning
cpmac: remove hopeless #warning
MIPS: ralink: Remove unused rt*_wdt_reset functions
MIPS: ralink: Cosmetic change to prom_init().
mtd: pmcmsp: use kstrndup instead of kmalloc+strncpy
MIPS: Update lemote2f_defconfig for CPU_FREQ_STAT change
MIPS: ip22: Fix ip28 build for modern gcc
MIPS: Update ip27_defconfig for SCSI_DH change
MIPS: ip27: Disable qlge driver in defconfig
MIPS: Update defconfigs for NF_CT_PROTO_DCCP/UDPLITE change
crypto: improve gcc optimization flags for serpent and wp512
USB: serial: digi_acceleport: fix OOB-event processing
USB: serial: digi_acceleport: fix OOB data sanity check
Linux 4.4.54
drivers: hv: Turn off write permission on the hypercall page
fat: fix using uninitialized fields of fat_inode/fsinfo_inode
libceph: use BUG() instead of BUG_ON(1)
drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
fakelb: fix schedule while atomic
drm/atomic: fix an error code in mode_fixup()
drm/ttm: Make sure BOs being swapped out are cacheable
drm/edid: Add EDID_QUIRK_FORCE_8BPC quirk for Rotel RSX-1058
drm/ast: Fix AST2400 POST failure without BMC FW or VBIOS
drm/ast: Call open_key before enable_mmio in POST code
drm/ast: Fix test for VGA enabled
drm/amdgpu: add more cases to DCE11 possible crtc mask setup
mac80211: flush delayed work when entering suspend
xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD
pwm: pca9685: Fix period change with same duty cycle
nlm: Ensure callback code also checks that the files match
target: Fix NULL dereference during LUN lookup + active I/O shutdown
ceph: remove req from unsafe list when unregistering it
ktest: Fix child exit code processing
IB/srp: Fix race conditions related to task management
IB/srp: Avoid that duplicate responses trigger a kernel bug
IB/IPoIB: Add destination address when re-queue packet
IB/ipoib: Fix deadlock between rmmod and set_mode
mnt: Tuck mounts under others instead of creating shadow/side mounts.
net: mvpp2: fix DMA address calculation in mvpp2_txq_inc_put()
s390: use correct input data address for setup_randomness
s390: make setup_randomness work
s390: TASK_SIZE for kernel threads
s390/dcssblk: fix device size calculation in dcssblk_direct_access()
s390/qdio: clear DSCI prior to scanning multiple input queues
Bluetooth: Add another AR3012 04ca:3018 device
KVM: VMX: use correct vmcs_read/write for guest segment selector/base
KVM: s390: Disable dirty log retrieval for UCONTROL guests
serial: 8250_pci: Add MKS Tenta SCOM-0800 and SCOM-0801 cards
tty: n_hdlc: get rid of racy n_hdlc.tbuf
TTY: n_hdlc, fix lockdep false positive
Linux 4.4.53
scsi: lpfc: Correct WQ creation for pagesize
MIPS: IP22: Fix build error due to binutils 2.25 uselessnes.
MIPS: IP22: Reformat inline assembler code to modern standards.
powerpc/xmon: Fix data-breakpoint
dmaengine: ipu: Make sure the interrupt routine checks all interrupts.
bcma: use (get|put)_device when probing/removing device driver
md linear: fix a race between linear_add() and linear_congested()
rtc: sun6i: Switch to the external oscillator
rtc: sun6i: Add some locking
NFSv4: fix getacl ERANGE for some ACL buffer sizes
NFSv4: fix getacl head length estimation
NFSv4: Fix memory and state leak in _nfs4_open_and_get_state
nfsd: special case truncates some more
nfsd: minor nfsd_setattr cleanup
rtlwifi: rtl8192c-common: Fix "BUG: KASAN:
rtlwifi: Fix alignment issues
gfs2: Add missing rcu locking for glock lookup
rdma_cm: fail iwarp accepts w/o connection params
RDMA/core: Fix incorrect structure packing for booleans
Drivers: hv: util: Backup: Fix a rescind processing issue
Drivers: hv: util: Fcopy: Fix a rescind processing issue
Drivers: hv: util: kvp: Fix a rescind processing issue
hv: init percpu_list in hv_synic_alloc()
hv: allocate synic pages for all present CPUs
usb: gadget: udc: fsl: Add missing complete function.
usb: host: xhci: plat: check hcc_params after add hcd
usb: musb: da8xx: Remove CPPI 3.0 quirk and methods
w1: ds2490: USB transfer buffers need to be DMAable
w1: don't leak refcount on slave attach failure in w1_attach_slave_device()
can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer
iio: pressure: mpl3115: do not rely on structure field ordering
iio: pressure: mpl115: do not rely on structure field ordering
arm/arm64: KVM: Enforce unconditional flush to PoC when mapping to stage-2
fuse: add missing FR_FORCE
crypto: testmgr - Pad aes_ccm_enc_tv_template vector
ath9k: use correct OTP register offsets for the AR9340 and AR9550
ath9k: fix race condition in enabling/disabling IRQs
ath5k: drop bogus warning on drv_set_key with unsupported cipher
target: Fix multi-session dynamic se_node_acl double free OOPs
target: Obtain se_node_acl->acl_kref during get_initiator_node_acl
samples/seccomp: fix 64-bit comparison macros
ext4: return EROFS if device is r/o and journal replay is needed
ext4: preserve the needs_recovery flag when the journal is aborted
ext4: fix inline data error paths
ext4: fix data corruption in data=journal mode
ext4: trim allocation requests to group size
ext4: do not polute the extents cache while shifting extents
ext4: Include forgotten start block on fallocate insert range
loop: fix LO_FLAGS_PARTSCAN hang
block/loop: fix race between I/O and set_status
jbd2: don't leak modified metadata buffers on an aborted journal
Fix: Disable sys_membarrier when nohz_full is enabled
sd: get disk reference in sd_check_events()
scsi: use 'scsi_device_from_queue()' for scsi_dh
scsi: aacraid: Reorder Adapter status check
scsi: storvsc: properly set residual data length on errors
scsi: storvsc: properly handle SRB_ERROR when sense message is present
scsi: storvsc: use tagged SRB requests if supported by the device
dm stats: fix a leaked s->histogram_boundaries array
dm cache: fix corruption seen when using cache > 2TB
ipc/shm: Fix shmat mmap nil-page protection
mm: do not access page->mapping directly on page_endio
mm: vmpressure: fix sending wrong events on underflow
mm/page_alloc: fix nodes for reclaim in fast path
iommu/vt-d: Tylersburg isoch identity map check is done too late.
iommu/vt-d: Fix some macros that are incorrectly specified in intel-iommu
regulator: Fix regulator_summary for deviceless consumers
staging: rtl: fix possible NULL pointer dereference
ALSA: hda - Fix micmute hotkey problem for a lenovo AIO machine
ALSA: hda - Add subwoofer support for Dell Inspiron 17 7000 Gaming
ALSA: seq: Fix link corruption by event error handling
ALSA: ctxfi: Fallback DMA mask to 32bit
ALSA: timer: Reject user params with too small ticks
ALSA: hda - fix Lewisburg audio issue
ALSA: hda/realtek - Cannot adjust speaker's volume on a Dell AIO
ARM: dts: at91: Enable DMA on sama5d2_xplained console
ARM: dts: at91: Enable DMA on sama5d4_xplained console
ARM: at91: define LPDDR types
media: fix dm1105.c build error
uvcvideo: Fix a wrong macro
am437x-vpfe: always assign bpp variable
MIPS: Handle microMIPS jumps in the same way as MIPS32/MIPS64 jumps
MIPS: Calculate microMIPS ra properly when unwinding the stack
MIPS: Fix is_jump_ins() handling of 16b microMIPS instructions
MIPS: Fix get_frame_info() handling of microMIPS function size
MIPS: Prevent unaligned accesses during stack unwinding
MIPS: Clear ISA bit correctly in get_frame_info()
MIPS: Lantiq: Keep ethernet enabled during boot
MIPS: OCTEON: Fix copy_from_user fault handling for large buffers
MIPS: BCM47XX: Fix button inversion for Asus WL-500W
MIPS: Fix special case in 64 bit IP checksumming.
samples: move mic/mpssd example code from Documentation
Linux 4.4.52
kvm: vmx: ensure VMCS is current while enabling PML
Revert "usb: chipidea: imx: enable CI_HDRC_SET_NON_ZERO_TTHA"
rtlwifi: rtl_usb: Fix for URB leaking when doing ifconfig up/down
block: fix double-free in the failure path of cgwb_bdi_init()
goldfish: Sanitize the broken interrupt handler
x86/platform/goldfish: Prevent unconditional loading
USB: serial: ark3116: fix register-accessor error handling
USB: serial: opticon: fix CTS retrieval at open
USB: serial: spcp8x5: fix modem-status handling
USB: serial: ftdi_sio: fix line-status over-reporting
USB: serial: ftdi_sio: fix extreme low-latency setting
USB: serial: ftdi_sio: fix modem-status error handling
USB: serial: cp210x: add new IDs for GE Bx50v3 boards
USB: serial: mos7840: fix another NULL-deref at open
tty: serial: msm: Fix module autoload
net: socket: fix recvmmsg not returning error from sock_error
ip: fix IP_CHECKSUM handling
irda: Fix lockdep annotations in hashbin_delete().
dccp: fix freeing skb too early for IPV6_RECVPKTINFO
packet: Do not call fanout_release from atomic contexts
packet: fix races in fanout_add()
net/llc: avoid BUG_ON() in skb_orphan()
blk-mq: really fix plug list flushing for nomerge queues
rtc: interface: ignore expired timers when enqueuing new timers
rtlwifi: rtl_usb: Fix missing entry in USB driver's private data
Linux 4.4.51
mmc: core: fix multi-bit bus width without high-speed mode
bcache: Make gc wakeup sane, remove set_task_state()
ntb_transport: Pick an unused queue
NTB: ntb_transport: fix debugfs_remove_recursive
printk: use rcuidle console tracepoint
ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user()
futex: Move futex_init() to core_initcall
drm/dp/mst: fix kernel oops when turning off secondary monitor
drm/radeon: Use mode h/vdisplay fields to hide out of bounds HW cursor
Input: elan_i2c - add ELAN0605 to the ACPI table
Fix missing sanity check in /dev/sg
scsi: don't BUG_ON() empty DMA transfers
fuse: fix use after free issue in fuse_dev_do_read()
siano: make it work again with CONFIG_VMAP_STACK
vfs: fix uninitialized flags in splice_to_pipe()
Linux 4.4.50
l2tp: do not use udp_ioctl()
ping: fix a null pointer dereference
packet: round up linear to header len
net: introduce device min_header_len
sit: fix a double free on error path
sctp: avoid BUG_ON on sctp_wait_for_sndbuf
mlx4: Invoke softirqs after napi_reschedule
macvtap: read vnet_hdr_size once
tun: read vnet_hdr_sz once
tcp: avoid infinite loop in tcp_splice_read()
ipv6: tcp: add a missing tcp_v6_restore_cb()
ip6_gre: fix ip6gre_err() invalid reads
netlabel: out of bound access in cipso_v4_validate()
ipv4: keep skb->dst around in presence of IP options
net: use a work queue to defer net_disable_timestamp() work
tcp: fix 0 divide in __tcp_select_window()
ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()
ipv6: fix ip6_tnl_parse_tlv_enc_lim()
can: Fix kernel panic at security_sock_rcv_skb
Conflicts:
drivers/scsi/sd.c
drivers/usb/gadget/function/f_fs.c
drivers/usb/host/xhci-plat.c
CRs-Fixed: 2023471
Change-Id: I396051a8de30271af77b3890d4b19787faa1c31e
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream.
While working on the futex code, I stumbled over this potential
use-after-free scenario. Dmitry triggered it later with syzkaller.
pi_mutex is a pointer into pi_state, which we drop the reference on in
unqueue_me_pi(). So any access to that pointer after that is bad.
Since other sites already do rt_mutex_unlock() with hb->lock held, see
for example futex_lock_pi(), simply move the unlock before
unqueue_me_pi().
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 25f71d1c3e98ef0e52371746220d66458eac75bc upstream.
The UEVENT user mode helper is enabled before the initcalls are executed
and is available when the root filesystem has been mounted.
The user mode helper is triggered by device init calls and the executable
might use the futex syscall.
futex_init() is marked __initcall which maps to device_initcall, but there
is no guarantee that futex_init() is invoked _before_ the first device init
call which triggers the UEVENT user mode helper.
If the user mode helper uses the futex syscall before futex_init() then the
syscall crashes with a NULL pointer dereference because the futex subsystem
has not been initialized yet.
Move futex_init() to core_initcall so futexes are initialized before the
root filesystem is mounted and the usermode helper becomes available.
[ tglx: Rewrote changelog ]
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: jiang.biao2@zte.com.cn
Cc: jiang.zhengxiong@zte.com.cn
Cc: zhong.weidong@zte.com.cn
Cc: deng.huali@zte.com.cn
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I'm looking at trying to possibly merge the 32-bit and 64-bit versions
of the x86 uaccess.h implementation, but first this needs to be cleaned
up.
For example, the 32-bit version of "__copy_from_user_inatomic()" is
mostly the special cases for the constant size, and it's actually almost
never relevant. Most users aren't actually using a constant size
anyway, and the few cases that do small constant copies are better off
just using __get_user() instead.
So get rid of the unnecessary complexity.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit bd28b14591b98f696bc9f94c5ba2e598ca487dfd)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
commit fe1bce9e2107ba3a8faffe572483b6974201a0e6 upstream.
Otherwise an incoming waker on the dest hash bucket can miss
the waiter adding itself to the plist during the lockless
check optimization (small window but still the correct way
of doing this); similarly to the decrement counterpart.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: bigeasy@linutronix.de
Cc: dvhart@infradead.org
Link: http://lkml.kernel.org/r/1461208164-29150-1-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 89e9e66ba1b3bde9d8ea90566c2aee20697ad681 upstream.
If userspace calls UNLOCK_PI unconditionally without trying the TID -> 0
transition in user space first then the user space value might not have the
waiters bit set. This opens the following race:
CPU0 CPU1
uval = get_user(futex)
lock(hb)
lock(hb)
futex |= FUTEX_WAITERS
....
unlock(hb)
cmpxchg(futex, uval, newval)
So the cmpxchg fails and returns -EINVAL to user space, which is wrong because
the futex value is valid.
To handle this (yes, yet another) corner case gracefully, check for a flag
change and retry.
[ tglx: Massaged changelog and slightly reworked implementation ]
Fixes: ccf9e6a80d ("futex: Make unlock_pi more robust")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1460723739-5195-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fb75a4282d0d9a3c7c44d940582c2d226cf3acfb upstream.
If the proxy lock in the requeue loop acquires the rtmutex for a
waiter then it acquired also refcount on the pi_state related to the
futex, but the waiter side does not drop the reference count.
Add the missing free_pi_state() call.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Bhuvanesh_Surachari@mentor.com
Cc: Andy Lowe <Andy_Lowe@mentor.com>
Link: http://lkml.kernel.org/r/20151219200607.178132067@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream.
By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.
To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.
The problem was that when a privileged task had temporarily dropped its
privileges, e.g. by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.
While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.
In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:
/proc/$pid/stat - uses the check for determining whether pointers
should be visible, useful for bypassing ASLR
/proc/$pid/maps - also useful for bypassing ASLR
/proc/$pid/cwd - useful for gaining access to restricted
directories that contain files with lax permissions, e.g. in
this scenario:
lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
drwx------ root root /root
drwxr-xr-x root root /root/foobar
-rw-r--r-- root root /root/foobar/secret
Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).
[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch of
debugfs updates, with a smattering of minor driver core fixes and
updates as well.
All have been in linux-next for a long time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iEYEABECAAYFAlY6ePQACgkQMUfUDdst+ymNTgCgpP0CZw57GpwF/Hp2L/lMkVeo
Kx8AoKhEi4iqD5fdCQS9qTfomB+2/M6g
=g7ZO
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here's the "big" driver core updates for 4.4-rc1. Primarily a bunch
of debugfs updates, with a smattering of minor driver core fixes and
updates as well.
All have been in linux-next for a long time"
* tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
debugfs: Add debugfs_create_ulong()
of: to support binding numa node to specified device in devicetree
debugfs: Add read-only/write-only bool file ops
debugfs: Add read-only/write-only size_t file ops
debugfs: Add read-only/write-only x64 file ops
debugfs: Consolidate file mode checks in debugfs_create_*()
Revert "mm: Check if section present during memory block (un)registering"
driver-core: platform: Provide helpers for multi-driver modules
mm: Check if section present during memory block (un)registering
devres: fix a for loop bounds check
CMA: fix CONFIG_CMA_SIZE_MBYTES overflow in 64bit
base/platform: assert that dev_pm_domain callbacks are called unconditionally
sysfs: correctly handle short reads on PREALLOC attrs.
base: soc: siplify ida usage
kobject: move EXPORT_SYMBOL() macros next to corresponding definitions
kobject: explain what kobject's sd field is
debugfs: document that debugfs_remove*() accepts NULL and error values
debugfs: Pass bool pointer to debugfs_create_bool()
ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'
Its a bit odd that debugfs_create_bool() takes 'u32 *' as an argument,
when all it needs is a boolean pointer.
It would be better to update this API to make it accept 'bool *'
instead, as that will make it more consistent and often more convenient.
Over that bool takes just a byte.
That required updates to all user sites as well, in the same commit
updating the API. regmap core was also using
debugfs_{read|write}_file_bool(), directly and variable types were
updated for that to be bool as well.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
futex_hash() references two global variables: the base pointer
futex_queues and the size of the array futex_hashsize. The latter is
marked __read_mostly, while the former is not, so they are likely to
end up very far from each other. This means that futex_hash() is
likely to encounter two cache misses.
We could mark futex_queues as __read_mostly as well, but that doesn't
guarantee they'll end up next to each other (and even if they do, they
may still end up in different cache lines). So put the two variables
in a small singleton struct with sufficient alignment and mark that as
__read_mostly.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/1441834601-13633-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Although futexes are well known for being a royal pita,
we really have very little debugging capabilities - except
for relying on tglx's eye half the time.
By simply making use of the existing fault-injection machinery,
we can improve this situation, allowing generating artificial
uaddress faults and deadlock scenarios. Of course, when this is
disabled in production systems, the overhead for failure checks
is practically zero -- so this is very cheap at the same time.
Future work would be nice to now enhance trinity to make use of
this.
There is a special tunable 'ignore-private', which can filter
out private futexes. Given the tsk->make_it_fail filter and
this option, pi futexes can be narrowed down pretty closely.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/1435645562-975-3-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
... serves a bit better to clarify between blocking
and non-blocking code paths.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/1435645562-975-2-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull locking updates from Thomas Gleixner:
"These locking updates depend on the alreay merged sched/core branch:
- Lockless top waiter wakeup for rtmutex (Davidlohr)
- Reduce hash bucket lock contention for PI futexes (Sebastian)
- Documentation update (Davidlohr)"
* 'sched-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rtmutex: Update stale plist comments
futex: Lower the lock contention on the HB lock during wake up
locking/rtmutex: Implement lockless top-waiter wakeup
Pull timer updates from Thomas Gleixner:
"A rather largish update for everything time and timer related:
- Cache footprint optimizations for both hrtimers and timer wheel
- Lower the NOHZ impact on systems which have NOHZ or timer migration
disabled at runtime.
- Optimize run time overhead of hrtimer interrupt by making the clock
offset updates smarter
- hrtimer cleanups and removal of restrictions to tackle some
problems in sched/perf
- Some more leap second tweaks
- Another round of changes addressing the 2038 problem
- First step to change the internals of clock event devices by
introducing the necessary infrastructure
- Allow constant folding for usecs/msecs_to_jiffies()
- The usual pile of clockevent/clocksource driver updates
The hrtimer changes contain updates to sched, perf and x86 as they
depend on them plus changes all over the tree to cleanup API changes
and redundant code, which got copied all over the place. The y2038
changes touch s390 to remove the last non 2038 safe code related to
boot/persistant clock"
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
clocksource: Increase dependencies of timer-stm32 to limit build wreckage
timer: Minimize nohz off overhead
timer: Reduce timer migration overhead if disabled
timer: Stats: Simplify the flags handling
timer: Replace timer base by a cpu index
timer: Use hlist for the timer wheel hash buckets
timer: Remove FIFO "guarantee"
timers: Sanitize catchup_timer_jiffies() usage
hrtimer: Allow hrtimer::function() to free the timer
seqcount: Introduce raw_write_seqcount_barrier()
seqcount: Rename write_seqcount_barrier()
hrtimer: Fix hrtimer_is_queued() hole
hrtimer: Remove HRTIMER_STATE_MIGRATE
selftest: Timers: Avoid signal deadlock in leap-a-day
timekeeping: Copy the shadow-timekeeper over the real timekeeper last
clockevents: Check state instead of mode in suspend/resume path
selftests: timers: Add leap-second timer edge testing to leap-a-day.c
ntp: Do leapsecond adjustment in adjtimex read path
time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
ntp: Introduce and use SECS_PER_DAY macro instead of 86400
...
Pull scheduler updates from Ingo Molnar:
"The main changes are:
- lockless wakeup support for futexes and IPC message queues
(Davidlohr Bueso, Peter Zijlstra)
- Replace spinlocks with atomics in thread_group_cputimer(), to
improve scalability (Jason Low)
- NUMA balancing improvements (Rik van Riel)
- SCHED_DEADLINE improvements (Wanpeng Li)
- clean up and reorganize preemption helpers (Frederic Weisbecker)
- decouple page fault disabling machinery from the preemption
counter, to improve debuggability and robustness (David
Hildenbrand)
- SCHED_DEADLINE documentation updates (Luca Abeni)
- topology CPU masks cleanups (Bartosz Golaszewski)
- /proc/sched_debug improvements (Srikar Dronamraju)"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits)
sched/deadline: Remove needless parameter in dl_runtime_exceeded()
sched: Remove superfluous resetting of the p->dl_throttled flag
sched/deadline: Drop duplicate init_sched_dl_class() declaration
sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target
sched/deadline: Make init_sched_dl_class() __init
sched/deadline: Optimize pull_dl_task()
sched/preempt: Add static_key() to preempt_notifiers
sched/preempt: Fix preempt notifiers documentation about hlist_del() within unsafe iteration
sched/stop_machine: Fix deadlock between multiple stop_two_cpus()
sched/debug: Add sum_sleep_runtime to /proc/<pid>/sched
sched/debug: Replace vruntime with wait_sum in /proc/sched_debug
sched/debug: Properly format runnable tasks in /proc/sched_debug
sched/numa: Only consider less busy nodes as numa balancing destinations
Revert 095bebf61a ("sched/numa: Do not move past the balance point if unbalanced")
sched/fair: Prevent throttling in early pick_next_task_fair()
preempt: Reorganize the notrace definitions a bit
preempt: Use preempt_schedule_context() as the official tracing preemption point
sched: Make preempt_schedule_context() function-tracing safe
x86: Remove cpu_sibling_mask() and cpu_core_mask()
x86: Replace cpu_**_mask() with topology_**_cpumask()
...
wake_futex_pi() wakes the task before releasing the hash bucket lock
(HB). The first thing the woken up task usually does is to acquire the
lock which requires the HB lock. On SMP Systems this leads to blocking
on the HB lock which is released by the owner shortly after.
This patch rearranges the unlock path by first releasing the HB lock and
then waking up the task.
[ tglx: Fixed up the rtmutex unlock path ]
Originally-from: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/20150617083350.GA2433@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Since set_mb() is really about an smp_mb() -- not a IO/DMA barrier
like mb() rename it to match the recent smp_load_acquire() and
smp_store_release().
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Given the overall futex architecture, any chance of reducing
hb->lock contention is welcome. In this particular case, using
wake-queues to enable lockless wakeups addresses very much real
world performance concerns, even cases of soft-lockups in cases
of large amounts of blocked tasks (which is not hard to find in
large boxes, using but just a handful of futex).
At the lowest level, this patch can reduce latency of a single thread
attempting to acquire hb->lock in highly contended scenarios by a
up to 2x. At lower counts of nr_wake there are no regressions,
confirming, of course, that the wake_q handling overhead is practically
non existent. For instance, while a fair amount of variation,
the extended pef-bench wakeup benchmark shows for a 20 core machine
the following avg per-thread time to wakeup its share of tasks:
nr_thr ms-before ms-after
16 0.0590 0.0215
32 0.0396 0.0220
48 0.0417 0.0182
64 0.0536 0.0236
80 0.0414 0.0097
96 0.0672 0.0152
Naturally, this can cause spurious wakeups. However there is no core code
that cannot handle them afaict, and furthermore tglx does have the point
that other events can already trigger them anyway.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Mason <clm@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: George Spelvin <linux@horizon.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1430494072-30283-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The check for hrtimer_active() after starting the timer is
pointless. If the timer is inactive it has expired already and
therefor the task pointer is already NULL.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.985825453@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQEcBAABAgAGBQJU6pFJAAoJEHm+PkMAQRiG2OwH/24nDK+l9zkaRs0xJsVh+qiW
8A2N1od0ickz43iMk48jfeWGkFOkd4izyvan/daJshJOE1Y5lCdSs7jq/OXVOv9L
G0+KQUoC5NL0hqYKn1XJPFluNQ1yqMvrDwQt99grDGzruNGBbwHuBhAQmgzpj1nU
do8KrGjr7ft1Rzm4mOAdET/ExWiF+mRSJSxxOv598HbsIRdM5wgn0hHjPlqDxmLN
KH4r3YYEm0cHyjf4Krse0+YdhqdamRGJlmYxJgEsYNwCoMwkmHlLTc71diseUhrg
r/VYIYQvpAA6Yvgw8rJ0N5gk/sJJig+WyyPhfQuc2bD5sbL9eO7mPnz2UP7z7ss=
=vXB6
-----END PGP SIGNATURE-----
Merge tag 'v4.0-rc1' into locking/core, to refresh the tree before merging new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
attach_to_pi_owner() checks p->mm to prevent attaching to kthreads and
this looks doubly wrong:
1. It should actually check PF_KTHREAD, kthread can do use_mm().
2. If this task is not kthread and it is actually the lock owner we can
wrongly return -EPERM instead of -ESRCH or retry-if-EAGAIN.
And note that this wrong EPERM is the likely case unless the exiting
task is (auto)reaped quickly, we check ->mm before PF_EXITING.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Link: http://lkml.kernel.org/r/20150202140536.GA26406@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target. This is because the
restart_block is held in the same memory allocation as the kernel stack.
Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.
Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.
It's also a decent simplification, since the restart code is more or less
identical on all architectures.
[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch fixes two separate buglets in calls to futex_lock_pi():
* Eliminate unused 'detect' argument
* Change unused 'timeout' argument of FUTEX_TRYLOCK_PI to NULL
The 'detect' argument of futex_lock_pi() seems never to have been
used (when it was included with the initial PI mutex implementation
in Linux 2.6.18, all checks against its value were disabled by
ANDing against 0 (i.e., if (detect... && 0)), and with
commit 778e9a9c3e, any mention of
this argument in futex_lock_pi() went way altogether. Its presence
now serves only to confuse readers of the code, by giving the
impression that the futex() FUTEX_LOCK_PI operation actually does
use the 'val' argument. This patch removes the argument.
The futex_lock_pi() call that corresponds to FUTEX_TRYLOCK_PI includes
'timeout' as one of its arguments. This misleads the reader into thinking
that the FUTEX_TRYLOCK_PI operation does employ timeouts for some sensible
purpose; but it does not. Indeed, it cannot, because the checks at the
start of sys_futex() exclude FUTEX_TRYLOCK_PI from the set of operations
that do copy_from_user() on the timeout argument. So, in the
FUTEX_TRYLOCK_PI futex_lock_pi() call it would be simplest to change
'timeout' to 'NULL'. This patch does that.
Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Darren Hart <darren@dvhart.com>
Link: http://lkml.kernel.org/r/54B96646.8010200@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
free_pi_state and exit_pi_state_list both clean up futex_pi_state's.
exit_pi_state_list takes the hb lock first, and most callers of
free_pi_state do too. requeue_pi doesn't, which means free_pi_state
can free the pi_state out from under exit_pi_state_list. For example:
task A | task B
exit_pi_state_list |
pi_state = |
curr->pi_state_list->next |
| futex_requeue(requeue_pi=1)
| // pi_state is the same as
| // the one in task A
| free_pi_state(pi_state)
| list_del_init(&pi_state->list)
| kfree(pi_state)
list_del_init(&pi_state->list) |
Move the free_pi_state calls in requeue_pi to before it drops the hb
locks which it's already holding.
[ tglx: Removed a pointless free_pi_state() call and the hb->lock held
debugging. The latter comes via a seperate patch ]
Signed-off-by: Brian Silverman <bsilver16384@gmail.com>
Cc: austin.linux@gmail.com
Cc: darren@dvhart.com
Cc: peterz@infradead.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1414282837-23092-1-git-send-email-bsilver16384@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Update our documentation as of fix 76835b0ebf (futex: Ensure
get_futex_key_refs() always implies a barrier). Explicitly
state that we don't do key referencing for private futexes.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Matteo Franchin <Matteo.Franchin@arm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: http://lkml.kernel.org/r/1414121220.817.0.camel@linux-t7sj.site
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Commit b0c29f79ec (futexes: Avoid taking the hb->lock if there's
nothing to wake up) changes the futex code to avoid taking a lock when
there are no waiters. This code has been subsequently fixed in commit
11d4616bd0 (futex: revert back to the explicit waiter counting code).
Both the original commit and the fix-up rely on get_futex_key_refs() to
always imply a barrier.
However, for private futexes, none of the cases in the switch statement
of get_futex_key_refs() would be hit and the function completes without
a memory barrier as required before checking the "waiters" in
futex_wake() -> hb_waiters_pending(). The consequence is a race with a
thread waiting on a futex on another CPU, allowing the waker thread to
read "waiters == 0" while the waiter thread to have read "futex_val ==
locked" (in kernel).
Without this fix, the problem (user space deadlocks) can be seen with
Android bionic's mutex implementation on an arm64 multi-cluster system.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Matteo Franchin <Matteo.Franchin@arm.com>
Fixes: b0c29f79ec (futexes: Avoid taking the hb->lock if there's nothing to wake up)
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
futex_wait_requeue_pi() calls futex_wait_setup(). If
futex_wait_setup() succeeds it returns with hb->lock held and
preemption disabled. Now the sanity check after this does:
if (match_futex(&q.key, &key2)) {
ret = -EINVAL;
goto out_put_keys;
}
which releases the keys but does not release hb->lock.
So we happily return to user space with hb->lock held and therefor
preemption disabled.
Unlock hb->lock before taking the exit route.
Reported-by: Dave "Trinity" Jones <davej@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
futex_lock_pi_atomic() is a maze of retry hoops and loops.
Reduce it to simple and understandable states:
First step is to lookup existing waiters (state) in the kernel.
If there is an existing waiter, validate it and attach to it.
If there is no existing waiter, check the user space value
If the TID encoded in the user space value is 0, take over the futex
preserving the owner died bit.
If the TID encoded in the user space value is != 0, lookup the owner
task, validate it and attach to it.
Reduces text size by 128 bytes on x8664.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Cc: Darren Hart <darren@dvhart.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1406131137020.5170@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the code which attaches the first
waiter to the owner into a separate function. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.271300614@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the waiter verification into a separate
function. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.180458410@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
No point in open coding the same function again.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.092947239@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kernel tries to atomically unlock the futex without checking
whether there is kernel state associated to the futex.
So if user space manipulated the user space value, this will leave
kernel internal state around associated to the owner task.
For robustness sake, lookup first whether there are waiters on the
futex. If there are waiters, wake the top priority waiter with all the
proper sanity checks applied.
If there are no waiters, do the atomic release. We do not have to
preserve the waiters bit in this case, because a potentially incoming
waiter is blocked on the hb->lock and will acquire the futex
atomically. We neither have to preserve the owner died bit. The caller
is the owner and it was supposed to cleanup the mess.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.016987332@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The deadlock logic is only required for futexes.
Remove the extra arguments for the public functions and also for the
futex specific ones which get always called with deadlock detection
enabled.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.
* accumulated work in next: (6809 commits)
ufs: sb mutex merge + mutex_destroy
powerpc: update comments for generic idle conversion
cris: update comments for generic idle conversion
idle: remove cpu_idle() forward declarations
nbd: zero from and len fields in NBD_CMD_DISCONNECT.
mm: convert some level-less printks to pr_*
MAINTAINERS: adi-buildroot-devel is moderated
MAINTAINERS: add linux-api for review of API/ABI changes
mm/kmemleak-test.c: use pr_fmt for logging
fs/dlm/debug_fs.c: replace seq_printf by seq_puts
fs/dlm/lockspace.c: convert simple_str to kstr
fs/dlm/config.c: convert simple_str to kstr
mm: mark remap_file_pages() syscall as deprecated
mm: memcontrol: remove unnecessary memcg argument from soft limit functions
mm: memcontrol: clean up memcg zoneinfo lookup
mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
mm/mempool.c: update the kmemleak stack trace for mempool allocations
lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
mm: introduce kmemleak_update_trace()
mm/kmemleak.c: use %u to print ->checksum
...
The current implementation of lookup_pi_state has ambigous handling of
the TID value 0 in the user space futex. We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.
The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) are found for the user space
address. This can lead to state leakage and worse under some
circumstances.
Handle the cases explicit:
Waiter | pi_state | pi->owner | uTID | uODIED | ?
[1] NULL | --- | --- | 0 | 0/1 | Valid
[2] NULL | --- | --- | >0 | 0/1 | Valid
[3] Found | NULL | -- | Any | 0/1 | Invalid
[4] Found | Found | NULL | 0 | 1 | Valid
[5] Found | Found | NULL | >0 | 1 | Invalid
[6] Found | Found | task | 0 | 1 | Valid
[7] Found | Found | NULL | Any | 0 | Invalid
[8] Found | Found | task | ==taskTID | 0/1 | Valid
[9] Found | Found | task | 0 | 0 | Invalid
[10] Found | Found | task | !=taskTID | 0/1 | Invalid
[1] Indicates that the kernel can acquire the futex atomically. We
came came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.
[2] Valid, if TID does not belong to a kernel thread. If no matching
thread is found then it indicates that the owner TID has died.
[3] Invalid. The waiter is queued on a non PI futex
[4] Valid state after exit_robust_list(), which sets the user space
value to FUTEX_WAITERS | FUTEX_OWNER_DIED.
[5] The user space value got manipulated between exit_robust_list()
and exit_pi_state_list()
[6] Valid state after exit_pi_state_list() which sets the new owner in
the pi_state but cannot access the user space value.
[7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.
[8] Owner and user space value match
[9] There is no transient state which sets the user space TID to 0
except exit_robust_list(), but this is indicated by the
FUTEX_OWNER_DIED bit. See [4]
[10] There is no transient state which leaves owner and user space
TID out of sync.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the owner died bit is set at futex_unlock_pi, we currently do not
cleanup the user space futex. So the owner TID of the current owner
(the unlocker) persists. That's observable inconsistant state,
especially when the ownership of the pi state got transferred.
Clean it up unconditionally.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.
Verify whether the futex has waiters associated with kernel state. If
it has, return -EINVAL. The state is corrupted already, so no point in
cleaning it up. Subsequent calls will fail as well. Not our problem.
[ tglx: Use futex_top_waiter() and explain why we do not need to try
restoring the already corrupted user space state. ]
Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call. If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.
This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")
[ tglx: Compare the resulting keys as well, as uaddrs might be
different depending on the mapping ]
Fixes CVE-2014-3153.
Reported-by: Pinkie Pie
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We happily allow userspace to declare a random kernel thread to be the
owner of a user space PI futex.
Found while analysing the fallout of Dave Jones syscall fuzzer.
We also should validate the thread group for private futexes and find
some fast way to validate whether the "alleged" owner has RW access on
the file which backs the SHM, but that's a separate issue.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Carlos ODonell <carlos@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Dave Jones trinity syscall fuzzer exposed an issue in the deadlock
detection code of rtmutex:
http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com
That underlying issue has been fixed with a patch to the rtmutex code,
but the futex code must not call into rtmutex in that case because
- it can detect that issue early
- it avoids a different and more complex fixup for backing out
If the user space variable got manipulated to 0x80000000 which means
no lock holder, but the waiters bit set and an active pi_state in the
kernel is found we can figure out the recursive locking issue by
looking at the pi_state owner. If that is the current task, then we
can safely return -EDEADLK.
The check should have been added in commit 59fa62451 (futex: Handle
futex_pi OWNER_DIED take over correctly) already, but I did not see
the above issue caused by user space manipulation back then.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Carlos ODonell <carlos@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Commits 11d4616bd0 ("futex: revert back to the explicit waiter
counting code") and 69cd9eba38 ("futex: avoid race between requeue and
wake") changed some of the finer details of how we think about futexes.
One was a late fix and the other a consequence of overlooking the whole
requeuing logic.
The first change caused our documentation to be incorrect, and the
second made us aware that we need to explicitly add more details to it.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Stancek reported:
"pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
occasionally fails, because some threads fail to wake up.
Testcase creates 5 threads, which are all waiting on same condition.
Main thread then calls pthread_cond_broadcast() without holding mutex,
which calls:
futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)
This immediately wakes up single thread A, which unlocks mutex and
tries to wake up another thread:
futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)
If thread A manages to call futex_wake() before any waiters are
requeued for uaddr2, no other thread is woken up"
The ordering constraints for the hash bucket waiter counting are that
the waiter counts have to be incremented _before_ getting the spinlock
(because the spinlock acts as part of the memory barrier), but the
"requeue" operation didn't honor those rules, and nobody had even
thought about that case.
This fairly simple patch just increments the waiter count for the target
hash bucket (hb2) when requeing a futex before taking the locks. It
then decrements them again after releasing the lock - the code that
actually moves the futex(es) between hash buckets will do the additional
required waiter count housekeeping.
Reported-and-tested-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull core locking updates from Ingo Molnar:
"The biggest change is the MCS spinlock generalization changes from Tim
Chen, Peter Zijlstra, Jason Low et al. There's also lockdep
fixes/enhancements from Oleg Nesterov, in particular a false negative
fix related to lockdep_set_novalidate_class() usage"
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
locking/mutex: Fix debug checks
locking/mutexes: Add extra reschedule point
locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
locking/mutexes: Unlock the mutex without the wait_lock
locking/mutexes: Modify the way optimistic spinners are queued
locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
locking: Move mcs_spinlock.h into kernel/locking/
m68k: Skip futex_atomic_cmpxchg_inatomic() test
futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
lockdep: Change lockdep_set_novalidate_class() to use _and_name
lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate
lockdep: Don't create the wrong dependency on hlock->check == 0
lockdep: Make held_lock->check and "int check" argument bool
locking/mcs: Allow architecture specific asm files to be used for contended case
locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
sched/wait: Suppress Sparse 'variable shadowing' warning
hung_task/Documentation: Fix hung_task_warnings description
locking/mcs: Allow architectures to hook in to contended paths
locking/mcs: Micro-optimize the MCS code, add extra comments
...
Srikar Dronamraju reports that commit b0c29f79ec ("futexes: Avoid
taking the hb->lock if there's nothing to wake up") causes java threads
getting stuck on futexes when runing specjbb on a power7 numa box.
The cause appears to be that the powerpc spinlocks aren't using the same
ticket lock model that we use on x86 (and other) architectures, which in
turn result in the "spin_is_locked()" test in hb_waiters_pending()
occasionally reporting an unlocked spinlock even when there are pending
waiters.
So this reinstates Davidlohr Bueso's original explicit waiter counting
code, which I had convinced Davidlohr to drop in favor of figuring out
the pending waiters by just using the existing state of the spinlock and
the wait queue.
Reported-and-tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Original-code-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>