Commit graph

263 commits

Author SHA1 Message Date
Blagovest Kolenichev
b47135257c Merge branch 'android-4.4@c71ad0f' into branch 'msm-4.4'
* refs/heads/tmp-c71ad0f:
  BACKPORT: arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
  staging: android: ashmem: lseek failed due to no FMODE_LSEEK.
  ANDROID: sdcardfs: update module info
  ANDROID: sdcardfs: use d_splice_alias
  ANDROID: sdcardfs: add read_iter/write_iter opeations
  ANDROID: sdcardfs: fix ->llseek to update upper and lower offset
  ANDROID: sdcardfs: copy lower inode attributes in ->ioctl
  ANDROID: sdcardfs: remove unnecessary call to do_munmap
  Merge 4.4.59 into android-4.4
  UPSTREAM: ipv6 addrconf: implement RFC7559 router solicitation backoff
  android: base-cfg: enable CONFIG_INET_DIAG_DESTROY
  ANDROID: android-base.cfg: add CONFIG_MODULES option
  ANDROID: android-base.cfg: add CONFIG_IKCONFIG option
  ANDROID: android-base.cfg: properly sort the file
  ANDROID: binder: add hwbinder,vndbinder to BINDER_DEVICES.
  ANDROID: sort android-recommended.cfg
  UPSTREAM: config/android: Remove CONFIG_IPV6_PRIVACY
  UPSTREAM: config: android: set SELinux as default security mode
  config: android: move device mapper options to recommended
  ANDROID: ARM64: Allow to choose appended kernel image
  UPSTREAM: arm64: vdso: constify vm_special_mapping used for aarch32 vectors page
  UPSTREAM: arm64: vdso: add __init section marker to alloc_vectors_page
  UPSTREAM: ARM: 8597/1: VDSO: put RO and RO after init objects into proper sections
  UPSTREAM: arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime() vDSO
  UPSTREAM: arm64: Refactor vDSO time functions
  UPSTREAM: arm64: fix vdso-offsets.h dependency
  UPSTREAM: kbuild: drop FORCE from PHONY targets
  UPSTREAM: mm: add PHYS_PFN, use it in __phys_to_pfn()
  UPSTREAM: ARM: 8476/1: VDSO: use PTR_ERR_OR_ZERO for vma check
  Linux 4.4.58
  crypto: algif_hash - avoid zero-sized array
  fbcon: Fix vc attr at deinit
  serial: 8250_pci: Detach low-level driver during PCI error recovery
  ACPI / blacklist: Make Dell Latitude 3350 ethernet work
  ACPI / blacklist: add _REV quirks for Dell Precision 5520 and 3520
  uvcvideo: uvc_scan_fallback() for webcams with broken chain
  s390/zcrypt: Introduce CEX6 toleration
  block: allow WRITE_SAME commands with the SG_IO ioctl
  vfio/spapr: Postpone allocation of userspace version of TCE table
  PCI: Do any VF BAR updates before enabling the BARs
  PCI: Ignore BAR updates on virtual functions
  PCI: Update BARs using property bits appropriate for type
  PCI: Don't update VF BARs while VF memory space is enabled
  PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
  PCI: Add comments about ROM BAR updating
  PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
  PCI: Separate VF BAR updates from standard BAR updates
  x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
  igb: add i211 to i210 PHY workaround
  igb: Workaround for igb i210 firmware issue
  xen: do not re-use pirq number cached in pci device msi msg data
  xfs: clear _XBF_PAGES from buffers when readahead page
  USB: usbtmc: add missing endpoint sanity check
  nl80211: fix dumpit error path RTNL deadlocks
  xfs: fix up xfs_swap_extent_forks inline extent handling
  xfs: don't allow di_size with high bit set
  libceph: don't set weight to IN when OSD is destroyed
  raid10: increment write counter after bio is split
  cpufreq: Restore policy min/max limits on CPU online
  ARM: dts: at91: sama5d2: add dma properties to UART nodes
  ARM: at91: pm: cpu_idle: switch DDR to power-down mode
  iommu/vt-d: Fix NULL pointer dereference in device_to_iommu
  xen/acpi: upload PM state from init-domain to Xen
  mmc: sdhci: Do not disable interrupts while waiting for clock
  ext4: mark inode dirty after converting inline directory
  parport: fix attempt to write duplicate procfiles
  iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3
  iio: adc: ti_am335x_adc: fix fifo overrun recovery
  mmc: ushc: fix NULL-deref at probe
  uwb: hwa-rc: fix NULL-deref at probe
  uwb: i1480-dfu: fix NULL-deref at probe
  usb: hub: Fix crash after failure to read BOS descriptor
  usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer
  USB: wusbcore: fix NULL-deref at probe
  USB: idmouse: fix NULL-deref at probe
  USB: lvtest: fix NULL-deref at probe
  USB: uss720: fix NULL-deref at probe
  usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk
  usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval
  ACM gadget: fix endianness in notifications
  USB: serial: qcserial: add Dell DW5811e
  USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems
  ALSA: hda - Adding a group of pin definition to fix headset problem
  ALSA: ctxfi: Fix the incorrect check of dma_set_mask() call
  ALSA: seq: Fix racy cell insertions during snd_seq_pool_done()
  Input: sur40 - validate number of endpoints before using them
  Input: kbtab - validate number of endpoints before using them
  Input: cm109 - validate number of endpoints before using them
  Input: yealink - validate number of endpoints before using them
  Input: hanwang - validate number of endpoints before using them
  Input: ims-pcu - validate number of endpoints before using them
  Input: iforce - validate number of endpoints before using them
  Input: i8042 - add noloop quirk for Dell Embedded Box PC 3000
  Input: elan_i2c - add ASUS EeeBook X205TA special touchpad fw
  tcp: initialize icsk_ack.lrcvtime at session start time
  socket, bpf: fix sk_filter use after free in sk_clone_lock
  ipv4: provide stronger user input validation in nl_fib_input()
  net: bcmgenet: remove bcmgenet_internal_phy_setup()
  net/mlx5e: Count LRO packets correctly
  net/mlx5: Increase number of max QPs in default profile
  net: unix: properly re-increment inflight counter of GC discarded candidates
  amd-xgbe: Fix jumbo MTU processing on newer hardware
  net: properly release sk_frag.page
  net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled
  net/openvswitch: Set the ipv6 source tunnel key address attribute correctly
  Linux 4.4.57
  ext4: fix fencepost in s_first_meta_bg validation
  percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pages
  gfs2: Avoid alignment hole in struct lm_lockname
  isdn/gigaset: fix NULL-deref at probe
  target: Fix VERIFY_16 handling in sbc_parse_cdb
  scsi: libiscsi: add lock around task lists to fix list corruption regression
  scsi: lpfc: Add shutdown method for kexec
  target/pscsi: Fix TYPE_TAPE + TYPE_MEDIMUM_CHANGER export
  md/raid1/10: fix potential deadlock
  powerpc/boot: Fix zImage TOC alignment
  cpufreq: Fix and clean up show_cpuinfo_cur_freq()
  perf/core: Fix event inheritance on fork()
  give up on gcc ilog2() constant optimizations
  kernek/fork.c: allocate idle task for a CPU always on its local node
  hv_netvsc: use skb_get_hash() instead of a homegrown implementation
  tpm_tis: Use devm_free_irq not free_irq
  drm/amdgpu: add missing irq.h include
  s390/pci: fix use after free in dma_init
  KVM: PPC: Book3S PR: Fix illegal opcode emulation
  xen/qspinlock: Don't kick CPU if IRQ is not initialized
  Drivers: hv: avoid vfree() on crash
  Drivers: hv: balloon: don't crash when memory is added in non-sorted order
  pinctrl: cherryview: Do not mask all interrupts in probe
  ACPI / video: skip evaluating _DOD when it does not exist
  cxlflash: Increase cmd_per_lun for better throughput
  crypto: mcryptd - Fix load failure
  crypto: cryptd - Assign statesize properly
  crypto: ghash-clmulni - Fix load failure
  USB: don't free bandwidth_mutex too early
  usb: core: hub: hub_port_init lock controller instead of bus
  ANDROID: sdcardfs: Fix style issues in macros
  ANDROID: sdcardfs: Use seq_puts over seq_printf
  ANDROID: sdcardfs: Use to kstrout
  ANDROID: sdcardfs: Use pr_[...] instead of printk
  ANDROID: sdcardfs: remove unneeded null check
  ANDROID: sdcardfs: Fix style issues with comments
  ANDROID: sdcardfs: Fix formatting
  ANDROID: sdcardfs: correct order of descriptors
  fix the deadlock in xt_qtaguid when enable DDEBUG
  net: ipv6: Add sysctl for minimum prefix len acceptable in RIOs.
  Linux 4.4.56
  futex: Add missing error handling to FUTEX_REQUEUE_PI
  futex: Fix potential use-after-free in FUTEX_REQUEUE_PI
  x86/perf: Fix CR4.PCE propagation to use active_mm instead of mm
  x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=y
  fscrypto: lock inode while setting encryption policy
  fscrypt: fix renaming and linking special files
  net sched actions: decrement module reference count after table flush.
  dccp: fix memory leak during tear-down of unsuccessful connection request
  dccp/tcp: fix routing redirect race
  bridge: drop netfilter fake rtable unconditionally
  ipv6: avoid write to a possibly cloned skb
  ipv6: make ECMP route replacement less greedy
  mpls: Send route delete notifications when router module is unloaded
  act_connmark: avoid crashing on malformed nlattrs with null parms
  uapi: fix linux/packet_diag.h userspace compilation error
  vrf: Fix use-after-free in vrf_xmit
  dccp: fix use-after-free in dccp_feat_activate_values
  net: fix socket refcounting in skb_complete_tx_timestamp()
  net: fix socket refcounting in skb_complete_wifi_ack()
  tcp: fix various issues for sockets morphing to listen state
  dccp: Unlock sock before calling sk_free()
  net: net_enable_timestamp() can be called from irq contexts
  net: don't call strlen() on the user buffer in packet_bind_spkt()
  l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv
  ipv4: mask tos for input route
  vti6: return GRE_KEY for vti6
  vxlan: correctly validate VXLAN ID against VXLAN_N_VID
  netlink: remove mmapped netlink support
  ANDROID: mmc: core: export emmc revision
  BACKPORT: mmc: core: Export device lifetime information through sysfs
  ANDROID: android-verity: do not compile as independent module
  ANDROID: sched: fix duplicate sched_group_energy const specifiers
  config: disable CONFIG_USELIB and CONFIG_FHANDLE
  ANDROID: power: align wakeup_sources format
  ANDROID: dm: android-verity: allow disable dm-verity for Treble VTS
  uid_sys_stats: change to use rt_mutex
  ANDROID: vfs: user permission2 in notify_change2
  ANDROID: sdcardfs: Fix gid issue
  ANDROID: sdcardfs: Use tabs instead of spaces in multiuser.h
  ANDROID: sdcardfs: Remove uninformative prints
  ANDROID: sdcardfs: move path_put outside of spinlock
  ANDROID: sdcardfs: Use case insensitive hash function
  ANDROID: sdcardfs: declare MODULE_ALIAS_FS
  ANDROID: sdcardfs: Get the blocksize from the lower fs
  ANDROID: sdcardfs: Use d_invalidate instead of drop_recurisve
  ANDROID: sdcardfs: Switch to internal case insensitive compare
  ANDROID: sdcardfs: Use spin_lock_nested
  ANDROID: sdcardfs: Replace get/put with d_lock
  ANDROID: sdcardfs: rate limit warning print
  ANDROID: sdcardfs: Fix case insensitive lookup
  ANDROID: uid_sys_stats: account for fsync syscalls
  ANDROID: sched: add a counter to track fsync
  ANDROID: uid_sys_stats: fix negative write bytes.
  ANDROID: uid_sys_stats: allow writing same state
  ANDROID: uid_sys_stats: rename uid_cputime.c to uid_sys_stats.c
  ANDROID: uid_cputime: add per-uid IO usage accounting
  DTB: Add EAS compatible Juno Energy model to 'juno.dts'
  arm64: dts: juno: Add idle-states to device tree
  ANDROID: Replace spaces by '_' for some android filesystem tracepoints.
  usb: gadget: f_accessory: Fix for UsbAccessory clean unbind.
  android: binder: move global binder state into context struct.
  android: binder: add padding to binder_fd_array_object.
  binder: use group leader instead of open thread
  nf: IDLETIMER: Use fullsock when querying uid
  nf: IDLETIMER: Fix use after free condition during work
  ANDROID: dm: android-verity: fix table_make_digest() error handling
  ANDROID: usb: gadget: function: Fix commenting style
  cpufreq: interactive governor drops bits in time calculation
  ANDROID: sdcardfs: support direct-IO (DIO) operations
  ANDROID: sdcardfs: implement vm_ops->page_mkwrite
  ANDROID: sdcardfs: Don't bother deleting freelist
  ANDROID: sdcardfs: Add missing path_put
  ANDROID: sdcardfs: Fix incorrect hash
  ANDROID: ext4 crypto: Disables zeroing on truncation when there's no key
  ANDROID: ext4: add a non-reversible key derivation method
  ANDROID: ext4: allow encrypting filenames using HEH algorithm
  ANDROID: arm64/crypto: add ARMv8-CE optimized poly_hash algorithm
  ANDROID: crypto: heh - factor out poly_hash algorithm
  ANDROID: crypto: heh - Add Hash-Encrypt-Hash (HEH) algorithm
  ANDROID: crypto: gf128mul - Add ble multiplication functions
  ANDROID: crypto: gf128mul - Refactor gf128 overflow macros and tables
  UPSTREAM: crypto: gf128mul - Zero memory when freeing multiplication table
  ANDROID: crypto: shash - Add crypto_grab_shash() and crypto_spawn_shash_alg()
  ANDROID: crypto: allow blkcipher walks over ablkcipher data
  UPSTREAM: arm/arm64: crypto: assure that ECB modes don't require an IV
  ANDROID: Refactor fs readpage/write tracepoints.
  ANDROID: export security_path_chown
  Squashfs: optimize reading uncompressed data
  Squashfs: implement .readpages()
  Squashfs: replace buffer_head with BIO
  Squashfs: refactor page_actor
  Squashfs: remove the FILE_CACHE option
  ANDROID: android-recommended.cfg: CONFIG_CPU_SW_DOMAIN_PAN=y
  FROMLIST: 9p: fix a potential acl leak
  BACKPORT: posix_acl: Clear SGID bit when setting file permissions
  UPSTREAM: udp: properly support MSG_PEEK with truncated buffers
  UPSTREAM: arm64: Allow hw watchpoint of length 3,5,6 and 7
  BACKPORT: arm64: hw_breakpoint: Handle inexact watchpoint addresses
  UPSTREAM: arm64: Allow hw watchpoint at varied offset from base address
  BACKPORT: hw_breakpoint: Allow watchpoint of length 3,5,6 and 7
  ANDROID: sdcardfs: Switch strcasecmp for internal call
  ANDROID: sdcardfs: switch to full_name_hash and qstr
  ANDROID: sdcardfs: Add GID Derivation to sdcardfs
  ANDROID: sdcardfs: Remove redundant operation
  ANDROID: sdcardfs: add support for user permission isolation
  ANDROID: sdcardfs: Refactor configfs interface
  ANDROID: sdcardfs: Allow non-owners to touch
  ANDROID: binder: fix format specifier for type binder_size_t
  ANDROID: fs: Export vfs_rmdir2
  ANDROID: fs: Export free_fs_struct and set_fs_pwd
  BACKPORT: Input: xpad - validate USB endpoint count during probe
  BACKPORT: Input: xpad - fix oops when attaching an unknown Xbox One gamepad
  ANDROID: mnt: remount should propagate to slaves of slaves
  ANDROID: sdcardfs: Switch ->d_inode to d_inode()
  ANDROID: sdcardfs: Fix locking issue with permision fix up
  ANDROID: sdcardfs: Change magic value
  ANDROID: sdcardfs: Use per mount permissions
  ANDROID: sdcardfs: Add gid and mask to private mount data
  ANDROID: sdcardfs: User new permission2 functions
  ANDROID: vfs: Add setattr2 for filesystems with per mount permissions
  ANDROID: vfs: Add permission2 for filesystems with per mount permissions
  ANDROID: vfs: Allow filesystems to access their private mount data
  ANDROID: mnt: Add filesystem private data to mount points
  ANDROID: sdcardfs: Move directory unlock before touch
  ANDROID: sdcardfs: fix external storage exporting incorrect uid
  ANDROID: sdcardfs: Added top to sdcardfs_inode_info
  ANDROID: sdcardfs: Switch package list to RCU
  ANDROID: sdcardfs: Fix locking for permission fix up
  ANDROID: sdcardfs: Check for other cases on path lookup
  ANDROID: sdcardfs: override umask on mkdir and create
  arm64: kernel: Fix build warning
  DEBUG: sched/fair: Fix sched_load_avg_cpu events for task_groups
  DEBUG: sched/fair: Fix missing sched_load_avg_cpu events
  UPSTREAM: l2tp: fix racy SOCK_ZAPPED flag check in l2tp_ip{,6}_bind()
  UPSTREAM: packet: fix race condition in packet_set_ring
  UPSTREAM: netlink: Fix dump skb leak/double free
  UPSTREAM: net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
  MIPS: Prevent "restoration" of MSA context in non-MSA kernels
  net: socket: don't set sk_uid to garbage value in ->setattr()
  ANDROID: configs: CONFIG_ARM64_SW_TTBR0_PAN=y
  UPSTREAM: arm64: Disable PAN on uaccess_enable()
  UPSTREAM: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN
  UPSTREAM: arm64: xen: Enable user access before a privcmd hvc call
  UPSTREAM: arm64: Handle faults caused by inadvertent user access with PAN enabled
  BACKPORT: arm64: Disable TTBR0_EL1 during normal kernel execution
  BACKPORT: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1
  BACKPORT: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro
  BACKPORT: arm64: Factor out PAN enabling/disabling into separate uaccess_* macros
  UPSTREAM: arm64: alternative: add auto-nop infrastructure
  UPSTREAM: arm64: barriers: introduce nops and __nops macros for NOP sequences
  Revert "FROMLIST: arm64: Factor out PAN enabling/disabling into separate uaccess_* macros"
  Revert "FROMLIST: arm64: Factor out TTBR0_EL1 post-update workaround into a specific asm macro"
  Revert "FROMLIST: arm64: Introduce uaccess_{disable,enable} functionality based on TTBR0_EL1"
  Revert "FROMLIST: arm64: Disable TTBR0_EL1 during normal kernel execution"
  Revert "FROMLIST: arm64: Handle faults caused by inadvertent user access with PAN enabled"
  Revert "FROMLIST: arm64: xen: Enable user access before a privcmd hvc call"
  Revert "FROMLIST: arm64: Enable CONFIG_ARM64_SW_TTBR0_PAN"
  ANDROID: sched/walt: fix build failure if FAIR_GROUP_SCHED=n
  ANDROID: trace: net: use %pK for kernel pointers
  ANDROID: android-base: Enable QUOTA related configs
  net: ipv4: Don't crash if passing a null sk to ip_rt_update_pmtu.
  net: inet: Support UID-based routing in IP protocols.
  net: core: add UID to flows, rules, and routes
  net: core: Add a UID field to struct sock.
  Revert "net: core: Support UID-based routing."
  UPSTREAM: efi/arm64: Don't apply MEMBLOCK_NOMAP to UEFI memory map mapping
  UPSTREAM: arm64: mm: always take dirty state from new pte in ptep_set_access_flags
  UPSTREAM: arm64: Implement pmdp_set_access_flags() for hardware AF/DBM
  UPSTREAM: arm64: Fix typo in the pmdp_huge_get_and_clear() definition
  UPSTREAM: arm64: enable CONFIG_DEBUG_RODATA by default
  goldfish: enable CONFIG_INET_DIAG_DESTROY
  sched/walt: kill {min,max}_capacity
  sched: fix wrong truncation of walt_avg
  build: fix build config kernel_dir
  ANDROID: dm verity: add minimum prefetch size
  build: add build server configs for goldfish
  usb: gadget: Fix compilation problem with tx_qlen field

Conflicts:
	android/configs/android-base.cfg
	arch/arm64/Makefile
	arch/arm64/include/asm/cpufeature.h
	arch/arm64/kernel/vdso/gettimeofday.S
	arch/arm64/mm/cache.S
	drivers/md/Kconfig
	drivers/misc/Makefile
	drivers/mmc/host/sdhci.c
	drivers/usb/core/hcd.c
	drivers/usb/gadget/function/u_ether.c
	fs/sdcardfs/derived_perm.c
	fs/sdcardfs/file.c
	fs/sdcardfs/inode.c
	fs/sdcardfs/lookup.c
	fs/sdcardfs/main.c
	fs/sdcardfs/multiuser.h
	fs/sdcardfs/packagelist.c
	fs/sdcardfs/sdcardfs.h
	fs/sdcardfs/super.c
	include/linux/mmc/card.h
	include/linux/mmc/mmc.h
	include/trace/events/android_fs.h
	include/trace/events/android_fs_template.h
	drivers/android/binder.c
	fs/exec.c
	fs/ext4/crypto_key.c
	fs/ext4/ext4.h
	fs/ext4/inline.c
	fs/ext4/inode.c
	fs/ext4/readpage.c
	fs/f2fs/data.c
	fs/f2fs/inline.c
	fs/mpage.c
	include/linux/dcache.h
	include/trace/events/sched.h
	include/uapi/linux/ipv6.h
	net/ipv4/tcp_ipv4.c
	net/netfilter/xt_IDLETIMER.c

Change-Id: Ie345db6a14869fe0aa794aef4b71b5d0d503690b
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-04-20 15:19:15 -07:00
Blagovest Kolenichev
a4b9c109c2 Merge tag v4.4.55 into branch 'msm-4.4'
refs/heads/tmp-28ec98b:
  Linux 4.4.55
  ext4: don't BUG when truncating encrypted inodes on the orphan list
  dm: flush queued bios when process blocks to avoid deadlock
  nfit, libnvdimm: fix interleave set cookie calculation
  s390/kdump: Use "LINUX" ELF note name instead of "CORE"
  KVM: s390: Fix guest migration for huge guests resulting in panic
  mvsas: fix misleading indentation
  serial: samsung: Continue to work if DMA request fails
  USB: serial: io_ti: fix information leak in completion handler
  USB: serial: io_ti: fix NULL-deref in interrupt callback
  USB: iowarrior: fix NULL-deref in write
  USB: iowarrior: fix NULL-deref at probe
  USB: serial: omninet: fix reference leaks at open
  USB: serial: safe_serial: fix information leak in completion handler
  usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers
  usb: host: xhci-dbg: HCIVERSION should be a binary number
  usb: gadget: function: f_fs: pass companion descriptor along
  usb: dwc3: gadget: make Set Endpoint Configuration macros safe
  usb: gadget: dummy_hcd: clear usb_gadget region before registration
  powerpc: Emulation support for load/store instructions on LE
  tracing: Add #undef to fix compile error
  MIPS: Netlogic: Fix CP0_EBASE redefinition warnings
  MIPS: DEC: Avoid la pseudo-instruction in delay slots
  mm: memcontrol: avoid unused function warning
  cpmac: remove hopeless #warning
  MIPS: ralink: Remove unused rt*_wdt_reset functions
  MIPS: ralink: Cosmetic change to prom_init().
  mtd: pmcmsp: use kstrndup instead of kmalloc+strncpy
  MIPS: Update lemote2f_defconfig for CPU_FREQ_STAT change
  MIPS: ip22: Fix ip28 build for modern gcc
  MIPS: Update ip27_defconfig for SCSI_DH change
  MIPS: ip27: Disable qlge driver in defconfig
  MIPS: Update defconfigs for NF_CT_PROTO_DCCP/UDPLITE change
  crypto: improve gcc optimization flags for serpent and wp512
  USB: serial: digi_acceleport: fix OOB-event processing
  USB: serial: digi_acceleport: fix OOB data sanity check
  Linux 4.4.54
  drivers: hv: Turn off write permission on the hypercall page
  fat: fix using uninitialized fields of fat_inode/fsinfo_inode
  libceph: use BUG() instead of BUG_ON(1)
  drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
  fakelb: fix schedule while atomic
  drm/atomic: fix an error code in mode_fixup()
  drm/ttm: Make sure BOs being swapped out are cacheable
  drm/edid: Add EDID_QUIRK_FORCE_8BPC quirk for Rotel RSX-1058
  drm/ast: Fix AST2400 POST failure without BMC FW or VBIOS
  drm/ast: Call open_key before enable_mmio in POST code
  drm/ast: Fix test for VGA enabled
  drm/amdgpu: add more cases to DCE11 possible crtc mask setup
  mac80211: flush delayed work when entering suspend
  xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD
  pwm: pca9685: Fix period change with same duty cycle
  nlm: Ensure callback code also checks that the files match
  target: Fix NULL dereference during LUN lookup + active I/O shutdown
  ceph: remove req from unsafe list when unregistering it
  ktest: Fix child exit code processing
  IB/srp: Fix race conditions related to task management
  IB/srp: Avoid that duplicate responses trigger a kernel bug
  IB/IPoIB: Add destination address when re-queue packet
  IB/ipoib: Fix deadlock between rmmod and set_mode
  mnt: Tuck mounts under others instead of creating shadow/side mounts.
  net: mvpp2: fix DMA address calculation in mvpp2_txq_inc_put()
  s390: use correct input data address for setup_randomness
  s390: make setup_randomness work
  s390: TASK_SIZE for kernel threads
  s390/dcssblk: fix device size calculation in dcssblk_direct_access()
  s390/qdio: clear DSCI prior to scanning multiple input queues
  Bluetooth: Add another AR3012 04ca:3018 device
  KVM: VMX: use correct vmcs_read/write for guest segment selector/base
  KVM: s390: Disable dirty log retrieval for UCONTROL guests
  serial: 8250_pci: Add MKS Tenta SCOM-0800 and SCOM-0801 cards
  tty: n_hdlc: get rid of racy n_hdlc.tbuf
  TTY: n_hdlc, fix lockdep false positive
  Linux 4.4.53
  scsi: lpfc: Correct WQ creation for pagesize
  MIPS: IP22: Fix build error due to binutils 2.25 uselessnes.
  MIPS: IP22: Reformat inline assembler code to modern standards.
  powerpc/xmon: Fix data-breakpoint
  dmaengine: ipu: Make sure the interrupt routine checks all interrupts.
  bcma: use (get|put)_device when probing/removing device driver
  md linear: fix a race between linear_add() and linear_congested()
  rtc: sun6i: Switch to the external oscillator
  rtc: sun6i: Add some locking
  NFSv4: fix getacl ERANGE for some ACL buffer sizes
  NFSv4: fix getacl head length estimation
  NFSv4: Fix memory and state leak in _nfs4_open_and_get_state
  nfsd: special case truncates some more
  nfsd: minor nfsd_setattr cleanup
  rtlwifi: rtl8192c-common: Fix "BUG: KASAN:
  rtlwifi: Fix alignment issues
  gfs2: Add missing rcu locking for glock lookup
  rdma_cm: fail iwarp accepts w/o connection params
  RDMA/core: Fix incorrect structure packing for booleans
  Drivers: hv: util: Backup: Fix a rescind processing issue
  Drivers: hv: util: Fcopy: Fix a rescind processing issue
  Drivers: hv: util: kvp: Fix a rescind processing issue
  hv: init percpu_list in hv_synic_alloc()
  hv: allocate synic pages for all present CPUs
  usb: gadget: udc: fsl: Add missing complete function.
  usb: host: xhci: plat: check hcc_params after add hcd
  usb: musb: da8xx: Remove CPPI 3.0 quirk and methods
  w1: ds2490: USB transfer buffers need to be DMAable
  w1: don't leak refcount on slave attach failure in w1_attach_slave_device()
  can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer
  iio: pressure: mpl3115: do not rely on structure field ordering
  iio: pressure: mpl115: do not rely on structure field ordering
  arm/arm64: KVM: Enforce unconditional flush to PoC when mapping to stage-2
  fuse: add missing FR_FORCE
  crypto: testmgr - Pad aes_ccm_enc_tv_template vector
  ath9k: use correct OTP register offsets for the AR9340 and AR9550
  ath9k: fix race condition in enabling/disabling IRQs
  ath5k: drop bogus warning on drv_set_key with unsupported cipher
  target: Fix multi-session dynamic se_node_acl double free OOPs
  target: Obtain se_node_acl->acl_kref during get_initiator_node_acl
  samples/seccomp: fix 64-bit comparison macros
  ext4: return EROFS if device is r/o and journal replay is needed
  ext4: preserve the needs_recovery flag when the journal is aborted
  ext4: fix inline data error paths
  ext4: fix data corruption in data=journal mode
  ext4: trim allocation requests to group size
  ext4: do not polute the extents cache while shifting extents
  ext4: Include forgotten start block on fallocate insert range
  loop: fix LO_FLAGS_PARTSCAN hang
  block/loop: fix race between I/O and set_status
  jbd2: don't leak modified metadata buffers on an aborted journal
  Fix: Disable sys_membarrier when nohz_full is enabled
  sd: get disk reference in sd_check_events()
  scsi: use 'scsi_device_from_queue()' for scsi_dh
  scsi: aacraid: Reorder Adapter status check
  scsi: storvsc: properly set residual data length on errors
  scsi: storvsc: properly handle SRB_ERROR when sense message is present
  scsi: storvsc: use tagged SRB requests if supported by the device
  dm stats: fix a leaked s->histogram_boundaries array
  dm cache: fix corruption seen when using cache > 2TB
  ipc/shm: Fix shmat mmap nil-page protection
  mm: do not access page->mapping directly on page_endio
  mm: vmpressure: fix sending wrong events on underflow
  mm/page_alloc: fix nodes for reclaim in fast path
  iommu/vt-d: Tylersburg isoch identity map check is done too late.
  iommu/vt-d: Fix some macros that are incorrectly specified in intel-iommu
  regulator: Fix regulator_summary for deviceless consumers
  staging: rtl: fix possible NULL pointer dereference
  ALSA: hda - Fix micmute hotkey problem for a lenovo AIO machine
  ALSA: hda - Add subwoofer support for Dell Inspiron 17 7000 Gaming
  ALSA: seq: Fix link corruption by event error handling
  ALSA: ctxfi: Fallback DMA mask to 32bit
  ALSA: timer: Reject user params with too small ticks
  ALSA: hda - fix Lewisburg audio issue
  ALSA: hda/realtek - Cannot adjust speaker's volume on a Dell AIO
  ARM: dts: at91: Enable DMA on sama5d2_xplained console
  ARM: dts: at91: Enable DMA on sama5d4_xplained console
  ARM: at91: define LPDDR types
  media: fix dm1105.c build error
  uvcvideo: Fix a wrong macro
  am437x-vpfe: always assign bpp variable
  MIPS: Handle microMIPS jumps in the same way as MIPS32/MIPS64 jumps
  MIPS: Calculate microMIPS ra properly when unwinding the stack
  MIPS: Fix is_jump_ins() handling of 16b microMIPS instructions
  MIPS: Fix get_frame_info() handling of microMIPS function size
  MIPS: Prevent unaligned accesses during stack unwinding
  MIPS: Clear ISA bit correctly in get_frame_info()
  MIPS: Lantiq: Keep ethernet enabled during boot
  MIPS: OCTEON: Fix copy_from_user fault handling for large buffers
  MIPS: BCM47XX: Fix button inversion for Asus WL-500W
  MIPS: Fix special case in 64 bit IP checksumming.
  samples: move mic/mpssd example code from Documentation
  Linux 4.4.52
  kvm: vmx: ensure VMCS is current while enabling PML
  Revert "usb: chipidea: imx: enable CI_HDRC_SET_NON_ZERO_TTHA"
  rtlwifi: rtl_usb: Fix for URB leaking when doing ifconfig up/down
  block: fix double-free in the failure path of cgwb_bdi_init()
  goldfish: Sanitize the broken interrupt handler
  x86/platform/goldfish: Prevent unconditional loading
  USB: serial: ark3116: fix register-accessor error handling
  USB: serial: opticon: fix CTS retrieval at open
  USB: serial: spcp8x5: fix modem-status handling
  USB: serial: ftdi_sio: fix line-status over-reporting
  USB: serial: ftdi_sio: fix extreme low-latency setting
  USB: serial: ftdi_sio: fix modem-status error handling
  USB: serial: cp210x: add new IDs for GE Bx50v3 boards
  USB: serial: mos7840: fix another NULL-deref at open
  tty: serial: msm: Fix module autoload
  net: socket: fix recvmmsg not returning error from sock_error
  ip: fix IP_CHECKSUM handling
  irda: Fix lockdep annotations in hashbin_delete().
  dccp: fix freeing skb too early for IPV6_RECVPKTINFO
  packet: Do not call fanout_release from atomic contexts
  packet: fix races in fanout_add()
  net/llc: avoid BUG_ON() in skb_orphan()
  blk-mq: really fix plug list flushing for nomerge queues
  rtc: interface: ignore expired timers when enqueuing new timers
  rtlwifi: rtl_usb: Fix missing entry in USB driver's private data
  Linux 4.4.51
  mmc: core: fix multi-bit bus width without high-speed mode
  bcache: Make gc wakeup sane, remove set_task_state()
  ntb_transport: Pick an unused queue
  NTB: ntb_transport: fix debugfs_remove_recursive
  printk: use rcuidle console tracepoint
  ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user()
  futex: Move futex_init() to core_initcall
  drm/dp/mst: fix kernel oops when turning off secondary monitor
  drm/radeon: Use mode h/vdisplay fields to hide out of bounds HW cursor
  Input: elan_i2c - add ELAN0605 to the ACPI table
  Fix missing sanity check in /dev/sg
  scsi: don't BUG_ON() empty DMA transfers
  fuse: fix use after free issue in fuse_dev_do_read()
  siano: make it work again with CONFIG_VMAP_STACK
  vfs: fix uninitialized flags in splice_to_pipe()
  Linux 4.4.50
  l2tp: do not use udp_ioctl()
  ping: fix a null pointer dereference
  packet: round up linear to header len
  net: introduce device min_header_len
  sit: fix a double free on error path
  sctp: avoid BUG_ON on sctp_wait_for_sndbuf
  mlx4: Invoke softirqs after napi_reschedule
  macvtap: read vnet_hdr_size once
  tun: read vnet_hdr_sz once
  tcp: avoid infinite loop in tcp_splice_read()
  ipv6: tcp: add a missing tcp_v6_restore_cb()
  ip6_gre: fix ip6gre_err() invalid reads
  netlabel: out of bound access in cipso_v4_validate()
  ipv4: keep skb->dst around in presence of IP options
  net: use a work queue to defer net_disable_timestamp() work
  tcp: fix 0 divide in __tcp_select_window()
  ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()
  ipv6: fix ip6_tnl_parse_tlv_enc_lim()
  can: Fix kernel panic at security_sock_rcv_skb

Conflicts:
	drivers/scsi/sd.c
	drivers/usb/gadget/function/f_fs.c
	drivers/usb/host/xhci-plat.c

CRs-Fixed: 2023471
Change-Id: I396051a8de30271af77b3890d4b19787faa1c31e
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-03-23 03:22:14 -07:00
Peter Zijlstra
99d403faba futex: Add missing error handling to FUTEX_REQUEUE_PI
commit 9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream.

Thomas spotted that fixup_pi_state_owner() can return errors and we
fail to unlock the rt_mutex in that case.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22 12:04:19 +01:00
Peter Zijlstra
44854c191e futex: Fix potential use-after-free in FUTEX_REQUEUE_PI
commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream.

While working on the futex code, I stumbled over this potential
use-after-free scenario. Dmitry triggered it later with syzkaller.

pi_mutex is a pointer into pi_state, which we drop the reference on in
unqueue_me_pi(). So any access to that pointer after that is bad.

Since other sites already do rt_mutex_unlock() with hb->lock held, see
for example futex_lock_pi(), simply move the unlock before
unqueue_me_pi().

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22 12:04:19 +01:00
Yang Yang
e6394c7d1c futex: Move futex_init() to core_initcall
commit 25f71d1c3e98ef0e52371746220d66458eac75bc upstream.

The UEVENT user mode helper is enabled before the initcalls are executed
and is available when the root filesystem has been mounted.

The user mode helper is triggered by device init calls and the executable
might use the futex syscall.

futex_init() is marked __initcall which maps to device_initcall, but there
is no guarantee that futex_init() is invoked _before_ the first device init
call which triggers the UEVENT user mode helper.

If the user mode helper uses the futex syscall before futex_init() then the
syscall crashes with a NULL pointer dereference because the futex subsystem
has not been initialized yet.

Move futex_init() to core_initcall so futexes are initialized before the
root filesystem is mounted and the usermode helper becomes available.
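
The ordering problem can be illustrated with a small userspace model. This is only a sketch: the table-driven runner and every `model_`/`LEVEL_` name are hypothetical, and the only fact taken from the kernel is that core initcalls run at an earlier level than device initcalls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of two initcall levels; core runs before device. */
enum initcall_level { LEVEL_CORE = 1, LEVEL_DEVICE = 6 };

struct initcall {
    enum initcall_level level;
    void (*fn)(void);
};

static bool futex_ready;      /* set once the model futex_init() has run */
static bool helper_saw_futex; /* what the usermode helper observed */

static void model_futex_init(void) { futex_ready = true; }

/* A device init call that triggers a usermode helper which uses futexes. */
static void model_device_init(void) { helper_saw_futex = futex_ready; }

/* Run every call level by level; table order is kept within one level. */
static void run_initcalls(const struct initcall *calls, size_t n)
{
    for (int level = 0; level <= 7; level++)
        for (size_t i = 0; i < n; i++)
            if ((int)calls[i].level == level)
                calls[i].fn();
}

/* Did the helper see an initialised futex subsystem when the model
 * futex_init() is registered at the given level? */
static bool helper_sees_futex(enum initcall_level futex_level)
{
    /* The device call is listed first, as link order might place it. */
    const struct initcall calls[] = {
        { LEVEL_DEVICE, model_device_init },
        { futex_level,  model_futex_init  },
    };

    futex_ready = false;
    helper_saw_futex = false;
    run_initcalls(calls, sizeof(calls) / sizeof(calls[0]));
    return helper_saw_futex;
}
```

In this model, `helper_sees_futex(LEVEL_DEVICE)` depends on table order, while `helper_sees_futex(LEVEL_CORE)` does not.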

[ tglx: Rewrote changelog ]

Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Cc: jiang.biao2@zte.com.cn
Cc: jiang.zhengxiong@zte.com.cn
Cc: zhong.weidong@zte.com.cn
Cc: deng.huali@zte.com.cn
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-02-23 17:43:09 +01:00
Linus Torvalds
41a69b502d x86: remove more uaccess_32.h complexity
I'm looking at trying to possibly merge the 32-bit and 64-bit versions
of the x86 uaccess.h implementation, but first this needs to be cleaned
up.

For example, the 32-bit version of "__copy_from_user_inatomic()" is
mostly the special cases for the constant size, and it's actually almost
never relevant.  Most users aren't actually using a constant size
anyway, and the few cases that do small constant copies are better off
just using __get_user() instead.

So get rid of the unnecessary complexity.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit bd28b14591b98f696bc9f94c5ba2e598ca487dfd)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
2016-08-27 11:23:38 +08:00
Davidlohr Bueso
ad4b209d19 futex: Acknowledge a new waiter in counter before plist
commit fe1bce9e2107ba3a8faffe572483b6974201a0e6 upstream.

Otherwise an incoming waker on the dest hash bucket can miss
the waiter adding itself to the plist during the lockless
check optimization (small window but still the correct way
of doing this); similarly to the decrement counterpart.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: bigeasy@linutronix.de
Cc: dvhart@infradead.org
Link: http://lkml.kernel.org/r/1461208164-29150-1-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:43 -07:00
Sebastian Andrzej Siewior
61fc0ae42c futex: Handle unlock_pi race gracefully
commit 89e9e66ba1b3bde9d8ea90566c2aee20697ad681 upstream.

If userspace calls UNLOCK_PI unconditionally without trying the TID -> 0
transition in user space first then the user space value might not have the
waiters bit set. This opens the following race:

CPU0	    	      	    CPU1
uval = get_user(futex)
			    lock(hb)
lock(hb)
			    futex |= FUTEX_WAITERS
			    ....
			    unlock(hb)

cmpxchg(futex, uval, newval)

So the cmpxchg fails and returns -EINVAL to user space, which is wrong because
the futex value is valid.

To handle this (yes, yet another) corner case gracefully, check for a flag
change and retry.
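
A minimal userspace sketch of the "check for a flag change and retry" idea, using C11 atomics instead of the kernel's cmpxchg helpers. The function names and the `MODEL_` constants are assumptions made for illustration; this is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative bit layout: top bit flags waiters, low bits hold the TID. */
#define MODEL_FUTEX_WAITERS  0x80000000u
#define MODEL_FUTEX_TID_MASK 0x3fffffffu

/* One unlock attempt based on a previously read value 'uval'.
 * Returns 0 on success, -EAGAIN if only flag bits changed since the
 * read (the caller should retry), and -EINVAL if the TID differs. */
static int try_unlock(_Atomic uint32_t *futex, uint32_t uval, uint32_t newval)
{
    uint32_t curval = uval;

    if (atomic_compare_exchange_strong(futex, &curval, newval))
        return 0;

    /* On failure, curval holds the value actually found in *futex. */
    if ((curval & MODEL_FUTEX_TID_MASK) == uval)
        return -EAGAIN;
    return -EINVAL;
}

/* Re-read the value and retry for as long as only the flags raced. */
static int unlock_with_retry(_Atomic uint32_t *futex, uint32_t newval)
{
    int ret;

    do {
        uint32_t uval = atomic_load(futex);
        ret = try_unlock(futex, uval, newval);
    } while (ret == -EAGAIN);

    return ret;
}
```

A stale `uval` that differs from the current value only in the waiters bit yields -EAGAIN rather than -EINVAL, which is the distinction the changelog describes.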

[ tglx: Massaged changelog and slightly reworked implementation ]

Fixes: ccf9e6a80d ("futex: Make unlock_pi more robust")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1460723739-5195-1-git-send-email-bigeasy@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:43 -07:00
Thomas Gleixner
acaf84251f futex: Drop refcount if requeue_pi() acquired the rtmutex
commit fb75a4282d0d9a3c7c44d940582c2d226cf3acfb upstream.

If the proxy lock in the requeue loop acquires the rtmutex for a
waiter then it also acquires a refcount on the pi_state related to the
futex, but the waiter side does not drop the reference count.

Add the missing free_pi_state() call.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Bhuvanesh_Surachari@mentor.com
Cc: Andy Lowe <Andy_Lowe@mentor.com>
Link: http://lkml.kernel.org/r/20151219200607.178132067@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:23 -08:00
Jann Horn
969624b7c1 ptrace: use fsuid, fsgid, effective creds for fs access checks
commit caaee6234d05a58c5b4d05e7bf766131b810a657 upstream.

By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g.  by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

 /proc/$pid/stat - uses the check for determining whether pointers
     should be visible, useful for bypassing ASLR
 /proc/$pid/maps - also useful for bypassing ASLR
 /proc/$pid/cwd - useful for gaining access to restricted
     directories that contain files with lax permissions, e.g. in
     this scenario:
     lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
     drwx------ root root /root
     drwxr-xr-x root root /root/foobar
     -rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).
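
The difference between the two credential sets can be sketched in a few lines of userspace C. This is a deliberately reduced model under stated assumptions: it compares UIDs only (no capabilities, GIDs or LSM hooks), and the struct, enum and function names are hypothetical rather than the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int model_uid_t;

/* Reduced credential set of the calling task. */
struct model_cred {
    model_uid_t uid;   /* real UID */
    model_uid_t euid;  /* effective UID */
    model_uid_t fsuid; /* filesystem UID, normally follows euid */
};

/* Hypothetical stand-ins for the two new mode flags. */
enum model_mode { MODEL_FSCREDS = 1, MODEL_REALCREDS = 2 };

/* Allow access only if exactly one credential flag is set and the
 * selected caller UID matches the target's UID. */
static bool model_may_access(const struct model_cred *caller,
                             model_uid_t target_uid, unsigned int mode)
{
    bool fs = (mode & MODEL_FSCREDS) != 0;
    bool real = (mode & MODEL_REALCREDS) != 0;

    if (fs == real) /* neither or both flags set: refuse */
        return false;

    return (fs ? caller->fsuid : caller->uid) == target_uid;
}
```

For a task that has done the equivalent of setreuid(0, 1000), a check against a root-owned target passes with the real credentials but fails with the filesystem credentials, which is the behaviour wanted for procfs-style access.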

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:16 -08:00
Linus Torvalds
e880e87488 driver core update for 4.4-rc1
Here's the "big" driver core updates for 4.4-rc1.  Primarily a bunch of
 debugfs updates, with a smattering of minor driver core fixes and
 updates as well.
 
 All have been in linux-next for a long time.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Merge tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here's the "big" driver core updates for 4.4-rc1.  Primarily a bunch
  of debugfs updates, with a smattering of minor driver core fixes and
  updates as well.

  All have been in linux-next for a long time"

* tag 'driver-core-4.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  debugfs: Add debugfs_create_ulong()
  of: to support binding numa node to specified device in devicetree
  debugfs: Add read-only/write-only bool file ops
  debugfs: Add read-only/write-only size_t file ops
  debugfs: Add read-only/write-only x64 file ops
  debugfs: Consolidate file mode checks in debugfs_create_*()
  Revert "mm: Check if section present during memory block (un)registering"
  driver-core: platform: Provide helpers for multi-driver modules
  mm: Check if section present during memory block (un)registering
  devres: fix a for loop bounds check
  CMA: fix CONFIG_CMA_SIZE_MBYTES overflow in 64bit
  base/platform: assert that dev_pm_domain callbacks are called unconditionally
  sysfs: correctly handle short reads on PREALLOC attrs.
  base: soc: simplify ida usage
  kobject: move EXPORT_SYMBOL() macros next to corresponding definitions
  kobject: explain what kobject's sd field is
  debugfs: document that debugfs_remove*() accepts NULL and error values
  debugfs: Pass bool pointer to debugfs_create_bool()
  ACPI / EC: Fix broken 64bit big-endian users of 'global_lock'
2015-11-04 21:50:37 -08:00
Viresh Kumar
621a5f7ad9 debugfs: Pass bool pointer to debugfs_create_bool()
It's a bit odd that debugfs_create_bool() takes 'u32 *' as an argument,
when all it needs is a boolean pointer.

It would be better to update this API to make it accept 'bool *'
instead, as that will make it more consistent and often more convenient.
On top of that, bool takes just a byte.

That required updates to all user sites as well, in the same commit
updating the API. regmap core was also using
debugfs_{read|write}_file_bool(), directly and variable types were
updated for that to be bool as well.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Mark Brown <broonie@kernel.org>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-10-04 11:36:07 +01:00
Rasmus Villemoes
ac742d3718 futex: Force hot variables into a single cache line
futex_hash() references two global variables: the base pointer
futex_queues and the size of the array futex_hashsize. The latter is
marked __read_mostly, while the former is not, so they are likely to
end up very far from each other. This means that futex_hash() is
likely to encounter two cache misses.

We could mark futex_queues as __read_mostly as well, but that doesn't
guarantee they'll end up next to each other (and even if they do, they
may still end up in different cache lines). So put the two variables
in a small singleton struct with sufficient alignment and mark that as
__read_mostly.
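
A userspace sketch of the layout idea, assuming C11 `_Alignas` in place of the kernel's alignment and section annotations. The type and variable names are hypothetical; the point is only that a two-word struct aligned to its own size cannot straddle a cache line whose size is a multiple of that alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_bucket; /* opaque stand-in for a hash bucket */

/* The two hot variables, grouped so they are always adjacent. */
struct model_futex_data {
    struct model_bucket *queues; /* base pointer of the bucket array */
    unsigned long hashsize;      /* number of buckets */
};

/* Aligning the pair to two pointer sizes keeps both fields inside a
 * single aligned block of that size. */
static _Alignas(2 * sizeof(void *)) struct model_futex_data futex_hot;

/* Do addresses a and b fall into the same aligned block of 'block' bytes? */
static int in_same_block(const void *a, const void *b, size_t block)
{
    return (uintptr_t)a / block == (uintptr_t)b / block;
}
```

Marking only one of two separate globals gives no such guarantee, since the linker is free to place them far apart.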

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/1441834601-13633-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 16:23:15 +02:00
kbuild test robot
5d285a7f35 futex: Make should_fail_futex() static
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: kbuild-all@01.org
Cc: tipbuild@zytor.com
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Brian Silverman <bsilver16384@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-20 21:43:54 +02:00
Davidlohr Bueso
ab51fbab39 futex: Fault/error injection capabilities
Although futexes are well known for being a royal pita,
we really have very little debugging capabilities - except
for relying on tglx's eye half the time.

By simply making use of the existing fault-injection machinery,
we can improve this situation, allowing generating artificial
uaddress faults and deadlock scenarios. Of course, when this is
disabled in production systems, the overhead for failure checks
is practically zero -- so this is very cheap at the same time.
As future work, it would be nice to enhance trinity to make use of
this.

There is a special tunable 'ignore-private', which can filter
out private futexes. Given the tsk->make_it_fail filter and
this option, pi futexes can be narrowed down pretty closely.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/1435645562-975-3-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-20 11:45:45 +02:00
Davidlohr Bueso
767f509ca1 futex: Enhance comments in futex_lock_pi() for blocking paths
... serves a bit better to clarify between blocking
and non-blocking code paths.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/1435645562-975-2-git-send-email-dave@stgolabs.net
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-07-20 11:45:45 +02:00
Linus Torvalds
a262948335 Merge branch 'sched-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Thomas Gleixner:
 "These locking updates depend on the already merged sched/core branch:

   - Lockless top waiter wakeup for rtmutex (Davidlohr)

   - Reduce hash bucket lock contention for PI futexes (Sebastian)

   - Documentation update (Davidlohr)"

* 'sched-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/rtmutex: Update stale plist comments
  futex: Lower the lock contention on the HB lock during wake up
  locking/rtmutex: Implement lockless top-waiter wakeup
2015-06-24 14:46:01 -07:00
Linus Torvalds
43224b96af Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "A rather largish update for everything time and timer related:

   - Cache footprint optimizations for both hrtimers and timer wheel

   - Lower the NOHZ impact on systems which have NOHZ or timer migration
     disabled at runtime.

   - Optimize run time overhead of hrtimer interrupt by making the clock
     offset updates smarter

   - hrtimer cleanups and removal of restrictions to tackle some
     problems in sched/perf

   - Some more leap second tweaks

   - Another round of changes addressing the 2038 problem

   - First step to change the internals of clock event devices by
     introducing the necessary infrastructure

   - Allow constant folding for usecs/msecs_to_jiffies()

   - The usual pile of clockevent/clocksource driver updates

  The hrtimer changes contain updates to sched, perf and x86 as they
  depend on them plus changes all over the tree to cleanup API changes
  and redundant code, which got copied all over the place.  The y2038
  changes touch s390 to remove the last non 2038 safe code related to
  boot/persistent clock"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  clocksource: Increase dependencies of timer-stm32 to limit build wreckage
  timer: Minimize nohz off overhead
  timer: Reduce timer migration overhead if disabled
  timer: Stats: Simplify the flags handling
  timer: Replace timer base by a cpu index
  timer: Use hlist for the timer wheel hash buckets
  timer: Remove FIFO "guarantee"
  timers: Sanitize catchup_timer_jiffies() usage
  hrtimer: Allow hrtimer::function() to free the timer
  seqcount: Introduce raw_write_seqcount_barrier()
  seqcount: Rename write_seqcount_barrier()
  hrtimer: Fix hrtimer_is_queued() hole
  hrtimer: Remove HRTIMER_STATE_MIGRATE
  selftest: Timers: Avoid signal deadlock in leap-a-day
  timekeeping: Copy the shadow-timekeeper over the real timekeeper last
  clockevents: Check state instead of mode in suspend/resume path
  selftests: timers: Add leap-second timer edge testing to leap-a-day.c
  ntp: Do leapsecond adjustment in adjtimex read path
  time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
  ntp: Introduce and use SECS_PER_DAY macro instead of 86400
  ...
2015-06-22 18:57:44 -07:00
Linus Torvalds
23b7776290 Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
 "The main changes are:

   - lockless wakeup support for futexes and IPC message queues
     (Davidlohr Bueso, Peter Zijlstra)

   - Replace spinlocks with atomics in thread_group_cputimer(), to
     improve scalability (Jason Low)

   - NUMA balancing improvements (Rik van Riel)

   - SCHED_DEADLINE improvements (Wanpeng Li)

   - clean up and reorganize preemption helpers (Frederic Weisbecker)

   - decouple page fault disabling machinery from the preemption
     counter, to improve debuggability and robustness (David
     Hildenbrand)

   - SCHED_DEADLINE documentation updates (Luca Abeni)

   - topology CPU masks cleanups (Bartosz Golaszewski)

   - /proc/sched_debug improvements (Srikar Dronamraju)"

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (79 commits)
  sched/deadline: Remove needless parameter in dl_runtime_exceeded()
  sched: Remove superfluous resetting of the p->dl_throttled flag
  sched/deadline: Drop duplicate init_sched_dl_class() declaration
  sched/deadline: Reduce rq lock contention by eliminating locking of non-feasible target
  sched/deadline: Make init_sched_dl_class() __init
  sched/deadline: Optimize pull_dl_task()
  sched/preempt: Add static_key() to preempt_notifiers
  sched/preempt: Fix preempt notifiers documentation about hlist_del() within unsafe iteration
  sched/stop_machine: Fix deadlock between multiple stop_two_cpus()
  sched/debug: Add sum_sleep_runtime to /proc/<pid>/sched
  sched/debug: Replace vruntime with wait_sum in /proc/sched_debug
  sched/debug: Properly format runnable tasks in /proc/sched_debug
  sched/numa: Only consider less busy nodes as numa balancing destinations
  Revert 095bebf61a ("sched/numa: Do not move past the balance point if unbalanced")
  sched/fair: Prevent throttling in early pick_next_task_fair()
  preempt: Reorganize the notrace definitions a bit
  preempt: Use preempt_schedule_context() as the official tracing preemption point
  sched: Make preempt_schedule_context() function-tracing safe
  x86: Remove cpu_sibling_mask() and cpu_core_mask()
  x86: Replace cpu_**_mask() with topology_**_cpumask()
  ...
2015-06-22 15:52:04 -07:00
Sebastian Andrzej Siewior
802ab58da7 futex: Lower the lock contention on the HB lock during wake up
wake_futex_pi() wakes the task before releasing the hash bucket lock
(HB). The first thing the woken up task usually does is to acquire the
lock which requires the HB lock. On SMP Systems this leads to blocking
on the HB lock which is released by the owner shortly after.
This patch rearranges the unlock path by first releasing the HB lock and
then waking up the task.

[ tglx: Fixed up the rtmutex unlock path ]

Originally-from: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Link: http://lkml.kernel.org/r/20150617083350.GA2433@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 21:26:38 +02:00
Peter Zijlstra
b92b8b35a2 locking/arch: Rename set_mb() to smp_store_mb()
Since set_mb() is really about an smp_mb() -- not an IO/DMA barrier
like mb() -- rename it to match the recent smp_load_acquire() and
smp_store_release().

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-19 08:32:00 +02:00
Davidlohr Bueso
1d0dcb3ad9 futex: Implement lockless wakeups
Given the overall futex architecture, any chance of reducing
hb->lock contention is welcome. In this particular case, using
wake-queues to enable lockless wakeups addresses very real
performance concerns, even soft-lockups when large numbers of
tasks are blocked (which is not hard to find on large boxes
using just a handful of futexes).

At the lowest level, this patch can reduce the latency of a single thread
attempting to acquire hb->lock in highly contended scenarios by
up to 2x. At lower counts of nr_wake there are no regressions,
confirming, of course, that the wake_q handling overhead is practically
nonexistent. For instance, despite a fair amount of variation,
the extended perf-bench wakeup benchmark shows, for a 20 core machine,
the following avg per-thread time to wake up its share of tasks:

	nr_thr	ms-before	ms-after
	16 	0.0590		0.0215
	32 	0.0396		0.0220
	48 	0.0417		0.0182
	64 	0.0536		0.0236
	80 	0.0414		0.0097
	96 	0.0672		0.0152

Naturally, this can cause spurious wakeups. However there is no core code
that cannot handle them afaict, and furthermore tglx does have the point
that other events can already trigger them anyway.
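
The wake-queue pattern can be sketched as a single-threaded userspace model: waiters are only collected while the bucket lock is held, and the wakeups themselves are issued after it has been dropped. All names here are hypothetical stand-ins, and the lock is modelled as a plain flag so the ordering can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_WAKE 8

struct model_waiter {
    bool woken;            /* has this waiter been woken? */
    bool woken_under_lock; /* was the bucket lock held at that time? */
};

/* Fixed-size queue of waiters to wake once the lock is released. */
struct model_wake_q {
    struct model_waiter *tasks[MODEL_MAX_WAKE];
    size_t n;
};

static bool bucket_locked; /* stands in for hb->lock */

static void model_wake_q_add(struct model_wake_q *q, struct model_waiter *t)
{
    if (q->n < MODEL_MAX_WAKE) /* entries beyond capacity are dropped */
        q->tasks[q->n++] = t;
}

/* Issue the deferred wakeups and empty the queue. */
static void model_wake_up_q(struct model_wake_q *q)
{
    for (size_t i = 0; i < q->n; i++) {
        q->tasks[i]->woken = true;
        q->tasks[i]->woken_under_lock = bucket_locked;
    }
    q->n = 0;
}

/* Wake the first nr_wake waiters: queue under the lock, wake after it. */
static void model_futex_wake(struct model_waiter *waiters, size_t nr_wake)
{
    struct model_wake_q q = { .n = 0 };

    bucket_locked = true;  /* lock the hash bucket */
    for (size_t i = 0; i < nr_wake; i++)
        model_wake_q_add(&q, &waiters[i]);
    bucket_locked = false; /* unlock before waking anyone */

    model_wake_up_q(&q);
}
```

Because a woken task typically needs the same bucket lock straight away, deferring the wakeup until after the unlock avoids making it block on the waker.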

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chris Mason <clm@fb.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: George Spelvin <linux@horizon.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1430494072-30283-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-08 12:21:40 +02:00
Thomas Gleixner
2e4b0d3fe8 futex: Remove bogus hrtimer_active() check
The check for hrtimer_active() after starting the timer is
pointless. If the timer is inactive it has expired already and
therefore the task pointer is already NULL.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/20150414203502.985825453@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-04-22 17:06:51 +02:00
Ingo Molnar
2ae7902681 Linux 4.0-rc1

Merge tag 'v4.0-rc1' into locking/core, to refresh the tree before merging new changes

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-24 08:41:07 +01:00
Oleg Nesterov
a212946446 locking/futex: Check PF_KTHREAD rather than !p->mm to filter out kthreads
attach_to_pi_owner() checks p->mm to prevent attaching to kthreads and
this looks doubly wrong:

1. It should actually check PF_KTHREAD, kthread can do use_mm().

2. If this task is not kthread and it is actually the lock owner we can
   wrongly return -EPERM instead of -ESRCH or retry-if-EAGAIN.

   And note that this wrong EPERM is the likely case unless the exiting
   task is (auto)reaped quickly, we check ->mm before PF_EXITING.
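
The two problems can be made concrete with a small userspace model that compares a check based on a missing mm with one based on a kthread flag. This is a sketch under assumptions: the flag values, the `has_mm` field and both function names are hypothetical, and the real code path does considerably more.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag bits, not the kernel's values. */
#define MODEL_PF_EXITING    0x1u
#define MODEL_PF_EXITPIDONE 0x2u
#define MODEL_PF_KTHREAD    0x4u

struct model_owner {
    unsigned int flags;
    bool has_mm; /* models p->mm != NULL */
};

/* Shared handling of an exiting owner: gone for good, or retry later. */
static int exiting_result(const struct model_owner *p)
{
    if (p->flags & MODEL_PF_EXITING)
        return (p->flags & MODEL_PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
    return 0;
}

/* Old behaviour: a missing mm is treated as "this is a kthread". */
static int old_attach_check(const struct model_owner *p)
{
    if (!p->has_mm)
        return -EPERM;
    return exiting_result(p);
}

/* New behaviour: only the kthread flag leads to -EPERM. */
static int new_attach_check(const struct model_owner *p)
{
    if (p->flags & MODEL_PF_KTHREAD)
        return -EPERM;
    return exiting_result(p);
}
```

An exiting user task that has already dropped its mm gets -EPERM from the old check and -EAGAIN or -ESRCH from the new one, while a kthread that borrowed an mm is rejected only by the new check.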

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jerome Marchand <jmarchan@redhat.com>
Cc: Larry Woodman <lwoodman@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mateusz Guzik <mguzik@redhat.com>
Link: http://lkml.kernel.org/r/20150202140536.GA26406@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 16:57:09 +01:00
Andy Lutomirski
f56141e3e2 all arches, signal: move restart_block to struct task_struct
If an attacker can cause a controlled kernel stack overflow, overwriting
the restart block is a very juicy exploit target.  This is because the
restart_block is held in the same memory allocation as the kernel stack.

Moving the restart block to struct task_struct prevents this exploit by
making the restart_block harder to locate.

Note that there are other fields in thread_info that are also easy
targets, at least on some architectures.

It's also a decent simplification, since the restart code is more or less
identical on all architectures.

[james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Acked-by: Richard Weinberger <richard@nod.at>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Steven Miao <realmz6@gmail.com>
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@ezchip.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:12 -08:00
Michael Kerrisk
996636ddae futex: Fix argument handling in futex_lock_pi() calls
This patch fixes two separate buglets in calls to futex_lock_pi():

  * Eliminate unused 'detect' argument
  * Change unused 'timeout' argument of FUTEX_TRYLOCK_PI to NULL

The 'detect' argument of futex_lock_pi() seems never to have been
used: when it was included with the initial PI mutex implementation
in Linux 2.6.18, all checks against its value were disabled by
ANDing against 0 (i.e., if (detect... && 0)), and with
commit 778e9a9c3e, any mention of
this argument in futex_lock_pi() went away altogether. Its presence
now serves only to confuse readers of the code, by giving the
impression that the futex() FUTEX_LOCK_PI operation actually does
use the 'val' argument. This patch removes the argument.

The futex_lock_pi() call that corresponds to FUTEX_TRYLOCK_PI includes
'timeout' as one of its arguments. This misleads the reader into thinking
that the FUTEX_TRYLOCK_PI operation does employ timeouts for some sensible
purpose; but it does not.  Indeed, it cannot, because the checks at the
start of sys_futex() exclude FUTEX_TRYLOCK_PI from the set of operations
that do copy_from_user() on the timeout argument. So, in the
FUTEX_TRYLOCK_PI futex_lock_pi() call it would be simplest to change
'timeout' to 'NULL'. This patch does that.

Signed-off-by: Michael Kerrisk <mtk.manpages@gmail.com>
Reviewed-by: Darren Hart <darren@dvhart.com>
Link: http://lkml.kernel.org/r/54B96646.8010200@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-01-19 12:05:32 +01:00
Brian Silverman
30a6b8031f futex: Fix a race condition between REQUEUE_PI and task death
free_pi_state and exit_pi_state_list both clean up futex_pi_state's.
exit_pi_state_list takes the hb lock first, and most callers of
free_pi_state do too. requeue_pi doesn't, which means free_pi_state
can free the pi_state out from under exit_pi_state_list. For example:

task A                            |  task B
exit_pi_state_list                |
  pi_state =                      |
      curr->pi_state_list->next   |
                                  |  futex_requeue(requeue_pi=1)
                                  |    // pi_state is the same as
                                  |    // the one in task A
                                  |    free_pi_state(pi_state)
                                  |      list_del_init(&pi_state->list)
                                  |      kfree(pi_state)
  list_del_init(&pi_state->list)  |

Move the free_pi_state calls in requeue_pi to before it drops the hb
locks which it's already holding.

[ tglx: Removed a pointless free_pi_state() call and the hb->lock held
  debugging. The latter comes via a separate patch ]

Signed-off-by: Brian Silverman <bsilver16384@gmail.com>
Cc: austin.linux@gmail.com
Cc: darren@dvhart.com
Cc: peterz@infradead.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/1414282837-23092-1-git-send-email-bsilver16384@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-26 16:16:18 +01:00
Davidlohr Bueso
993b2ff221 futex: Mention key referencing differences between shared and private futexes
Update our documentation as of fix 76835b0ebf (futex: Ensure
get_futex_key_refs() always implies a barrier). Explicitly
state that we don't do key referencing for private futexes.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Matteo Franchin <Matteo.Franchin@arm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Link: http://lkml.kernel.org/r/1414121220.817.0.camel@linux-t7sj.site
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-26 16:16:18 +01:00
Catalin Marinas
76835b0ebf futex: Ensure get_futex_key_refs() always implies a barrier
Commit b0c29f79ec (futexes: Avoid taking the hb->lock if there's
nothing to wake up) changes the futex code to avoid taking a lock when
there are no waiters. This code has been subsequently fixed in commit
11d4616bd0 (futex: revert back to the explicit waiter counting code).
Both the original commit and the fix-up rely on get_futex_key_refs() to
always imply a barrier.

However, for private futexes, none of the cases in the switch statement
of get_futex_key_refs() would be hit and the function completes without
a memory barrier as required before checking the "waiters" in
futex_wake() -> hb_waiters_pending(). The consequence is a race with a
thread waiting on a futex on another CPU, allowing the waker thread to
read "waiters == 0" while the waiter thread has read "futex_val ==
locked" (in kernel).
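
The ordering requirement described above can be sketched in user space
with C11 atomics. This is only an illustration, not the kernel code:
the names waiters, futex_val, waiter_prepare(), waiter_done() and
waker_should_wake() are hypothetical, and the sequentially consistent
fence stands in for the barrier that get_futex_key_refs() is expected
to imply.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define LOCKED   1
#define UNLOCKED 0

static atomic_int waiters;              /* stand-in for the hash bucket waiter count */
static atomic_int futex_val = LOCKED;   /* stand-in for the user space futex word */

/* Waiter side: announce ourselves, then re-check the futex value.
 * Returns true if the caller should go to sleep. */
static bool waiter_prepare(void)
{
	atomic_fetch_add_explicit(&waiters, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&futex_val, memory_order_relaxed) == LOCKED;
}

/* Waiter side: undo the announcement after waking up or giving up. */
static void waiter_done(void)
{
	int old = atomic_fetch_sub_explicit(&waiters, 1, memory_order_relaxed);

	assert(old > 0);   /* must pair with waiter_prepare() */
}

/* Waker side: release the futex word, then check for waiters.
 * Without the fence, the load below could be satisfied before the
 * store becomes visible, and both sides could miss each other. */
static bool waker_should_wake(void)
{
	atomic_store_explicit(&futex_val, UNLOCKED, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&waiters, memory_order_relaxed) > 0;
}
```

With a full fence on both sides, at least one of the two threads is
guaranteed to observe the other's write, so the "waiters == 0" and
"futex_val == locked" combination cannot occur.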

Without this fix, the problem (user space deadlocks) can be seen with
Android bionic's mutex implementation on an arm64 multi-cluster system.

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Matteo Franchin <Matteo.Franchin@arm.com>
Fixes: b0c29f79ec (futexes: Avoid taking the hb->lock if there's nothing to wake up)
Acked-by: Davidlohr Bueso <dave@stgolabs.net>
Tested-by: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-18 09:28:51 -07:00
Thomas Gleixner
13c42c2f43 futex: Unlock hb->lock in futex_wait_requeue_pi() error path
futex_wait_requeue_pi() calls futex_wait_setup(). If
futex_wait_setup() succeeds it returns with hb->lock held and
preemption disabled. Now the sanity check after this does:

	if (match_futex(&q.key, &key2)) {
		ret = -EINVAL;
		goto out_put_keys;
	}

which releases the keys but does not release hb->lock.

So we happily return to user space with hb->lock held and therefore
preemption disabled.

Unlock hb->lock before taking the exit route.
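
The shape of the fix can be sketched in plain C. This is a simplified
model under stated assumptions, not the kernel function: hb_locked,
keys_held, wait_setup() and wait_requeue() are hypothetical stand-ins,
and integer keys replace the real futex keys.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool hb_locked;   /* stand-in for hb->lock */
static int keys_held;    /* stand-in for the futex key references */

static void hb_lock(void)   { assert(!hb_locked); hb_locked = true; }
static void hb_unlock(void) { assert(hb_locked);  hb_locked = false; }

/* Hypothetical setup step: returns 0 with the lock held on success. */
static int wait_setup(void)
{
	hb_lock();
	return 0;
}

static int wait_requeue(int key, int key2)
{
	int ret;

	keys_held = 2;
	ret = wait_setup();
	if (ret)
		goto out_put_keys;

	if (key == key2) {
		hb_unlock();   /* the fix: unlock before taking the exit route */
		ret = -EINVAL;
		goto out_put_keys;
	}

	/* ... the actual wait would happen here ... */
	hb_unlock();

out_put_keys:
	keys_held = 0;   /* this label only drops the keys, never the lock */
	return ret;
}
```

Because out_put_keys releases only the keys, every path that jumps
there after a successful setup has to drop the lock itself.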

Reported-by: Dave "Trinity" Jones <davej@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-09-12 22:04:36 +02:00
Thomas Gleixner
af54d6a1c3 futex: Simplify futex_lock_pi_atomic() and make it more robust
futex_lock_pi_atomic() is a maze of retry hoops and loops.

Reduce it to simple and understandable states:

First step is to lookup existing waiters (state) in the kernel.

If there is an existing waiter, validate it and attach to it.

If there is no existing waiter, check the user space value.

If the TID encoded in the user space value is 0, take over the futex
preserving the owner died bit.

If the TID encoded in the user space value is != 0, lookup the owner
task, validate it and attach to it.
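
The states listed above can be sketched as a small decision function.
This is an illustration only: the enum, classify() and
take_over_value() are hypothetical names, and the validation steps are
left out. The bit masks match the futex word layout in
<linux/futex.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the user space futex word, as in <linux/futex.h>. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

enum lock_pi_step {
	ATTACH_TO_WAITER,   /* kernel state exists: validate it and attach */
	TAKE_OVER_FUTEX,    /* TID == 0: acquire, keeping the owner died bit */
	ATTACH_TO_OWNER,    /* TID != 0: look up the owner task and attach */
};

static enum lock_pi_step classify(bool has_waiter, uint32_t uval)
{
	if (has_waiter)
		return ATTACH_TO_WAITER;
	if ((uval & FUTEX_TID_MASK) == 0)
		return TAKE_OVER_FUTEX;
	return ATTACH_TO_OWNER;
}

/* New futex word when 'tid' takes over: keep only the owner died bit. */
static uint32_t take_over_value(uint32_t uval, uint32_t tid)
{
	assert((tid & ~FUTEX_TID_MASK) == 0);
	return tid | (uval & FUTEX_OWNER_DIED);
}
```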

Reduces text size by 128 bytes on x86_64.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Cc: Darren Hart <darren@dvhart.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1406131137020.5170@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 22:26:24 +02:00
Thomas Gleixner
04e1b2e52b futex: Split out the first waiter attachment from lookup_pi_state()
We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the code which attaches the first
waiter to the owner into a separate function. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.271300614@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 22:26:23 +02:00
Thomas Gleixner
e60cbc5cea futex: Split out the waiter check from lookup_pi_state()
We want to be a bit more clever in futex_lock_pi_atomic() and separate
the possible states. Split out the waiter verification into a separate
function. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.180458410@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 22:26:23 +02:00
Thomas Gleixner
bd1dbcc67c futex: Use futex_top_waiter() in lookup_pi_state()
No point in open coding the same function again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <darren@dvhart.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.092947239@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 22:26:23 +02:00
Thomas Gleixner
ccf9e6a80d futex: Make unlock_pi more robust
The kernel tries to atomically unlock the futex without checking
whether there is kernel state associated to the futex.

So if user space manipulated the user space value, this will leave
kernel internal state around associated to the owner task. 

For robustness' sake, first look up whether there are waiters on the
futex. If there are waiters, wake the top priority waiter with all the
proper sanity checks applied.

If there are no waiters, do the atomic release. We do not have to
preserve the waiters bit in this case, because a potentially incoming
waiter is blocked on the hb->lock and will acquire the futex
atomically. Nor do we have to preserve the owner died bit. The caller
is the owner and it was supposed to clean up the mess.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Kees Cook <kees@outflux.net>
Cc: wad@chromium.org
Link: http://lkml.kernel.org/r/20140611204237.016987332@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-06-21 22:26:23 +02:00
Thomas Gleixner
c051b21f71 rtmutex: Confine deadlock logic to futex
The deadlock logic is only required for futexes.

Remove the extra arguments for the public functions and also for the
futex specific ones which get always called with deadlock detection
enabled.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2014-06-21 22:05:30 +02:00
Linus Torvalds
3f17ea6dea Merge branch 'next' (accumulated 3.16 merge window patches) into master
Now that 3.15 is released, this merges the 'next' branch into 'master',
bringing us to the normal situation where my 'master' branch is the
merge window.

* accumulated work in next: (6809 commits)
  ufs: sb mutex merge + mutex_destroy
  powerpc: update comments for generic idle conversion
  cris: update comments for generic idle conversion
  idle: remove cpu_idle() forward declarations
  nbd: zero from and len fields in NBD_CMD_DISCONNECT.
  mm: convert some level-less printks to pr_*
  MAINTAINERS: adi-buildroot-devel is moderated
  MAINTAINERS: add linux-api for review of API/ABI changes
  mm/kmemleak-test.c: use pr_fmt for logging
  fs/dlm/debug_fs.c: replace seq_printf by seq_puts
  fs/dlm/lockspace.c: convert simple_str to kstr
  fs/dlm/config.c: convert simple_str to kstr
  mm: mark remap_file_pages() syscall as deprecated
  mm: memcontrol: remove unnecessary memcg argument from soft limit functions
  mm: memcontrol: clean up memcg zoneinfo lookup
  mm/memblock.c: call kmemleak directly from memblock_(alloc|free)
  mm/mempool.c: update the kmemleak stack trace for mempool allocations
  lib/radix-tree.c: update the kmemleak stack trace for radix tree allocations
  mm: introduce kmemleak_update_trace()
  mm/kmemleak.c: use %u to print ->checksum
  ...
2014-06-08 11:31:16 -07:00
Thomas Gleixner
54a217887a futex: Make lookup_pi_state more robust
The current implementation of lookup_pi_state has ambiguous handling of
the TID value 0 in the user space futex.  We can get into the kernel
even if the TID value is 0, because either there is a stale waiters bit
or the owner died bit is set or we are called from the requeue_pi path
or from user space just for fun.

The current code avoids an explicit sanity check for pid = 0 in case
that kernel internal state (waiters) is found for the user space
address.  This can lead to state leakage and worse under some
circumstances.

Handle the cases explicitly:

       Waiter | pi_state | pi->owner | uTID      | uODIED | ?

  [1]  NULL   | ---      | ---       | 0         | 0/1    | Valid
  [2]  NULL   | ---      | ---       | >0        | 0/1    | Valid

  [3]  Found  | NULL     | ---       | Any       | 0/1    | Invalid

  [4]  Found  | Found    | NULL      | 0         | 1      | Valid
  [5]  Found  | Found    | NULL      | >0        | 1      | Invalid

  [6]  Found  | Found    | task      | 0         | 1      | Valid

  [7]  Found  | Found    | NULL      | Any       | 0      | Invalid

  [8]  Found  | Found    | task      | ==taskTID | 0/1    | Valid
  [9]  Found  | Found    | task      | 0         | 0      | Invalid
  [10] Found  | Found    | task      | !=taskTID | 0/1    | Invalid

 [1] Indicates that the kernel can acquire the futex atomically. We
     came here due to a stale FUTEX_WAITERS/FUTEX_OWNER_DIED bit.

 [2] Valid, if TID does not belong to a kernel thread. If no matching
     thread is found then it indicates that the owner TID has died.

 [3] Invalid. The waiter is queued on a non PI futex

 [4] Valid state after exit_robust_list(), which sets the user space
     value to FUTEX_WAITERS | FUTEX_OWNER_DIED.

 [5] The user space value got manipulated between exit_robust_list()
     and exit_pi_state_list()

 [6] Valid state after exit_pi_state_list() which sets the new owner in
     the pi_state but cannot access the user space value.

 [7] pi_state->owner can only be NULL when the OWNER_DIED bit is set.

 [8] Owner and user space value match

 [9] There is no transient state which sets the user space TID to 0
     except exit_robust_list(), but this is indicated by the
     FUTEX_OWNER_DIED bit. See [4]

[10] There is no transient state which leaves owner and user space
     TID out of sync.
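
The table above can be encoded as a single predicate. This is a sketch
under stated assumptions, not the kernel implementation:
pi_state_valid() and its parameters are hypothetical, the owner is
reduced to a TID, and the kernel thread check from [2] is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the user space futex word, as in <linux/futex.h>. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

/* Returns true for the rows marked Valid, false for the Invalid ones. */
static bool pi_state_valid(bool waiter, bool pi_state, bool has_owner,
			   uint32_t owner_tid, uint32_t uval)
{
	uint32_t utid = uval & FUTEX_TID_MASK;
	bool odied = (uval & FUTEX_OWNER_DIED) != 0;

	assert(!has_owner || owner_tid != 0);   /* a real task has a non-zero TID */

	if (!waiter)
		return true;                     /* [1], [2] */
	if (!pi_state)
		return false;                    /* [3] */
	if (!has_owner)
		return odied && utid == 0;       /* [4] valid; [5], [7] invalid */
	if (utid == 0)
		return odied;                    /* [6] valid; [9] invalid */
	return utid == owner_tid;                /* [8] valid; [10] invalid */
}
```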

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
13fbca4c6e futex: Always cleanup owner tid in unlock_pi
If the owner died bit is set at futex_unlock_pi, we currently do not
clean up the user space futex.  So the owner TID of the current owner
(the unlocker) persists.  That's observable inconsistent state,
especially when the ownership of the pi state got transferred.

Clean it up unconditionally.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
b3eaa9fc5c futex: Validate atomic acquisition in futex_lock_pi_atomic()
We need to protect the atomic acquisition in the kernel against rogue
user space which sets the user space futex to 0, so the kernel side
acquisition succeeds while there is existing state in the kernel
associated to the real owner.

Verify whether the futex has waiters associated with kernel state.  If
it has, return -EINVAL.  The state is corrupted already, so no point in
cleaning it up.  Subsequent calls will fail as well.  Not our problem.

[ tglx: Use futex_top_waiter() and explain why we do not need to try
  restoring the already corrupted user space state. ]

Signed-off-by: Darren Hart <dvhart@linux.intel.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Will Drewry <wad@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Thomas Gleixner
e9c243a5a6 futex-prevent-requeue-pi-on-same-futex.patch futex: Forbid uaddr == uaddr2 in futex_requeue(..., requeue_pi=1)
If uaddr == uaddr2, then we have broken the rule of only requeueing from
a non-pi futex to a pi futex with this call.  If we attempt this, then
dangling pointers may be left for rt_waiter resulting in an exploitable
condition.

This change brings futex_requeue() in line with futex_wait_requeue_pi()
which performs the same check as per commit 6f7b0a2a5c ("futex: Forbid
uaddr == uaddr2 in futex_wait_requeue_pi()")

[ tglx: Compare the resulting keys as well, as uaddrs might be
  different depending on the mapping ]
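
The two-level check can be sketched as follows. This is a simplified
model, not the kernel code: struct key_sketch, same_key() and
requeue_pi_check() are hypothetical, and the key is reduced to a
backing object plus an offset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified key: what a user address resolves to. */
struct key_sketch {
	uintptr_t object;       /* backing object the address maps to */
	unsigned long offset;   /* offset within that object */
};

static bool same_key(const struct key_sketch *a, const struct key_sketch *b)
{
	return a->object == b->object && a->offset == b->offset;
}

static int requeue_pi_check(const void *uaddr1, const void *uaddr2,
			    const struct key_sketch *k1,
			    const struct key_sketch *k2)
{
	assert(k1 && k2);

	if (uaddr1 == uaddr2)
		return -EINVAL;   /* same address: never a valid requeue_pi */
	if (same_key(k1, k2))
		return -EINVAL;   /* different addresses, same underlying futex */
	return 0;
}
```

The second test matters because two distinct user addresses can map
the same shared memory, so comparing addresses alone is not enough.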

Fixes CVE-2014-3153.

Reported-by: Pinkie Pie
Signed-off-by: Will Drewry <wad@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-05 12:31:07 -07:00
Linus Torvalds
776edb5931 Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next
Pull core locking updates from Ingo Molnar:
 "The main changes in this cycle were:

   - reduced/streamlined smp_mb__*() interface that allows more usecases
     and makes the existing ones less buggy, especially in rarer
     architectures

   - add rwsem implementation comments

   - bump up lockdep limits"

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits)
  rwsem: Add comments to explain the meaning of the rwsem's count field
  lockdep: Increase static allocations
  arch: Mass conversion of smp_mb__*()
  arch,doc: Convert smp_mb__*()
  arch,xtensa: Convert smp_mb__*()
  arch,x86: Convert smp_mb__*()
  arch,tile: Convert smp_mb__*()
  arch,sparc: Convert smp_mb__*()
  arch,sh: Convert smp_mb__*()
  arch,score: Convert smp_mb__*()
  arch,s390: Convert smp_mb__*()
  arch,powerpc: Convert smp_mb__*()
  arch,parisc: Convert smp_mb__*()
  arch,openrisc: Convert smp_mb__*()
  arch,mn10300: Convert smp_mb__*()
  arch,mips: Convert smp_mb__*()
  arch,metag: Convert smp_mb__*()
  arch,m68k: Convert smp_mb__*()
  arch,m32r: Convert smp_mb__*()
  arch,ia64: Convert smp_mb__*()
  ...
2014-06-03 12:57:53 -07:00
Thomas Gleixner
f0d71b3dcb futex: Prevent attaching to kernel threads
We happily allow userspace to declare a random kernel thread to be the
owner of a user space PI futex.

Found while analysing the fallout of Dave Jones' syscall fuzzer.

We also should validate the thread group for private futexes and find
some fast way to validate whether the "alleged" owner has RW access on
the file which backs the SHM, but that's a separate issue.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Carlos ODonell <carlos@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20140512201701.194824402@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2014-05-19 21:18:49 +09:00
Thomas Gleixner
866293ee54 futex: Add another early deadlock detection check
Dave Jones' trinity syscall fuzzer exposed an issue in the deadlock
detection code of rtmutex:
  http://lkml.kernel.org/r/20140429151655.GA14277@redhat.com

That underlying issue has been fixed with a patch to the rtmutex code,
but the futex code must not call into rtmutex in that case because
    - it can detect that issue early
    - it avoids a different and more complex fixup for backing out

If the user space variable got manipulated to 0x80000000 which means
no lock holder, but the waiters bit set and an active pi_state in the
kernel is found we can figure out the recursive locking issue by
looking at the pi_state owner. If that is the current task, then we
can safely return -EDEADLK.
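
A minimal sketch of that check, assuming the pi_state owner and the
current task are both represented by plain TIDs.
early_deadlock_check() is a hypothetical name; the bit masks match
<linux/futex.h>. Note that 0x80000000 decodes to TID 0 with only the
waiters bit set.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Bit layout of the user space futex word, as in <linux/futex.h>. */
#define FUTEX_WAITERS  0x80000000u
#define FUTEX_TID_MASK 0x3fffffffu

/* Returns -EDEADLK when a zero TID in the user space value hides a
 * recursive lock attempt by the current task, 0 otherwise. */
static int early_deadlock_check(uint32_t uval, uint32_t pi_owner_tid,
				uint32_t current_tid)
{
	assert(current_tid != 0);

	if ((uval & FUTEX_TID_MASK) == 0 && pi_owner_tid == current_tid)
		return -EDEADLK;
	return 0;
}
```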

The check should have been added in commit 59fa62451 (futex: Handle
futex_pi OWNER_DIED take over correctly) already, but I did not see
the above issue caused by user space manipulation back then.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Jones <davej@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Darren Hart <darren@dvhart.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Clark Williams <williams@redhat.com>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Carlos ODonell <carlos@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/20140512201701.097349971@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
2014-05-19 21:18:49 +09:00
Peter Zijlstra
4e857c58ef arch: Mass conversion of smp_mb__*()
Mostly scripted conversion of the smp_mb__* barriers.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-04-18 14:20:48 +02:00
Davidlohr Bueso
d7e8af1afe futex: update documentation for ordering guarantees
Commits 11d4616bd0 ("futex: revert back to the explicit waiter
counting code") and 69cd9eba38 ("futex: avoid race between requeue and
wake") changed some of the finer details of how we think about futexes.
One was a late fix and the other a consequence of overlooking the whole
requeuing logic.

The first change caused our documentation to be incorrect, and the
second made us aware that we need to explicitly add more details to it.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-12 17:57:51 -07:00
Linus Torvalds
69cd9eba38 futex: avoid race between requeue and wake
Jan Stancek reported:
 "pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
  occasionally fails, because some threads fail to wake up.

  Testcase creates 5 threads, which are all waiting on same condition.
  Main thread then calls pthread_cond_broadcast() without holding mutex,
  which calls:

      futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)

  This immediately wakes up single thread A, which unlocks mutex and
  tries to wake up another thread:

      futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)

  If thread A manages to call futex_wake() before any waiters are
  requeued for uaddr2, no other thread is woken up"

The ordering constraints for the hash bucket waiter counting are that
the waiter counts have to be incremented _before_ getting the spinlock
(because the spinlock acts as part of the memory barrier), but the
"requeue" operation didn't honor those rules, and nobody had even
thought about that case.

This fairly simple patch just increments the waiter count for the target
hash bucket (hb2) when requeueing a futex before taking the locks.  It
then decrements them again after releasing the lock - the code that
actually moves the futex(es) between hash buckets will do the additional
required waiter count housekeeping.
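
The ordering can be sketched with C11 atomics. This is a
single-threaded model for illustration only: struct bucket, hb2,
bucket_lock(), bucket_unlock() and requeue_to() are hypothetical, and
a boolean flag stands in for the bucket spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct bucket {
	atomic_int waiters;   /* explicit waiter count */
	bool locked;          /* stand-in for the bucket spinlock */
};

static struct bucket hb2;     /* target hash bucket, zero-initialized */

static void bucket_lock(struct bucket *hb)   { assert(!hb->locked); hb->locked = true; }
static void bucket_unlock(struct bucket *hb) { assert(hb->locked);  hb->locked = false; }

/* Returns the count a concurrent waker would observe mid-requeue. */
static int requeue_to(struct bucket *hb)
{
	int seen;

	atomic_fetch_add(&hb->waiters, 1);   /* before taking the lock */
	bucket_lock(hb);
	seen = atomic_load(&hb->waiters);
	/* ... move waiters to hb; that code does its own accounting ... */
	bucket_unlock(hb);
	atomic_fetch_sub(&hb->waiters, 1);   /* after releasing the lock */
	return seen;
}
```

While the requeue is in flight the count is non-zero, so a waker that
checks it will take the lock instead of skipping the wakeup.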

Reported-and-tested-by: Jan Stancek <jstancek@redhat.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-04-09 08:02:12 -07:00
Linus Torvalds
462bf234a8 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking updates from Ingo Molnar:
 "The biggest change is the MCS spinlock generalization changes from Tim
  Chen, Peter Zijlstra, Jason Low et al.  There's also lockdep
  fixes/enhancements from Oleg Nesterov, in particular a false negative
  fix related to lockdep_set_novalidate_class() usage"

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
  locking/mutex: Fix debug checks
  locking/mutexes: Add extra reschedule point
  locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
  locking/mutexes: Unlock the mutex without the wait_lock
  locking/mutexes: Modify the way optimistic spinners are queued
  locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
  locking: Move mcs_spinlock.h into kernel/locking/
  m68k: Skip futex_atomic_cmpxchg_inatomic() test
  futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
  Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
  lockdep: Change lockdep_set_novalidate_class() to use _and_name
  lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate
  lockdep: Don't create the wrong dependency on hlock->check == 0
  lockdep: Make held_lock->check and "int check" argument bool
  locking/mcs: Allow architecture specific asm files to be used for contended case
  locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
  sched/wait: Suppress Sparse 'variable shadowing' warning
  hung_task/Documentation: Fix hung_task_warnings description
  locking/mcs: Allow architectures to hook in to contended paths
  locking/mcs: Micro-optimize the MCS code, add extra comments
  ...
2014-03-31 10:59:39 -07:00
Linus Torvalds
11d4616bd0 futex: revert back to the explicit waiter counting code
Srikar Dronamraju reports that commit b0c29f79ec ("futexes: Avoid
taking the hb->lock if there's nothing to wake up") causes Java threads
to get stuck on futexes when running specjbb on a power7 NUMA box.

The cause appears to be that the powerpc spinlocks aren't using the same
ticket lock model that we use on x86 (and other) architectures, which in
turn results in the "spin_is_locked()" test in hb_waiters_pending()
occasionally reporting an unlocked spinlock even when there are pending
waiters.

So this reinstates Davidlohr Bueso's original explicit waiter counting
code, which I had convinced Davidlohr to drop in favor of figuring out
the pending waiters by just using the existing state of the spinlock and
the wait queue.

Reported-and-tested-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Original-code-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-20 22:11:17 -07:00