Commit graph

174 commits

Author SHA1 Message Date
Srinivasarao P
f9cff13b5d Merge android-4.4.135 (c9d74f2) into msm-4.4
* refs/heads/tmp-c9d74f2
  Linux 4.4.135
  Revert "vti4: Don't override MTU passed on link creation via IFLA_MTU"
  Revert "vti4: Don't override MTU passed on link creation via IFLA_MTU"
  Linux 4.4.134
  s390/ftrace: use expoline for indirect branches
  kdb: make "mdr" command repeat
  Bluetooth: btusb: Add device ID for RTL8822BE
  ASoC: samsung: i2s: Ensure the RCLK rate is properly determined
  regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()'
  scsi: lpfc: Fix frequency of Release WQE CQEs
  scsi: lpfc: Fix soft lockup in lpfc worker thread during LIP testing
  scsi: lpfc: Fix issue_lip if link is disabled
  netlabel: If PF_INET6, check sk_buff ip header version
  selftests/net: fixes psock_fanout eBPF test case
  perf report: Fix memory corruption in --branch-history mode --branch-history
  perf tests: Use arch__compare_symbol_names to compare symbols
  x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
  drm/rockchip: Respect page offset for PRIME mmap calls
  MIPS: Octeon: Fix logging messages with spurious periods after newlines
  audit: return on memory error to avoid null pointer dereference
  crypto: sunxi-ss - Add MODULE_ALIAS to sun4i-ss
  clk: samsung: exynos3250: Fix PLL rates
  clk: samsung: exynos5250: Fix PLL rates
  clk: samsung: exynos5433: Fix PLL rates
  clk: samsung: exynos5260: Fix PLL rates
  clk: samsung: s3c2410: Fix PLL rates
  media: cx25821: prevent out-of-bounds read on array card
  udf: Provide saner default for invalid uid / gid
  PCI: Add function 1 DMA alias quirk for Marvell 88SE9220
  serial: arc_uart: Fix out-of-bounds access through DT alias
  serial: fsl_lpuart: Fix out-of-bounds access through DT alias
  serial: imx: Fix out-of-bounds access through serial port index
  serial: mxs-auart: Fix out-of-bounds access through serial port index
  serial: samsung: Fix out-of-bounds access through serial port index
  serial: xuartps: Fix out-of-bounds access through DT alias
  rtc: tx4939: avoid unintended sign extension on a 24 bit shift
  staging: rtl8192u: return -ENOMEM on failed allocation of priv->oldaddr
  hwrng: stm32 - add reset during probe
  enic: enable rq before updating rq descriptors
  clk: rockchip: Prevent calculating mmc phase if clock rate is zero
  media: em28xx: USB bulk packet size fix
  dmaengine: pl330: fix a race condition in case of threaded irqs
  media: s3c-camif: fix out-of-bounds array access
  media: cx23885: Set subdev host data to clk_freq pointer
  media: cx23885: Override 888 ImpactVCBe crystal frequency
  ALSA: vmaster: Propagate slave error
  x86/devicetree: Fix device IRQ settings in DT
  x86/devicetree: Initialize device tree before using it
  usb: gadget: composite: fix incorrect handling of OS desc requests
  usb: gadget: udc: change comparison to bitshift when dealing with a mask
  gfs2: Fix fallocate chunk size
  cdrom: do not call check_disk_change() inside cdrom_open()
  hwmon: (pmbus/adm1275) Accept negative page register values
  hwmon: (pmbus/max8688) Accept negative page register values
  perf/core: Fix perf_output_read_group()
  ASoC: topology: create TLV data for dapm widgets
  powerpc: Add missing prototype for arch_irq_work_raise()
  usb: gadget: ffs: Execute copy_to_user() with USER_DS set
  usb: gadget: ffs: Let setup() return USB_GADGET_DELAYED_STATUS
  usb: dwc2: Fix interval type issue
  ipmi_ssif: Fix kernel panic at msg_done_handler
  PCI: Restore config space on runtime resume despite being unbound
  MIPS: ath79: Fix AR724X_PLL_REG_PCIE_CONFIG offset
  xhci: zero usb device slot_id member when disabling and freeing a xhci slot
  KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use
  i2c: mv64xxx: Apply errata delay only in standard mode
  ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c
  ACPICA: Events: add a return on failure from acpi_hw_register_read
  bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
  zorro: Set up z->dev.dma_mask for the DMA API
  clk: Don't show the incorrect clock phase
  cpufreq: cppc_cpufreq: Fix cppc_cpufreq_init() failure path
  usb: dwc3: Update DWC_usb31 GTXFIFOSIZ reg fields
  arm: dts: socfpga: fix GIC PPI warning
  virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
  ima: Fallback to the builtin hash algorithm
  ima: Fix Kconfig to select TPM 2.0 CRB interface
  ath10k: Fix kernel panic while using worker (ath10k_sta_rc_update_wk)
  net/mlx5: Protect from command bit overflow
  selftests: Print the test we're running to /dev/kmsg
  tools/thermal: tmon: fix for segfault
  powerpc/perf: Fix kernel address leak via sampling registers
  powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer
  rtc: hctosys: Ensure system time doesn't overflow time_t
  hwmon: (nct6775) Fix writing pwmX_mode
  parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode
  m68k: set dma and coherent masks for platform FEC ethernets
  powerpc/mpic: Check if cpu_possible() in mpic_physmask()
  ACPI: acpi_pad: Fix memory leak in power saving threads
  xen/acpi: off by one in read_acpi_id()
  btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
  Btrfs: fix copy_items() return value when logging an inode
  btrfs: tests/qgroup: Fix wrong tree backref level
  Bluetooth: btusb: Add USB ID 7392:a611 for Edimax EW-7611ULB
  net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
  rtc: snvs: Fix usage of snvs_rtc_enable
  sparc64: Make atomic_xchg() an inline function rather than a macro.
  fscache: Fix hanging wait on page discarded by writeback
  KVM: VMX: raise internal error for exception during invalid protected mode state
  sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
  ocfs2/dlm: don't handle migrate lockres if already in shutdown
  btrfs: Fix possible softlock on single core machines
  Btrfs: fix NULL pointer dereference in log_dir_items
  Btrfs: bail out on error during replay_dir_deletes
  mm: fix races between address_space dereference and free in page_evicatable
  mm/ksm: fix interaction with THP
  dp83640: Ensure against premature access to PHY registers after reset
  scsi: aacraid: Insure command thread is not recursively stopped
  cpufreq: CPPC: Initialize shared perf capabilities of CPUs
  Force log to disk before reading the AGF during a fstrim
  sr: get/drop reference to device in revalidate and check_events
  swap: divide-by-zero when zero length swap file on ssd
  fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
  x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
  sh: fix debug trap failure to process signals before return to user
  net: mvneta: fix enable of all initialized RXQs
  net: Fix untag for vlan packets without ethernet header
  mm/kmemleak.c: wait for scan completion before disabling free
  llc: properly handle dev_queue_xmit() return value
  net-usb: add qmi_wwan if on lte modem wistron neweb d18q1
  net/usb/qmi_wwan.c: Add USB id for lt4120 modem
  net: qmi_wwan: add BroadMobi BM806U 2020:2033
  ARM: 8748/1: mm: Define vdso_start, vdso_end as array
  batman-adv: fix packet loss for broadcasted DHCP packets to a server
  batman-adv: fix multicast-via-unicast transmission with AP isolation
  selftests: ftrace: Add a testcase for probepoint
  selftests: ftrace: Add a testcase for string type with kprobe_event
  selftests: ftrace: Add probe event argument syntax testcase
  mm/mempolicy.c: avoid use uninitialized preferred_node
  RDMA/ucma: Correct option size check using optlen
  perf/cgroup: Fix child event counting bug
  vti4: Don't override MTU passed on link creation via IFLA_MTU
  vti4: Don't count header length twice on tunnel setup
  batman-adv: fix header size check in batadv_dbg_arp()
  net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
  sunvnet: does not support GSO for sctp
  ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu
  workqueue: use put_device() instead of kfree()
  bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().
  netfilter: ebtables: fix erroneous reject of last rule
  USB: OHCI: Fix NULL dereference in HCDs using HCD_LOCAL_MEM
  xen: xenbus: use put_device() instead of kfree()
  fbdev: Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in sbusfb_ioctl_helper().
  scsi: sd: Keep disk read-only when re-reading partition
  scsi: mpt3sas: Do not mark fw_event workqueue as WQ_MEM_RECLAIM
  usb: musb: call pm_runtime_{get,put}_sync before reading vbus registers
  e1000e: allocate ring descriptors with dma_zalloc_coherent
  e1000e: Fix check_for_link return value with autoneg off
  watchdog: f71808e_wdt: Fix magic close handling
  KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
  selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
  Btrfs: send, fix issuing write op when processing hole in no data mode
  xen/pirq: fix error path cleanup when binding MSIs
  net/tcp/illinois: replace broken algorithm reference link
  gianfar: Fix Rx byte accounting for ndev stats
  sit: fix IFLA_MTU ignored on NEWLINK
  bcache: fix kcrashes with fio in RAID5 backend dev
  dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
  virtio-gpu: fix ioctl and expose the fixed status to userspace.
  r8152: fix tx packets accounting
  clocksource/drivers/fsl_ftm_timer: Fix error return checking
  nvme-pci: Fix nvme queue cleanup if IRQ setup fails
  netfilter: ebtables: convert BUG_ONs to WARN_ONs
  batman-adv: invalidate checksum on fragment reassembly
  batman-adv: fix packet checksum in receive path
  md/raid1: fix NULL pointer dereference
  media: dmxdev: fix error code for invalid ioctls
  x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
  locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
  regulatory: add NUL to request alpha2
  smsc75xx: fix smsc75xx_set_features()
  ARM: OMAP: Fix dmtimer init for omap1
  s390/cio: clear timer when terminating driver I/O
  s390/cio: fix return code after missing interrupt
  powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
  kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
  md: raid5: avoid string overflow warning
  locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
  usb: musb: fix enumeration after resume
  drm/exynos: fix comparison to bitshift when dealing with a mask
  md raid10: fix NULL deference in handle_write_completed()
  mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
  NFC: llcp: Limit size of SDP URI
  ARM: OMAP1: clock: Fix debugfs_create_*() usage
  ARM: OMAP3: Fix prm wake interrupt for resume
  ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
  scsi: qla4xxx: skip error recovery in case of register disconnect.
  scsi: aacraid: fix shutdown crash when init fails
  scsi: storvsc: Increase cmd_per_lun for higher speed devices
  selftests: memfd: add config fragment for fuse
  usb: dwc2: Fix dwc2_hsotg_core_init_disconnected()
  usb: gadget: fsl_udc_core: fix ep valid checks
  usb: gadget: f_uac2: fix bFirstInterface in composite gadget
  ARC: Fix malformed ARC_EMUL_UNALIGNED default
  scsi: qla2xxx: Avoid triggering undefined behavior in qla2x00_mbx_completion()
  scsi: mptfusion: Add bounds check in mptctl_hp_targetinfo()
  scsi: sym53c8xx_2: iterator underflow in sym_getsync()
  scsi: bnx2fc: Fix check in SCSI completion handler for timed out request
  scsi: ufs: Enable quirk to ignore sending WRITE_SAME command
  irqchip/gic-v3: Change pr_debug message to pr_devel
  locking/qspinlock: Ensure node->count is updated before initialising node
  tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
  bcache: return attach error when no cache set exist
  bcache: fix for data collapse after re-attaching an attached device
  bcache: fix for allocator and register thread race
  bcache: properly set task state in bch_writeback_thread()
  cifs: silence compiler warnings showing up with gcc-8.0.0
  proc: fix /proc/*/map_files lookup
  arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics
  RDS: IB: Fix null pointer issue
  xen/grant-table: Use put_page instead of free_page
  xen-netfront: Fix race between device setup and open
  MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS
  bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
  ACPI: processor_perflib: Do not send _PPC change notification if not ready
  firmware: dmi_scan: Fix handling of empty DMI strings
  x86/power: Fix swsusp_arch_resume prototype
  IB/ipoib: Fix for potential no-carrier state
  mm: pin address_space before dereferencing it while isolating an LRU page
  asm-generic: provide generic_pmdp_establish()
  mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
  mm/mempolicy: fix the check of nodemask from user
  ocfs2: return error when we attempt to access a dirty bh in jbd2
  ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
  ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid
  ntb_transport: Fix bug with max_mw_size parameter
  RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
  powerpc/numa: Ensure nodes initialized for hotplug
  powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
  jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
  HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
  scsi: fas216: fix sense buffer initialization
  Btrfs: fix scrub to repair raid6 corruption
  btrfs: Fix out of bounds access in btrfs_search_slot
  Btrfs: set plug for fsync
  ipmi/powernv: Fix error return code in ipmi_powernv_probe()
  mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()
  kconfig: Fix expr_free() E_NOT leak
  kconfig: Fix automatic menu creation mem leak
  kconfig: Don't leak main menus during parsing
  watchdog: sp5100_tco: Fix watchdog disable bit
  nfs: Do not convert nfs_idmap_cache_timeout to jiffies
  dm thin: fix documentation relative to low water mark threshold
  tools lib traceevent: Fix get_field_str() for dynamic strings
  perf callchain: Fix attr.sample_max_stack setting
  tools lib traceevent: Simplify pointer print logic and fix %pF
  PCI: Add function 1 DMA alias quirk for Marvell 9128
  tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
  kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
  ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()
  ALSA: hda - Use IS_REACHABLE() for dependency on input
  NFSv4: always set NFS_LOCK_LOST when a lock is lost.
  firewire-ohci: work around oversized DMA reads on JMicron controllers
  do d_instantiate/unlock_new_inode combinations safely
  xfs: remove racy hasattr check from attr ops
  kernel/signal.c: avoid undefined behaviour in kill_something_info
  kernel/sys.c: fix potential Spectre v1 issue
  kasan: fix memory hotplug during boot
  ipc/shm: fix shmat() nil address after round-down when remapping
  Revert "ipc/shm: Fix shmat mmap nil-page protection"
  xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
  libata: blacklist Micron 500IT SSD with MU01 firmware
  libata: Blacklist some Sandisk SSDs for NCQ
  mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
  ALSA: timer: Fix pause event notification
  aio: fix io_destroy(2) vs. lookup_ioctx() race
  affs_lookup(): close a race with affs_remove_link()
  KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
  MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
  MIPS: ptrace: Expose FIR register through FP regset
  UPSTREAM: sched/fair: Consider RT/IRQ pressure in capacity_spare_wake

Conflicts:
	drivers/media/dvb-core/dmxdev.c
	drivers/scsi/sd.c
	drivers/scsi/ufs/ufshcd.c
	drivers/usb/gadget/function/f_fs.c
	fs/ecryptfs/inode.c

Change-Id: I15751ed8c82ec65ba7eedcb0d385b9f803c333f7
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2018-06-27 14:42:55 +05:30
Davidlohr Bueso
a4e37ca958 ipc/shm: fix shmat() nil address after round-down when remapping
commit 8f89c007b6dec16a1793cb88de88fcc02117bbbc upstream.

shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
fact the very first thing we check for.  Andrea reported that for
SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
but we need to check again if the address was rounded down to nil.  As
of this patch, such cases will return -EINVAL.

Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:48:51 +02:00
Davidlohr Bueso
bd4792b31a Revert "ipc/shm: Fix shmat mmap nil-page protection"
commit a73ab244f0dad8fffb3291b905f73e2d3eaa7c00 upstream.

Patch series "ipc/shm: shmat() fixes around nil-page".

These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.

The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED.  This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.

I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour.  Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be
modified to just test for SHM_RND|SHM_REMAP).

[1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805

This patch (of 2):

Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
worked on the idea that we should not be mapping as root addr=0 and
MAP_FIXED.  However, it was reported that this scenario is in fact
valid, thus making the patch both bogus and breaks userspace as well.

For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
initialization[1].

[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:48:51 +02:00
Srinivasarao P
ee76c85f71 Merge android-4.4.129 (b1c4836) into msm-4.4
* refs/heads/tmp-b1c4836
  Linux 4.4.129
  writeback: safer lock nesting
  fanotify: fix logic of events on child
  ext4: bugfix for mmaped pages in mpage_release_unused_pages()
  mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
  mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
  autofs: mount point create should honour passed in mode
  Don't leak MNT_INTERNAL away from internal mounts
  rpc_pipefs: fix double-dput()
  hypfs_kill_super(): deal with failed allocations
  jffs2_kill_sb(): deal with failed allocations
  powerpc/lib: Fix off-by-one in alternate feature patching
  powerpc/eeh: Fix enabling bridge MMIO windows
  MIPS: memset.S: Fix clobber of v1 in last_fixup
  MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
  MIPS: memset.S: EVA & fault support for small_memset
  MIPS: uaccess: Add micromips clobbers to bzero invocation
  HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
  ALSA: hda - New VIA controller suppor no-snoop path
  ALSA: rawmidi: Fix missing input substream checks in compat ioctls
  ALSA: line6: Use correct endpoint type for midi output
  ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()
  ext4: fix crashes in dioread_nolock mode
  drm/radeon: Fix PCIe lane width calculation
  ext4: don't allow r/w mounts if metadata blocks overlap the superblock
  vfio/pci: Virtualize Maximum Read Request Size
  vfio/pci: Virtualize Maximum Payload Size
  vfio-pci: Virtualize PCIe & AF FLR
  ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
  ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
  ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
  ALSA: pcm: Avoid potential races between OSS ioctls and read/write
  ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
  ALSA: oss: consolidate kmalloc/memset 0 call to kzalloc
  watchdog: f71808e_wdt: Fix WD_EN register read
  thermal: imx: Fix race condition in imx_thermal_probe()
  clk: bcm2835: De-assert/assert PLL reset signal when appropriate
  clk: mvebu: armada-38x: add support for missing clocks
  clk: mvebu: armada-38x: add support for 1866MHz variants
  mmc: jz4740: Fix race condition in IRQ mask update
  iommu/vt-d: Fix a potential memory leak
  um: Use POSIX ucontext_t instead of struct ucontext
  dmaengine: at_xdmac: fix rare residue corruption
  IB/srp: Fix completion vector assignment algorithm
  IB/srp: Fix srp_abort()
  ALSA: pcm: Fix UAF at PCM release via PCM timer access
  RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
  ext4: fail ext4_iget for root directory if unallocated
  ext4: don't update checksum of new initialized bitmaps
  jbd2: if the journal is aborted then don't allow update of the log tail
  random: use a tighter cap in credit_entropy_bits_safe()
  thunderbolt: Resume control channel after hibernation image is created
  ASoC: ssm2602: Replace reg_default_raw with reg_default
  HID: core: Fix size as type u32
  HID: Fix hid_report_len usage
  powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
  powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
  powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
  powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
  HID: i2c-hid: fix size check and type usage
  usb: dwc3: pci: Properly cleanup resource
  USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
  ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
  ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
  regmap: Fix reversed bounds check in regmap_raw_write()
  xen-netfront: Fix hang on device removal
  ARM: dts: at91: sama5d4: fix pinctrl compatible string
  ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
  usb: musb: gadget: misplaced out of bounds check
  mm, slab: reschedule cache_reap() on the same CPU
  ipc/shm: fix use-after-free of shm file via remap_file_pages()
  resource: fix integer overflow at reallocation
  fs/reiserfs/journal.c: add missing resierfs_warning() arg
  ubi: Reject MLC NAND
  ubi: Fix error for write access
  ubi: fastmap: Don't flush fastmap work on detach
  ubifs: Check ubifs_wbuf_sync() return code
  tty: make n_tty_read() always abort if hangup is in progress
  x86/hweight: Don't clobber %rdi
  x86/hweight: Get rid of the special calling convention
  lan78xx: Correctly indicate invalid OTP
  slip: Check if rstate is initialized before uncompressing
  cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
  hwmon: (ina2xx) Fix access to uninitialized mutex
  rtl8187: Fix NULL pointer dereference in priv->conf_mutex
  getname_kernel() needs to make sure that ->name != ->iname in long case
  s390/ipl: ensure loadparm valid flag is set
  s390/qdio: don't merge ERROR output buffers
  s390/qdio: don't retry EQBS after CCQ 96
  block/loop: fix deadlock after loop_set_status
  Revert "perf tests: Decompress kernel module before objdump"
  radeon: hide pointless #warning when compile testing
  perf intel-pt: Fix timestamp following overflow
  perf intel-pt: Fix error recovery from missing TIP packet
  perf intel-pt: Fix sync_switch
  perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
  parisc: Fix out of array access in match_pci_device()
  media: v4l2-compat-ioctl32: don't oops on overlay
  f2fs: check cap_resource only for data blocks
  Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"
  f2fs: clear PageError on writepage
  UPSTREAM: timer: Export destroy_hrtimer_on_stack()
  BACKPORT: dm verity: add 'check_at_most_once' option to only validate hashes once
  f2fs: call unlock_new_inode() before d_instantiate()
  f2fs: refactor read path to allow multiple postprocessing steps
  fscrypt: allow synchronous bio decryption

Change-Id: I45f4ac10734d92023b53118d83dcd6c83974a283
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
2018-04-24 19:07:57 +05:30
Eric Biggers
b7e06a79c2 ipc/shm: fix use-after-free of shm file via remap_file_pages()
commit 3f05317d9889ab75c7190dcd39491d2a97921984 upstream.

syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages().

Unfortunately it couldn't generate a reproducer, but I found a bug which
I think caused it.  When remap_file_pages() is passed a full System V
shared memory segment, the memory is first unmapped, then a new map is
created using the ->vm_file.  Between these steps, the shm ID can be
removed and reused for a new shm segment.  But, shm_mmap() only checks
whether the ID is currently valid before calling the underlying file's
->mmap(); it doesn't check whether it was reused.  Thus it can use the
wrong underlying file, one that was already freed.

Fix this by making the "outer" shm file (the one that gets put in
->vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches
the one associated with the "outer" file.

Taking the reference to the real shm file is needed to fully solve the
problem, since otherwise sfd->file could point to a freed file, which
then could be reallocated for the reused shm ID, causing the wrong shm
segment to be mapped (and without the required permission checks).

Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because
it didn't consider the case where the shm ID is reused.

The following program usually reproduces this bug:

	#include <stdlib.h>
	#include <sys/shm.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	int main()
	{
		int is_parent = (fork() != 0);
		srand(getpid());
		for (;;) {
			int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
			if (is_parent) {
				void *addr = shmat(id, NULL, 0);
				usleep(rand() % 50);
				while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
			} else {
				usleep(rand() % 50);
				shmctl(id, IPC_RMID, NULL);
			}
		}
	}

It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed.  (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report.  But I think it's
possible with this bug; it would just take a more extraordinary race...)

	BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
	PGD 0 P4D 0
	Oops: 0000 [#1] SMP NOPTI
	CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
	Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
	RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
	RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
	[...]
	Call Trace:
	 file_accessed include/linux/fs.h:2063 [inline]
	 shmem_mmap+0x25/0x40 mm/shmem.c:2149
	 call_mmap include/linux/fs.h:1789 [inline]
	 shm_mmap+0x34/0x80 ipc/shm.c:465
	 call_mmap include/linux/fs.h:1789 [inline]
	 mmap_region+0x309/0x5b0 mm/mmap.c:1712
	 do_mmap+0x294/0x4a0 mm/mmap.c:1483
	 do_mmap_pgoff include/linux/mm.h:2235 [inline]
	 SYSC_remap_file_pages mm/mmap.c:2853 [inline]
	 SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
	 do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
	 entry_SYSCALL_64_after_hwframe+0x42/0xb7

[ebiggers@google.com: add comment]
  Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com
Fixes: c8d78c1823 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-24 09:32:05 +02:00
Blagovest Kolenichev
a4b9c109c2 Merge tag v4.4.55 into branch 'msm-4.4'
refs/heads/tmp-28ec98b:
  Linux 4.4.55
  ext4: don't BUG when truncating encrypted inodes on the orphan list
  dm: flush queued bios when process blocks to avoid deadlock
  nfit, libnvdimm: fix interleave set cookie calculation
  s390/kdump: Use "LINUX" ELF note name instead of "CORE"
  KVM: s390: Fix guest migration for huge guests resulting in panic
  mvsas: fix misleading indentation
  serial: samsung: Continue to work if DMA request fails
  USB: serial: io_ti: fix information leak in completion handler
  USB: serial: io_ti: fix NULL-deref in interrupt callback
  USB: iowarrior: fix NULL-deref in write
  USB: iowarrior: fix NULL-deref at probe
  USB: serial: omninet: fix reference leaks at open
  USB: serial: safe_serial: fix information leak in completion handler
  usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers
  usb: host: xhci-dbg: HCIVERSION should be a binary number
  usb: gadget: function: f_fs: pass companion descriptor along
  usb: dwc3: gadget: make Set Endpoint Configuration macros safe
  usb: gadget: dummy_hcd: clear usb_gadget region before registration
  powerpc: Emulation support for load/store instructions on LE
  tracing: Add #undef to fix compile error
  MIPS: Netlogic: Fix CP0_EBASE redefinition warnings
  MIPS: DEC: Avoid la pseudo-instruction in delay slots
  mm: memcontrol: avoid unused function warning
  cpmac: remove hopeless #warning
  MIPS: ralink: Remove unused rt*_wdt_reset functions
  MIPS: ralink: Cosmetic change to prom_init().
  mtd: pmcmsp: use kstrndup instead of kmalloc+strncpy
  MIPS: Update lemote2f_defconfig for CPU_FREQ_STAT change
  MIPS: ip22: Fix ip28 build for modern gcc
  MIPS: Update ip27_defconfig for SCSI_DH change
  MIPS: ip27: Disable qlge driver in defconfig
  MIPS: Update defconfigs for NF_CT_PROTO_DCCP/UDPLITE change
  crypto: improve gcc optimization flags for serpent and wp512
  USB: serial: digi_acceleport: fix OOB-event processing
  USB: serial: digi_acceleport: fix OOB data sanity check
  Linux 4.4.54
  drivers: hv: Turn off write permission on the hypercall page
  fat: fix using uninitialized fields of fat_inode/fsinfo_inode
  libceph: use BUG() instead of BUG_ON(1)
  drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
  fakelb: fix schedule while atomic
  drm/atomic: fix an error code in mode_fixup()
  drm/ttm: Make sure BOs being swapped out are cacheable
  drm/edid: Add EDID_QUIRK_FORCE_8BPC quirk for Rotel RSX-1058
  drm/ast: Fix AST2400 POST failure without BMC FW or VBIOS
  drm/ast: Call open_key before enable_mmio in POST code
  drm/ast: Fix test for VGA enabled
  drm/amdgpu: add more cases to DCE11 possible crtc mask setup
  mac80211: flush delayed work when entering suspend
  xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD
  pwm: pca9685: Fix period change with same duty cycle
  nlm: Ensure callback code also checks that the files match
  target: Fix NULL dereference during LUN lookup + active I/O shutdown
  ceph: remove req from unsafe list when unregistering it
  ktest: Fix child exit code processing
  IB/srp: Fix race conditions related to task management
  IB/srp: Avoid that duplicate responses trigger a kernel bug
  IB/IPoIB: Add destination address when re-queue packet
  IB/ipoib: Fix deadlock between rmmod and set_mode
  mnt: Tuck mounts under others instead of creating shadow/side mounts.
  net: mvpp2: fix DMA address calculation in mvpp2_txq_inc_put()
  s390: use correct input data address for setup_randomness
  s390: make setup_randomness work
  s390: TASK_SIZE for kernel threads
  s390/dcssblk: fix device size calculation in dcssblk_direct_access()
  s390/qdio: clear DSCI prior to scanning multiple input queues
  Bluetooth: Add another AR3012 04ca:3018 device
  KVM: VMX: use correct vmcs_read/write for guest segment selector/base
  KVM: s390: Disable dirty log retrieval for UCONTROL guests
  serial: 8250_pci: Add MKS Tenta SCOM-0800 and SCOM-0801 cards
  tty: n_hdlc: get rid of racy n_hdlc.tbuf
  TTY: n_hdlc, fix lockdep false positive
  Linux 4.4.53
  scsi: lpfc: Correct WQ creation for pagesize
  MIPS: IP22: Fix build error due to binutils 2.25 uselessnes.
  MIPS: IP22: Reformat inline assembler code to modern standards.
  powerpc/xmon: Fix data-breakpoint
  dmaengine: ipu: Make sure the interrupt routine checks all interrupts.
  bcma: use (get|put)_device when probing/removing device driver
  md linear: fix a race between linear_add() and linear_congested()
  rtc: sun6i: Switch to the external oscillator
  rtc: sun6i: Add some locking
  NFSv4: fix getacl ERANGE for some ACL buffer sizes
  NFSv4: fix getacl head length estimation
  NFSv4: Fix memory and state leak in _nfs4_open_and_get_state
  nfsd: special case truncates some more
  nfsd: minor nfsd_setattr cleanup
  rtlwifi: rtl8192c-common: Fix "BUG: KASAN:
  rtlwifi: Fix alignment issues
  gfs2: Add missing rcu locking for glock lookup
  rdma_cm: fail iwarp accepts w/o connection params
  RDMA/core: Fix incorrect structure packing for booleans
  Drivers: hv: util: Backup: Fix a rescind processing issue
  Drivers: hv: util: Fcopy: Fix a rescind processing issue
  Drivers: hv: util: kvp: Fix a rescind processing issue
  hv: init percpu_list in hv_synic_alloc()
  hv: allocate synic pages for all present CPUs
  usb: gadget: udc: fsl: Add missing complete function.
  usb: host: xhci: plat: check hcc_params after add hcd
  usb: musb: da8xx: Remove CPPI 3.0 quirk and methods
  w1: ds2490: USB transfer buffers need to be DMAable
  w1: don't leak refcount on slave attach failure in w1_attach_slave_device()
  can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer
  iio: pressure: mpl3115: do not rely on structure field ordering
  iio: pressure: mpl115: do not rely on structure field ordering
  arm/arm64: KVM: Enforce unconditional flush to PoC when mapping to stage-2
  fuse: add missing FR_FORCE
  crypto: testmgr - Pad aes_ccm_enc_tv_template vector
  ath9k: use correct OTP register offsets for the AR9340 and AR9550
  ath9k: fix race condition in enabling/disabling IRQs
  ath5k: drop bogus warning on drv_set_key with unsupported cipher
  target: Fix multi-session dynamic se_node_acl double free OOPs
  target: Obtain se_node_acl->acl_kref during get_initiator_node_acl
  samples/seccomp: fix 64-bit comparison macros
  ext4: return EROFS if device is r/o and journal replay is needed
  ext4: preserve the needs_recovery flag when the journal is aborted
  ext4: fix inline data error paths
  ext4: fix data corruption in data=journal mode
  ext4: trim allocation requests to group size
  ext4: do not polute the extents cache while shifting extents
  ext4: Include forgotten start block on fallocate insert range
  loop: fix LO_FLAGS_PARTSCAN hang
  block/loop: fix race between I/O and set_status
  jbd2: don't leak modified metadata buffers on an aborted journal
  Fix: Disable sys_membarrier when nohz_full is enabled
  sd: get disk reference in sd_check_events()
  scsi: use 'scsi_device_from_queue()' for scsi_dh
  scsi: aacraid: Reorder Adapter status check
  scsi: storvsc: properly set residual data length on errors
  scsi: storvsc: properly handle SRB_ERROR when sense message is present
  scsi: storvsc: use tagged SRB requests if supported by the device
  dm stats: fix a leaked s->histogram_boundaries array
  dm cache: fix corruption seen when using cache > 2TB
  ipc/shm: Fix shmat mmap nil-page protection
  mm: do not access page->mapping directly on page_endio
  mm: vmpressure: fix sending wrong events on underflow
  mm/page_alloc: fix nodes for reclaim in fast path
  iommu/vt-d: Tylersburg isoch identity map check is done too late.
  iommu/vt-d: Fix some macros that are incorrectly specified in intel-iommu
  regulator: Fix regulator_summary for deviceless consumers
  staging: rtl: fix possible NULL pointer dereference
  ALSA: hda - Fix micmute hotkey problem for a lenovo AIO machine
  ALSA: hda - Add subwoofer support for Dell Inspiron 17 7000 Gaming
  ALSA: seq: Fix link corruption by event error handling
  ALSA: ctxfi: Fallback DMA mask to 32bit
  ALSA: timer: Reject user params with too small ticks
  ALSA: hda - fix Lewisburg audio issue
  ALSA: hda/realtek - Cannot adjust speaker's volume on a Dell AIO
  ARM: dts: at91: Enable DMA on sama5d2_xplained console
  ARM: dts: at91: Enable DMA on sama5d4_xplained console
  ARM: at91: define LPDDR types
  media: fix dm1105.c build error
  uvcvideo: Fix a wrong macro
  am437x-vpfe: always assign bpp variable
  MIPS: Handle microMIPS jumps in the same way as MIPS32/MIPS64 jumps
  MIPS: Calculate microMIPS ra properly when unwinding the stack
  MIPS: Fix is_jump_ins() handling of 16b microMIPS instructions
  MIPS: Fix get_frame_info() handling of microMIPS function size
  MIPS: Prevent unaligned accesses during stack unwinding
  MIPS: Clear ISA bit correctly in get_frame_info()
  MIPS: Lantiq: Keep ethernet enabled during boot
  MIPS: OCTEON: Fix copy_from_user fault handling for large buffers
  MIPS: BCM47XX: Fix button inversion for Asus WL-500W
  MIPS: Fix special case in 64 bit IP checksumming.
  samples: move mic/mpssd example code from Documentation
  Linux 4.4.52
  kvm: vmx: ensure VMCS is current while enabling PML
  Revert "usb: chipidea: imx: enable CI_HDRC_SET_NON_ZERO_TTHA"
  rtlwifi: rtl_usb: Fix for URB leaking when doing ifconfig up/down
  block: fix double-free in the failure path of cgwb_bdi_init()
  goldfish: Sanitize the broken interrupt handler
  x86/platform/goldfish: Prevent unconditional loading
  USB: serial: ark3116: fix register-accessor error handling
  USB: serial: opticon: fix CTS retrieval at open
  USB: serial: spcp8x5: fix modem-status handling
  USB: serial: ftdi_sio: fix line-status over-reporting
  USB: serial: ftdi_sio: fix extreme low-latency setting
  USB: serial: ftdi_sio: fix modem-status error handling
  USB: serial: cp210x: add new IDs for GE Bx50v3 boards
  USB: serial: mos7840: fix another NULL-deref at open
  tty: serial: msm: Fix module autoload
  net: socket: fix recvmmsg not returning error from sock_error
  ip: fix IP_CHECKSUM handling
  irda: Fix lockdep annotations in hashbin_delete().
  dccp: fix freeing skb too early for IPV6_RECVPKTINFO
  packet: Do not call fanout_release from atomic contexts
  packet: fix races in fanout_add()
  net/llc: avoid BUG_ON() in skb_orphan()
  blk-mq: really fix plug list flushing for nomerge queues
  rtc: interface: ignore expired timers when enqueuing new timers
  rtlwifi: rtl_usb: Fix missing entry in USB driver's private data
  Linux 4.4.51
  mmc: core: fix multi-bit bus width without high-speed mode
  bcache: Make gc wakeup sane, remove set_task_state()
  ntb_transport: Pick an unused queue
  NTB: ntb_transport: fix debugfs_remove_recursive
  printk: use rcuidle console tracepoint
  ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user()
  futex: Move futex_init() to core_initcall
  drm/dp/mst: fix kernel oops when turning off secondary monitor
  drm/radeon: Use mode h/vdisplay fields to hide out of bounds HW cursor
  Input: elan_i2c - add ELAN0605 to the ACPI table
  Fix missing sanity check in /dev/sg
  scsi: don't BUG_ON() empty DMA transfers
  fuse: fix use after free issue in fuse_dev_do_read()
  siano: make it work again with CONFIG_VMAP_STACK
  vfs: fix uninitialized flags in splice_to_pipe()
  Linux 4.4.50
  l2tp: do not use udp_ioctl()
  ping: fix a null pointer dereference
  packet: round up linear to header len
  net: introduce device min_header_len
  sit: fix a double free on error path
  sctp: avoid BUG_ON on sctp_wait_for_sndbuf
  mlx4: Invoke softirqs after napi_reschedule
  macvtap: read vnet_hdr_size once
  tun: read vnet_hdr_sz once
  tcp: avoid infinite loop in tcp_splice_read()
  ipv6: tcp: add a missing tcp_v6_restore_cb()
  ip6_gre: fix ip6gre_err() invalid reads
  netlabel: out of bound access in cipso_v4_validate()
  ipv4: keep skb->dst around in presence of IP options
  net: use a work queue to defer net_disable_timestamp() work
  tcp: fix 0 divide in __tcp_select_window()
  ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()
  ipv6: fix ip6_tnl_parse_tlv_enc_lim()
  can: Fix kernel panic at security_sock_rcv_skb

Conflicts:
	drivers/scsi/sd.c
	drivers/usb/gadget/function/f_fs.c
	drivers/usb/host/xhci-plat.c

CRs-Fixed: 2023471
Change-Id: I396051a8de30271af77b3890d4b19787faa1c31e
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-03-23 03:22:14 -07:00
Davidlohr Bueso
f0ae01568e ipc/shm: Fix shmat mmap nil-page protection
commit 95e91b831f87ac8e1f8ed50c14d709089b4e01b8 upstream.

The issue is described here, with a nice testcase:

    https://bugzilla.kernel.org/show_bug.cgi?id=192931

The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
the address rounded down to 0.  For the regular mmap case, the
protection mentioned above is that the kernel gets to generate the
address -- arch_get_unmapped_area() will always check for MAP_FIXED and
return that address.  So by the time we do security_mmap_addr(0) things
get funky for shmat().

The testcase itself shows that while a regular user crashes, root will
not have a problem attaching a nil-page.  There are two possible fixes
to this.  The first, and which this patch does, is to simply allow root
to crash as well -- this is also regular mmap behavior, ie when hacking
up the testcase and adding mmap(...  |MAP_FIXED).  While this approach
is the safer option, the second alternative is to ignore SHM_RND if the
rounded address is 0, thus only having MAP_SHARED flags.  This makes the
behavior of shmat() identical to the mmap() case.  The downside of this
is obviously user visible, but does make sense in that it maintains
semantics after the round-down wrt 0 address and mmap.

Passes shm related ltp tests.

Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Gareth Evans <gareth.evans@contextis.co.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-12 06:37:26 +01:00
Jeevan Shriram
643a137249 net: initialize variables to avoid UML compilation failure
While compiling for usermode linux for x86 architecture, observed
compilation issues with probable usage of uninitialized variables.
This change initializes the variables.

Signed-off-by: Jeevan Shriram <jshriram@codeaurora.org>
2016-03-23 21:24:21 -07:00
Kirill A. Shutemov
15db15e2f1 ipc/shm: handle removed segments gracefully in shm_mmap()
commit 1ac0b6dec656f3f78d1c3dd216fad84cb4d0a01e upstream.

remap_file_pages(2) emulation can reach file which represents removed
IPC ID as long as a memory segment is mapped.  It breaks expectations of
IPC subsystem.

Test case (rewritten to be more human readable, originally autogenerated
by syzkaller[1]):

	#define _GNU_SOURCE
	#include <stdlib.h>
	#include <sys/ipc.h>
	#include <sys/mman.h>
	#include <sys/shm.h>

	#define PAGE_SIZE 4096

	int main()
	{
		int id;
		void *p;

		id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
		p = shmat(id, NULL, 0);
		shmctl(id, IPC_RMID, NULL);
		remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);

	        return 0;
	}

The patch changes shm_mmap() and code around shm_lock() to propagate
locking error back to caller of shm_mmap().

[1] http://github.com/google/syzkaller

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:23 -08:00
Linus Torvalds
b9a5322779 Initialize msg/shm IPC objects before doing ipc_addid()
As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
having initialized the IPC object state.  Yes, we initialize the IPC
object in a locked state, but with all the lockless RCU lookup work,
that IPC object lock no longer means that the state cannot be seen.

We already did this for the IPC semaphore code (see commit e8577d1f03:
"ipc/sem.c: fully initialize sem_array before making it visible") but we
clearly forgot about msg and shm.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-30 12:48:40 -04:00
Davidlohr Bueso
d0edd85283 ipc: convert invalid scenarios to use WARN_ON
Considering Linus' past rants about the (ab)use of BUG in the kernel, I
took a look at how we deal with such calls in ipc.  Given that any errors
or corruption in ipc code are most likely contained within the set of
processes participating in the broken mechanisms, there aren't really many
strong fatal system failure scenarios that would require a BUG call.
Also, if something is seriously wrong, ipc might not be the place for such
a BUG either.

1. For example, recently, a customer hit one of these BUG_ONs in shm
   after failing shm_lock().  A busted ID imho does not merit a BUG_ON,
   and WARN would have been better.

2. MSG_COPY functionality of posix msgrcv(2) for checkpoint/restore.
   I don't see how we can hit this anyway -- at least it should be IS_ERR.
    The 'copy' arg from do_msgrcv is always set by calling prepare_copy()
   first and foremost.  We could also probably drop this check altogether.
    Either way, it does not merit a BUG_ON.

3. No ->fault() callback for the fs getting the corresponding page --
   seems selfish to make the system unusable.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Stephen Smalley
e1832f2923 ipc: use private shmem or hugetlbfs inodes for shm segments.
The shm implementation internally uses shmem or hugetlbfs inodes for shm
segments.  As these inodes are never directly exposed to userspace and
only accessed through the shm operations which are already hooked by
security modules, mark the inodes with the S_PRIVATE flag so that inode
security initialization and permission checking is skipped.

This was motivated by the following lockdep warning:

  ======================================================
   [ INFO: possible circular locking dependency detected ]
   4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G        W
  -------------------------------------------------------
   httpd/1597 is trying to acquire lock:
   (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130
   but task is already holding lock:
   (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180
   which lock already depends on the new lock.
   the existing dependency chain (in reverse order) is:
   -> #3 (&mm->mmap_sem){++++++}:
        lock_acquire+0xc7/0x270
        __might_fault+0x7a/0xa0
        filldir+0x9e/0x130
        xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
        xfs_readdir+0x1b4/0x330 [xfs]
        xfs_file_readdir+0x2b/0x30 [xfs]
        iterate_dir+0x97/0x130
        SyS_getdents+0x91/0x120
        entry_SYSCALL_64_fastpath+0x12/0x76
   -> #2 (&xfs_dir_ilock_class){++++.+}:
        lock_acquire+0xc7/0x270
        down_read_nested+0x57/0xa0
        xfs_ilock+0x167/0x350 [xfs]
        xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
        xfs_attr_get+0xbd/0x190 [xfs]
        xfs_xattr_get+0x3d/0x70 [xfs]
        generic_getxattr+0x4f/0x70
        inode_doinit_with_dentry+0x162/0x670
        sb_finish_set_opts+0xd9/0x230
        selinux_set_mnt_opts+0x35c/0x660
        superblock_doinit+0x77/0xf0
        delayed_superblock_init+0x10/0x20
        iterate_supers+0xb3/0x110
        selinux_complete_init+0x2f/0x40
        security_load_policy+0x103/0x600
        sel_write_load+0xc1/0x750
        __vfs_write+0x37/0x100
        vfs_write+0xa9/0x1a0
        SyS_write+0x58/0xd0
        entry_SYSCALL_64_fastpath+0x12/0x76
  ...

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Morten Stevens <mstevens@fedoraproject.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:41 +03:00
Davidlohr Bueso
55b7ae5016 ipc: rename ipc_obtain_object
...  to ipc_obtain_object_idr, which is more meaningful and makes the code
slightly easier to follow.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:58 -07:00
Davidlohr Bueso
c5c8975b2e ipc,shm: move BUG_ON check into shm_lock
Upon every shm_lock call, we BUG_ON if an error was returned, indicating
racing either in idr or in shm_destroy.  Move this logic into the locking.

[akpm@linux-foundation.org: simplify code]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-06-30 19:44:58 -07:00
Linus Torvalds
9ec3a646fe Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull fourth vfs update from Al Viro:
 "d_inode() annotations from David Howells (sat in for-next since before
  the beginning of merge window) + four assorted fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  RCU pathwalk breakage when running into a symlink overmounting something
  fix I_DIO_WAKEUP definition
  direct-io: only inc/dec inode->i_dio_count for file systems
  fs/9p: fix readdir()
  VFS: assorted d_backing_inode() annotations
  VFS: fs/inode.c helpers: d_inode() annotations
  VFS: fs/cachefiles: d_backing_inode() annotations
  VFS: fs library helpers: d_inode() annotations
  VFS: assorted weird filesystems: d_inode() annotations
  VFS: normal filesystems (and lustre): d_inode() annotations
  VFS: security/: d_inode() annotations
  VFS: security/: d_backing_inode() annotations
  VFS: net/: d_inode() annotations
  VFS: net/unix: d_backing_inode() annotations
  VFS: kernel/: d_inode() annotations
  VFS: audit: d_backing_inode() annotations
  VFS: Fix up some ->d_inode accesses in the chelsio driver
  VFS: Cachefiles should perform fs modifications on the top layer only
  VFS: AF_UNIX sockets should call mknod on the top layer only
2015-04-26 17:22:07 -07:00
Joe Perches
7f032d6ef6 ipc: remove use of seq_printf return value
The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03 ("seq_file: Rename seq_overflow() to
     seq_has_overflowed() and make public")

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-04-15 16:35:24 -07:00
David Howells
75c3cfa855 VFS: assorted weird filesystems: d_inode() annotations
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:06:58 -04:00
Dave Hansen
07a46ed27d shmdt: use i_size_read() instead of ->i_size
Andrew Morton noted

	http://lkml.kernel.org/r/20141104142027.a7a0d010772d84560b445f59@linux-foundation.org

that the shmdt uses inode->i_size outside of i_mutex being held.
There is one more case in shm.c in shm_destroy().  This converts
both users over to use i_size_read().

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
Dave Hansen
d3c97900b4 ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
This is a highly-contrived scenario.  But, a single shmdt() call can be
induced in to unmapping memory from mulitple shm segments.  Example code
is here:

	http://www.sr71.net/~dave/intel/shmfun.c

The fix is pretty simple: Record the 'struct file' for the first VMA we
encounter and then stick to it.  Decline to unmap anything not from the
same file and thus the same segment.

I found this by inspection and the odds of anyone hitting this in practice
are pretty darn small.

Lightly tested, but it's a pretty small patch.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-12-13 12:42:52 -08:00
Oleg Nesterov
bf77b94c99 ipc/shm: kill the historical/wrong mm->start_stack check
do_shmat() is the only user of ->start_stack (proc just reports its
value), and this check looks ugly and wrong.

The reason for this check is not clear at all, and it wrongly assumes that
the stack can only grow down.

But the main problem is that in general mm->start_stack has nothing to do
with stack_vma->vm_start.  Not only the application can switch to another
stack and even unmap this area, setup_arg_pages() expands the stack
without updating mm->start_stack during exec().  This means that in the
likely case "addr > start_stack - size - PAGE_SIZE * 5" is simply
impossible after find_vma_intersection() == F, or the stack can't grow
anyway because of RLIMIT_STACK.

Many thanks to Hugh for his explanations.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-14 02:18:23 +02:00
Jack Miller
83293c0f5a shm: allow exit_shm in parallel if only marking orphans
If shm_rmid_force (the default state) is not set then the shmids are only
marked as orphaned and does not require any add, delete, or locking of the
tree structure.

Seperate the sysctl on and off case, and only obtain the read lock.  The
newly added list head can be deleted under the read lock because we are
only called with current and will only change the semids allocated by this
task and not manipulate the list.

This commit assumes that up_read includes a sufficient memory barrier for
the writes to be seen my others that later obtain a write lock.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jack Miller <millerjo@us.ibm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:26 -07:00
Jack Miller
ab602f7991 shm: make exit_shm work proportional to task activity
This is small set of patches our team has had kicking around for a few
versions internally that fixes tasks getting hung on shm_exit when there
are many threads hammering it at once.

Anton wrote a simple test to cause the issue:

  http://ozlabs.org/~anton/junkcode/bust_shm_exit.c

Before applying this patchset, this test code will cause either hanging
tracebacks or pthread out of memory errors.

After this patchset, it will still produce output like:

  root@somehost:~# ./bust_shm_exit 1024 160
  ...
  INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 116, t=2111 jiffies, g=241, c=240, q=7113)
  INFO: Stall ended before state dump start
  ...

But the task will continue to run along happily, so we consider this an
improvement over hanging, even if it's a bit noisy.

This patch (of 3):

exit_shm obtains the ipc_ns shm rwsem for write and holds it while it
walks every shared memory segment in the namespace.  Thus the amount of
work is related to the number of shm segments in the namespace not the
number of segments that might need to be cleaned.

In addition, this occurs after the task has been notified the thread has
exited, so the number of tasks waiting for the ns shm rwsem can grow
without bound until memory is exausted.

Add a list to the task struct of all shmids allocated by this task.  Init
the list head in copy_process.  Use the ns->rwsem for locking.  Add
segments after id is added, remove before removing from id.

On unshare of NEW_IPCNS orphan any ids as if the task had exited, similar
to handling of semaphore undo.

I chose a define for the init sequence since its a simple list init,
otherwise it would require a function call to avoid include loops between
the semaphore code and the task struct.  Converting the list_del to
list_del_init for the unshare cases would remove the exit followed by
init, but I left it blow up if not inited.

Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jack Miller <millerjo@us.ibm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08 15:57:26 -07:00
Manfred Spraul
1376327ce1 ipc/shm.c: check for integer overflow during shmget.
SHMMAX is the upper limit for the size of a shared memory segment, counted
in bytes.  The actual allocation is that size, rounded up to the next full
page.

Add a check that prevents the creation of segments where the rounded up
size causes an integer overflow.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Manfred Spraul
09c6eb1f65 ipc/shm.c: check for overflows of shm_tot
shm_tot counts the total number of pages used by shm segments.

If SHMALL is ULONG_MAX (or nearly ULONG_MAX), then the number can
overflow.  Subsequent calls to shmctl(,SHM_INFO,) would return wrong
values for shm_tot.

The patch adds a detection for overflows.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Manfred Spraul
247a8ce822 ipc/shm.c: check for ulong overflows in shmat
The increase of SHMMAX/SHMALL is a 4 patch series.

The change itself is trivial, the only problem are interger overflows.
The overflows are not new, but if we make huge values the default, then
the code should be free from overflows.

SHMMAX:

- shmmem_file_setup places a hard limit on the segment size:
  MAX_LFS_FILESIZE.

  On 32-bit, the limit is > 1 TB, i.e. 4 GB-1 byte segments are
  possible. Rounded up to full pages the actual allocated size
  is 0. --> must be fixed, patch 3

- shmat:
  - find_vma_intersection does not handle overflows properly.
    --> must be fixed, patch 1

  - the rest is fine, do_mmap_pgoff limits mappings to TASK_SIZE
    and checks for overflows (i.e.: map 2 GB, starting from
    addr=2.5GB fails).

SHMALL:
- after creating 8192 segments size (1L<<63)-1, shm_tot overflows and
  returns 0.  --> must be fixed, patch 2.

Userspace:
- Obviously, there could be overflows in userspace. There is nothing
  we can do, only use values smaller than ULONG_MAX.
  I ended with "ULONG_MAX - 1L<<24":

  - TASK_SIZE cannot be used because it is the size of the current
    task. Could be 4G if it's a 32-bit task on a 64-bit kernel.

  - The maximum size is not standardized across archs:
    I found TASK_MAX_SIZE, TASK_SIZE_MAX and TASK_SIZE_64.

  - Just in case some arch revives a 4G/4G split, nearly
    ULONG_MAX is a valid segment size.

  - Using "0" as a magic value for infinity is even worse, because
    right now 0 means 0, i.e. fail all allocations.

This patch (of 4):

find_vma_intersection() does not work as intended if addr+size overflows.
The patch adds a manual check before the call to find_vma_intersection.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Paul McQuade
46c0a8ca3e ipc, kernel: clear whitespace
trailing whitespace

Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Paul McQuade
7153e40273 ipc, kernel: use Linux headers
Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Use #include <linux/types.h> instead of <asm/types.h>

Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Mathias Krause
eb66ec44f8 ipc: constify ipc_ops
There is no need to recreate the very same ipc_ops structure on every
kernel entry for msgget/semget/shmget.  Just declare it static and be
done with it.  While at it, constify it as we don't modify the structure
at runtime.

Found in the PaX patch, written by the PaX Team.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-06-06 16:08:14 -07:00
Davidlohr Bueso
8001c85810 ipc: standardize code comments
IPC commenting style is all over the place, *specially* in util.c.  This
patch orders things a bit.

Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:39 -08:00
Manfred Spraul
239521f31d ipc: whitespace cleanup
The ipc code does not adhere the typical linux coding style.
This patch fixes lots of simple whitespace errors.

- mostly autogenerated by
  scripts/checkpatch.pl -f --fix \
	--types=pointer_location,spacing,space_before_tab
- one manual fixup (keep structure members tab-aligned)
- removal of additional space_before_tab that were not found by --fix

Tested with some of my msg and sem test apps.

Andrew: Could you include it in -mm and move it towards Linus' tree?

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Suggested-by: Li Bin <huawei.libin@huawei.com>
Cc: Joe Perches <joe@perches.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:39 -08:00
Rafael Aquini
0f3d2b0135 ipc: introduce ipc_valid_object() helper to sort out IPC_RMID races
After the locking semantics for the SysV IPC API got improved, a couple
of IPC_RMID race windows were opened because we ended up dropping the
'kern_ipc_perm.deleted' check performed way down in ipc_lock().  The
spotted races got sorted out by re-introducing the old test within the
racy critical sections.

This patch introduces ipc_valid_object() to consolidate the way we cope
with IPC_RMID races by using the same abstraction across the API
implementation.

Signed-off-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-01-27 21:02:39 -08:00
Jesper Nilsson
3a72660b07 ipc,shm: correct error return value in shmctl (SHM_UNLOCK)
Commit 2caacaa82a ("ipc,shm: shorten critical region for shmctl")
restructured the ipc shm to shorten critical region, but introduced a
path where the return value could be -EPERM, even if the operation
actually was performed.

Before the commit, the err return value was reset by the return value
from security_shm_shmctl() after the if (!ns_capable(...)) statement.

Now, we still exit the if statement with err set to -EPERM, and in the
case of SHM_UNLOCK, it is not reset at all, and used as the return value
from shmctl.

To fix this, we only set err when errors occur, leaving the fallthrough
case alone.

Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>	[3.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21 16:42:28 -08:00
Greg Thelen
a399b29dfb ipc,shm: fix shm_file deletion races
When IPC_RMID races with other shm operations there's potential for
use-after-free of the shm object's associated file (shm_file).

Here's the race before this patch:

  TASK 1                     TASK 2
  ------                     ------
  shm_rmid()
    ipc_lock_object()
                             shmctl()
                             shp = shm_obtain_object_check()

    shm_destroy()
      shum_unlock()
      fput(shp->shm_file)
                             ipc_lock_object()
                             shmem_lock(shp->shm_file)
                             <OOPS>

The oops is caused because shm_destroy() calls fput() after dropping the
ipc_lock.  fput() clears the file's f_inode, f_path.dentry, and
f_path.mnt, which causes various NULL pointer references in task 2.  I
reliably see the oops in task 2 if with shmlock, shmu

This patch fixes the races by:
1) set shm_file=NULL in shm_destroy() while holding ipc_object_lock().
2) modify at risk operations to check shm_file while holding
   ipc_object_lock().

Example workloads, which each trigger oops...

Workload 1:
  while true; do
    id=$(shmget 1 4096)
    shm_rmid $id &
    shmlock $id &
    wait
  done

  The oops stack shows accessing NULL f_inode due to racing fput:
    _raw_spin_lock
    shmem_lock
    SyS_shmctl

Workload 2:
  while true; do
    id=$(shmget 1 4096)
    shmat $id 4096 &
    shm_rmid $id &
    wait
  done

  The oops stack is similar to workload 1 due to NULL f_inode:
    touch_atime
    shmem_mmap
    shm_mmap
    mmap_region
    do_mmap_pgoff
    do_shmat
    SyS_shmat

Workload 3:
  while true; do
    id=$(shmget 1 4096)
    shmlock $id
    shm_rmid $id &
    shmunlock $id &
    wait
  done

  The oops stack shows second fput tripping on an NULL f_inode.  The
  first fput() completed via from shm_destroy(), but a racing thread did
  a get_file() and queued this fput():
    locks_remove_flock
    __fput
    ____fput
    task_work_run
    do_notify_resume
    int_signal

Fixes: c2c737a046 ("ipc,shm: shorten critical region for shmat")
Fixes: 2caacaa82a ("ipc,shm: shorten critical region for shmctl")
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>  # 3.10.17+ 3.11.6+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-11-21 16:42:27 -08:00
Davidlohr Bueso
53dad6d3a8 ipc: fix race with LSMs
Currently, IPC mechanisms do security and auditing related checks under
RCU.  However, since security modules can free the security structure,
for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
race if the structure is freed before other tasks are done with it,
creating a use-after-free condition.  Manfred illustrates this nicely,
for instance with shared mem and selinux:

 -> do_shmat calls rcu_read_lock()
 -> do_shmat calls shm_object_check().
     Checks that the object is still valid - but doesn't acquire any locks.
     Then it returns.
 -> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
 -> selinux_shm_shmat calls ipc_has_perm()
 -> ipc_has_perm accesses ipc_perms->security

shm_close()
 -> shm_close acquires rw_mutex & shm_lock
 -> shm_close calls shm_destroy
 -> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
 -> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
 -> ipc_free_security calls kfree(ipc_perms->security)

This patch delays the freeing of the security structures after all RCU
readers are done.  Furthermore it aligns the security life cycle with
that of the rest of IPC - freeing them based on the reference counter.
For situations where we need not free security, the current behavior is
kept.  Linus states:

 "... the old behavior was suspect for another reason too: having the
  security blob go away from under a user sounds like it could cause
  various other problems anyway, so I think the old code was at least
  _prone_ to bugs even if it didn't have catastrophic behavior."

I have tested this patch with IPC testcases from LTP on both my
quad-core laptop and on a 64 core NUMA server.  In both cases selinux is
enabled, and tests pass for both voluntary and forced preemption models.
While the mentioned races are theoretical (at least no one as reported
them), I wanted to make sure that this new logic doesn't break anything
we weren't aware of.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-24 09:36:53 -07:00
Davidlohr Bueso
7a25dd9e04 ipc, shm: drop shm_lock_check
This function was replaced by a the lockless shm_obtain_object_check(),
and no longer has any users.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:44 -07:00
Davidlohr Bueso
530fcd16d8 ipc, shm: guard against non-existant vma in shmdt(2)
When !CONFIG_MMU there's a chance we can derefence a NULL pointer when the
VM area isn't found - check the return value of find_vma().

Also, remove the redundant -EINVAL return: retval is set to the proper
return code and *only* changed to 0, when we actually unmap the segments.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:44 -07:00
Davidlohr Bueso
d9a605e40b ipc: rename ids->rw_mutex
Since in some situations the lock can be shared for readers, we shouldn't
be calling it a mutex, rename it to rwsem.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:42 -07:00
Davidlohr Bueso
c2c737a046 ipc,shm: shorten critical region for shmat
Similar to other system calls, acquire the kern_ipc_perm lock after doing
the initial permission and security checks.

[sasha.levin@oracle.com: dont leave do_shmat with rcu lock held]
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:42 -07:00
Davidlohr Bueso
f42569b138 ipc,shm: cleanup do_shmat pasta
Clean up some of the messy do_shmat() spaghetti code, getting rid of
out_free and out_put_dentry labels.  This makes shortening the critical
region of this function in the next patch a little easier to do and read.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:42 -07:00
Davidlohr Bueso
2caacaa82a ipc,shm: shorten critical region for shmctl
With the *_INFO, *_STAT, IPC_RMID and IPC_SET commands already optimized,
deal with the remaining SHM_LOCK and SHM_UNLOCK commands.  Take the
shm_perm lock after doing the initial auditing and security checks.  The
rest of the logic remains unchanged.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:41 -07:00
Davidlohr Bueso
c97cb9ccab ipc,shm: make shmctl_nolock lockless
While the INFO cmd doesn't take the ipc lock, the STAT commands do acquire
it unnecessarily.  We can do the permissions and security checks only
holding the rcu lock.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:39 -07:00
Davidlohr Bueso
68eccc1dc3 ipc,shm: introduce shmctl_nolock
Similar to semctl and msgctl, when calling msgctl, the *_INFO and *_STAT
commands can be performed without acquiring the ipc object.

Add a shmctl_nolock() function and move the logic of *_INFO and *_STAT out
of msgctl().  Since we are just moving functionality, this change still
takes the lock and it will be properly lockless in the next patch.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:39 -07:00
Davidlohr Bueso
79ccf0f8c8 ipc,shm: shorten critical region in shmctl_down
Instead of holding the ipc lock for the entire function, use the
ipcctl_pre_down_nolock and only acquire the lock for specific commands:
RMID and SET.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:39 -07:00
Davidlohr Bueso
8b8d52ac38 ipc,shm: introduce lockless functions to obtain the ipc object
This is the third and final patchset that deals with reducing the amount
of contention we impose on the ipc lock (kern_ipc_perm.lock).  These
changes mostly deal with shared memory, previous work has already been
done for semaphores and message queues:

  http://lkml.org/lkml/2013/3/20/546 (sems)
  http://lkml.org/lkml/2013/5/15/584 (mqueues)

With these patches applied, a custom shm microbenchmark stressing shmctl
doing IPC_STAT with 4 threads a million times, reduces the execution
time by 50%.  A similar run, this time with IPC_SET, reduces the
execution time from 3 mins and 35 secs to 27 seconds.

Patches 1-8: replaces blindly taking the ipc lock for a smarter
combination of rcu and ipc_obtain_object, only acquiring the spinlock
when updating.

Patch 9: renames the ids rw_mutex to rwsem, which is what it already was.

Patch 10: is a trivial mqueue leftover cleanup

Patch 11: adds a brief lock scheme description, requested by Andrew.

This patch:

Add shm_obtain_object() and shm_obtain_object_check(), which will allow us
to get the ipc object without acquiring the lock.  Just as with other
forms of ipc, these functions are basically wrappers around
ipc_obtain_object*().

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-09-11 15:59:38 -07:00
Davidlohr Bueso
7b4cc5d841 ipc: move locking out of ipcctl_pre_down_nolock
This function currently acquires both the rw_mutex and the rcu lock on
successful lookups, leaving the callers to explicitly unlock them,
creating another two level locking situation.

Make the callers (including those that still use ipcctl_pre_down())
explicitly lock and unlock the rwsem and rcu lock.

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:27 -07:00
Davidlohr Bueso
cf9d5d78d0 ipc: close open coded spin lock calls
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:27 -07:00
Davidlohr Bueso
dbfcd91f06 ipc: move rcu lock out of ipc_addid
This patchset continues the work that began in the sysv ipc semaphore
scaling series, see

  https://lkml.org/lkml/2013/3/20/546

Just like semaphores used to be, sysv shared memory and msg queues also
abuse the ipc lock, unnecessarily holding it for operations such as
permission and security checks.

This patchset mostly deals with mqueues, and while shared mem can be
done in a very similar way, I want to get these patches out in the open
first.  It also does some pending cleanups, mostly focused on the two
level locking we have in ipc code, taking care of ipc_addid() and
ipcctl_pre_down_nolock() - yes there are still functions that need to be
updated as well.

This patch:

Make all callers explicitly take and release the RCU read lock.

This addresses the two level locking seen in newary(), newseg() and
newqueue().  For the last two, explicitly unlock the ipc object and the
rcu lock, instead of calling the custom shm_unlock and msg_unlock
functions.  The next patch will deal with the open coded locking for
->perm.lock

Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:26 -07:00
Andrew Morton
c103a4dc4a ipc/shmc.c: eliminate ugly 80-col tricks
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-07-09 10:33:26 -07:00
Li Zefan
091d0d55b2 shm: fix null pointer deref when userspace specifies invalid hugepage size
Dave reported an oops triggered by trinity:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
  IP: newseg+0x10d/0x390
  PGD cf8c1067 PUD cf8c2067 PMD 0
  Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
  CPU: 2 PID: 7636 Comm: trinity-child2 Not tainted 3.9.0+#67
  ...
  Call Trace:
    ipcget+0x182/0x380
    SyS_shmget+0x5a/0x60
    tracesys+0xdd/0xe2

This bug was introduced by commit af73e4d950 ("hugetlbfs: fix mmap
failure in unaligned size request").

Reported-by: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Li Zefan <lizfan@huawei.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-09 14:22:47 -07:00
Naoya Horiguchi
af73e4d950 hugetlbfs: fix mmap failure in unaligned size request
The current kernel returns -EINVAL unless a given mmap length is
"almost" hugepage aligned.  This is because in sys_mmap_pgoff() the
given length is passed to vm_mmap_pgoff() as it is without being aligned
with hugepage boundary.

This is a regression introduced in commit 40716e2924 ("hugetlbfs: fix
alignment of huge page requests"), where alignment code is pushed into
hugetlb_file_setup() and the variable len in caller side is not changed.

To fix this, this patch partially reverts that commit, and adds
alignment code in caller side.  And it also introduces hstate_sizelog()
in order to get proper hstate to specified hugepage size.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881

[akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: <iceman_dvd@yahoo.com>
Cc: Steven Truelove <steven.truelove@utoronto.ca>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 18:38:27 -07:00