* refs/heads/tmp-c9d74f2
Linux 4.4.135
Revert "vti4: Don't override MTU passed on link creation via IFLA_MTU"
Revert "vti4: Don't override MTU passed on link creation via IFLA_MTU"
Linux 4.4.134
s390/ftrace: use expoline for indirect branches
kdb: make "mdr" command repeat
Bluetooth: btusb: Add device ID for RTL8822BE
ASoC: samsung: i2s: Ensure the RCLK rate is properly determined
regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()'
scsi: lpfc: Fix frequency of Release WQE CQEs
scsi: lpfc: Fix soft lockup in lpfc worker thread during LIP testing
scsi: lpfc: Fix issue_lip if link is disabled
netlabel: If PF_INET6, check sk_buff ip header version
selftests/net: fixes psock_fanout eBPF test case
perf report: Fix memory corruption in --branch-history mode --branch-history
perf tests: Use arch__compare_symbol_names to compare symbols
x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
drm/rockchip: Respect page offset for PRIME mmap calls
MIPS: Octeon: Fix logging messages with spurious periods after newlines
audit: return on memory error to avoid null pointer dereference
crypto: sunxi-ss - Add MODULE_ALIAS to sun4i-ss
clk: samsung: exynos3250: Fix PLL rates
clk: samsung: exynos5250: Fix PLL rates
clk: samsung: exynos5433: Fix PLL rates
clk: samsung: exynos5260: Fix PLL rates
clk: samsung: s3c2410: Fix PLL rates
media: cx25821: prevent out-of-bounds read on array card
udf: Provide saner default for invalid uid / gid
PCI: Add function 1 DMA alias quirk for Marvell 88SE9220
serial: arc_uart: Fix out-of-bounds access through DT alias
serial: fsl_lpuart: Fix out-of-bounds access through DT alias
serial: imx: Fix out-of-bounds access through serial port index
serial: mxs-auart: Fix out-of-bounds access through serial port index
serial: samsung: Fix out-of-bounds access through serial port index
serial: xuartps: Fix out-of-bounds access through DT alias
rtc: tx4939: avoid unintended sign extension on a 24 bit shift
staging: rtl8192u: return -ENOMEM on failed allocation of priv->oldaddr
hwrng: stm32 - add reset during probe
enic: enable rq before updating rq descriptors
clk: rockchip: Prevent calculating mmc phase if clock rate is zero
media: em28xx: USB bulk packet size fix
dmaengine: pl330: fix a race condition in case of threaded irqs
media: s3c-camif: fix out-of-bounds array access
media: cx23885: Set subdev host data to clk_freq pointer
media: cx23885: Override 888 ImpactVCBe crystal frequency
ALSA: vmaster: Propagate slave error
x86/devicetree: Fix device IRQ settings in DT
x86/devicetree: Initialize device tree before using it
usb: gadget: composite: fix incorrect handling of OS desc requests
usb: gadget: udc: change comparison to bitshift when dealing with a mask
gfs2: Fix fallocate chunk size
cdrom: do not call check_disk_change() inside cdrom_open()
hwmon: (pmbus/adm1275) Accept negative page register values
hwmon: (pmbus/max8688) Accept negative page register values
perf/core: Fix perf_output_read_group()
ASoC: topology: create TLV data for dapm widgets
powerpc: Add missing prototype for arch_irq_work_raise()
usb: gadget: ffs: Execute copy_to_user() with USER_DS set
usb: gadget: ffs: Let setup() return USB_GADGET_DELAYED_STATUS
usb: dwc2: Fix interval type issue
ipmi_ssif: Fix kernel panic at msg_done_handler
PCI: Restore config space on runtime resume despite being unbound
MIPS: ath79: Fix AR724X_PLL_REG_PCIE_CONFIG offset
xhci: zero usb device slot_id member when disabling and freeing a xhci slot
KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use
i2c: mv64xxx: Apply errata delay only in standard mode
ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c
ACPICA: Events: add a return on failure from acpi_hw_register_read
bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
zorro: Set up z->dev.dma_mask for the DMA API
clk: Don't show the incorrect clock phase
cpufreq: cppc_cpufreq: Fix cppc_cpufreq_init() failure path
usb: dwc3: Update DWC_usb31 GTXFIFOSIZ reg fields
arm: dts: socfpga: fix GIC PPI warning
virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
ima: Fallback to the builtin hash algorithm
ima: Fix Kconfig to select TPM 2.0 CRB interface
ath10k: Fix kernel panic while using worker (ath10k_sta_rc_update_wk)
net/mlx5: Protect from command bit overflow
selftests: Print the test we're running to /dev/kmsg
tools/thermal: tmon: fix for segfault
powerpc/perf: Fix kernel address leak via sampling registers
powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer
rtc: hctosys: Ensure system time doesn't overflow time_t
hwmon: (nct6775) Fix writing pwmX_mode
parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode
m68k: set dma and coherent masks for platform FEC ethernets
powerpc/mpic: Check if cpu_possible() in mpic_physmask()
ACPI: acpi_pad: Fix memory leak in power saving threads
xen/acpi: off by one in read_acpi_id()
btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
Btrfs: fix copy_items() return value when logging an inode
btrfs: tests/qgroup: Fix wrong tree backref level
Bluetooth: btusb: Add USB ID 7392:a611 for Edimax EW-7611ULB
net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
rtc: snvs: Fix usage of snvs_rtc_enable
sparc64: Make atomic_xchg() an inline function rather than a macro.
fscache: Fix hanging wait on page discarded by writeback
KVM: VMX: raise internal error for exception during invalid protected mode state
sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
ocfs2/dlm: don't handle migrate lockres if already in shutdown
btrfs: Fix possible softlock on single core machines
Btrfs: fix NULL pointer dereference in log_dir_items
Btrfs: bail out on error during replay_dir_deletes
mm: fix races between address_space dereference and free in page_evicatable
mm/ksm: fix interaction with THP
dp83640: Ensure against premature access to PHY registers after reset
scsi: aacraid: Insure command thread is not recursively stopped
cpufreq: CPPC: Initialize shared perf capabilities of CPUs
Force log to disk before reading the AGF during a fstrim
sr: get/drop reference to device in revalidate and check_events
swap: divide-by-zero when zero length swap file on ssd
fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
sh: fix debug trap failure to process signals before return to user
net: mvneta: fix enable of all initialized RXQs
net: Fix untag for vlan packets without ethernet header
mm/kmemleak.c: wait for scan completion before disabling free
llc: properly handle dev_queue_xmit() return value
net-usb: add qmi_wwan if on lte modem wistron neweb d18q1
net/usb/qmi_wwan.c: Add USB id for lt4120 modem
net: qmi_wwan: add BroadMobi BM806U 2020:2033
ARM: 8748/1: mm: Define vdso_start, vdso_end as array
batman-adv: fix packet loss for broadcasted DHCP packets to a server
batman-adv: fix multicast-via-unicast transmission with AP isolation
selftests: ftrace: Add a testcase for probepoint
selftests: ftrace: Add a testcase for string type with kprobe_event
selftests: ftrace: Add probe event argument syntax testcase
mm/mempolicy.c: avoid use uninitialized preferred_node
RDMA/ucma: Correct option size check using optlen
perf/cgroup: Fix child event counting bug
vti4: Don't override MTU passed on link creation via IFLA_MTU
vti4: Don't count header length twice on tunnel setup
batman-adv: fix header size check in batadv_dbg_arp()
net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
sunvnet: does not support GSO for sctp
ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu
workqueue: use put_device() instead of kfree()
bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().
netfilter: ebtables: fix erroneous reject of last rule
USB: OHCI: Fix NULL dereference in HCDs using HCD_LOCAL_MEM
xen: xenbus: use put_device() instead of kfree()
fbdev: Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in sbusfb_ioctl_helper().
scsi: sd: Keep disk read-only when re-reading partition
scsi: mpt3sas: Do not mark fw_event workqueue as WQ_MEM_RECLAIM
usb: musb: call pm_runtime_{get,put}_sync before reading vbus registers
e1000e: allocate ring descriptors with dma_zalloc_coherent
e1000e: Fix check_for_link return value with autoneg off
watchdog: f71808e_wdt: Fix magic close handling
KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
Btrfs: send, fix issuing write op when processing hole in no data mode
xen/pirq: fix error path cleanup when binding MSIs
net/tcp/illinois: replace broken algorithm reference link
gianfar: Fix Rx byte accounting for ndev stats
sit: fix IFLA_MTU ignored on NEWLINK
bcache: fix kcrashes with fio in RAID5 backend dev
dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
virtio-gpu: fix ioctl and expose the fixed status to userspace.
r8152: fix tx packets accounting
clocksource/drivers/fsl_ftm_timer: Fix error return checking
nvme-pci: Fix nvme queue cleanup if IRQ setup fails
netfilter: ebtables: convert BUG_ONs to WARN_ONs
batman-adv: invalidate checksum on fragment reassembly
batman-adv: fix packet checksum in receive path
md/raid1: fix NULL pointer dereference
media: dmxdev: fix error code for invalid ioctls
x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
regulatory: add NUL to request alpha2
smsc75xx: fix smsc75xx_set_features()
ARM: OMAP: Fix dmtimer init for omap1
s390/cio: clear timer when terminating driver I/O
s390/cio: fix return code after missing interrupt
powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
md: raid5: avoid string overflow warning
locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
usb: musb: fix enumeration after resume
drm/exynos: fix comparison to bitshift when dealing with a mask
md raid10: fix NULL deference in handle_write_completed()
mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
NFC: llcp: Limit size of SDP URI
ARM: OMAP1: clock: Fix debugfs_create_*() usage
ARM: OMAP3: Fix prm wake interrupt for resume
ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
scsi: qla4xxx: skip error recovery in case of register disconnect.
scsi: aacraid: fix shutdown crash when init fails
scsi: storvsc: Increase cmd_per_lun for higher speed devices
selftests: memfd: add config fragment for fuse
usb: dwc2: Fix dwc2_hsotg_core_init_disconnected()
usb: gadget: fsl_udc_core: fix ep valid checks
usb: gadget: f_uac2: fix bFirstInterface in composite gadget
ARC: Fix malformed ARC_EMUL_UNALIGNED default
scsi: qla2xxx: Avoid triggering undefined behavior in qla2x00_mbx_completion()
scsi: mptfusion: Add bounds check in mptctl_hp_targetinfo()
scsi: sym53c8xx_2: iterator underflow in sym_getsync()
scsi: bnx2fc: Fix check in SCSI completion handler for timed out request
scsi: ufs: Enable quirk to ignore sending WRITE_SAME command
irqchip/gic-v3: Change pr_debug message to pr_devel
locking/qspinlock: Ensure node->count is updated before initialising node
tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
bcache: return attach error when no cache set exist
bcache: fix for data collapse after re-attaching an attached device
bcache: fix for allocator and register thread race
bcache: properly set task state in bch_writeback_thread()
cifs: silence compiler warnings showing up with gcc-8.0.0
proc: fix /proc/*/map_files lookup
arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics
RDS: IB: Fix null pointer issue
xen/grant-table: Use put_page instead of free_page
xen-netfront: Fix race between device setup and open
MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS
bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
ACPI: processor_perflib: Do not send _PPC change notification if not ready
firmware: dmi_scan: Fix handling of empty DMI strings
x86/power: Fix swsusp_arch_resume prototype
IB/ipoib: Fix for potential no-carrier state
mm: pin address_space before dereferencing it while isolating an LRU page
asm-generic: provide generic_pmdp_establish()
mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
mm/mempolicy: fix the check of nodemask from user
ocfs2: return error when we attempt to access a dirty bh in jbd2
ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid
ntb_transport: Fix bug with max_mw_size parameter
RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
powerpc/numa: Ensure nodes initialized for hotplug
powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
scsi: fas216: fix sense buffer initialization
Btrfs: fix scrub to repair raid6 corruption
btrfs: Fix out of bounds access in btrfs_search_slot
Btrfs: set plug for fsync
ipmi/powernv: Fix error return code in ipmi_powernv_probe()
mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()
kconfig: Fix expr_free() E_NOT leak
kconfig: Fix automatic menu creation mem leak
kconfig: Don't leak main menus during parsing
watchdog: sp5100_tco: Fix watchdog disable bit
nfs: Do not convert nfs_idmap_cache_timeout to jiffies
dm thin: fix documentation relative to low water mark threshold
tools lib traceevent: Fix get_field_str() for dynamic strings
perf callchain: Fix attr.sample_max_stack setting
tools lib traceevent: Simplify pointer print logic and fix %pF
PCI: Add function 1 DMA alias quirk for Marvell 9128
tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()
ALSA: hda - Use IS_REACHABLE() for dependency on input
NFSv4: always set NFS_LOCK_LOST when a lock is lost.
firewire-ohci: work around oversized DMA reads on JMicron controllers
do d_instantiate/unlock_new_inode combinations safely
xfs: remove racy hasattr check from attr ops
kernel/signal.c: avoid undefined behaviour in kill_something_info
kernel/sys.c: fix potential Spectre v1 issue
kasan: fix memory hotplug during boot
ipc/shm: fix shmat() nil address after round-down when remapping
Revert "ipc/shm: Fix shmat mmap nil-page protection"
xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
libata: blacklist Micron 500IT SSD with MU01 firmware
libata: Blacklist some Sandisk SSDs for NCQ
mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
ALSA: timer: Fix pause event notification
aio: fix io_destroy(2) vs. lookup_ioctx() race
affs_lookup(): close a race with affs_remove_link()
KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
MIPS: ptrace: Expose FIR register through FP regset
UPSTREAM: sched/fair: Consider RT/IRQ pressure in capacity_spare_wake
Conflicts:
drivers/media/dvb-core/dmxdev.c
drivers/scsi/sd.c
drivers/scsi/ufs/ufshcd.c
drivers/usb/gadget/function/f_fs.c
fs/ecryptfs/inode.c
Change-Id: I15751ed8c82ec65ba7eedcb0d385b9f803c333f7
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
commit 8f89c007b6dec16a1793cb88de88fcc02117bbbc upstream.
shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
fact the very first thing we check for. Andrea reported that for
SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
but we need to check again if the address was rounded down to nil. As
of this patch, such cases will return -EINVAL.
Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Joe Lawrence <joe.lawrence@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a73ab244f0dad8fffb3291b905f73e2d3eaa7c00 upstream.
Patch series "ipc/shm: shmat() fixes around nil-page".
These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.
The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED. This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.
I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour. Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be
modified to just test for SHM_RND|SHM_REMAP).
[1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805
This patch (of 2):
Commit 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
worked on the idea that we should not be mapping as root addr=0 and
MAP_FIXED. However, it was reported that this scenario is in fact
valid, thus making the patch both bogus and breaks userspace as well.
For example X11's libint10.so relies on shmat(1, SHM_RND) for lowmem
initialization[1].
[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/int10/linux.c#n347
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Reported-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* refs/heads/tmp-b1c4836
Linux 4.4.129
writeback: safer lock nesting
fanotify: fix logic of events on child
ext4: bugfix for mmaped pages in mpage_release_unused_pages()
mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
autofs: mount point create should honour passed in mode
Don't leak MNT_INTERNAL away from internal mounts
rpc_pipefs: fix double-dput()
hypfs_kill_super(): deal with failed allocations
jffs2_kill_sb(): deal with failed allocations
powerpc/lib: Fix off-by-one in alternate feature patching
powerpc/eeh: Fix enabling bridge MMIO windows
MIPS: memset.S: Fix clobber of v1 in last_fixup
MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
MIPS: memset.S: EVA & fault support for small_memset
MIPS: uaccess: Add micromips clobbers to bzero invocation
HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
ALSA: hda - New VIA controller suppor no-snoop path
ALSA: rawmidi: Fix missing input substream checks in compat ioctls
ALSA: line6: Use correct endpoint type for midi output
ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()
ext4: fix crashes in dioread_nolock mode
drm/radeon: Fix PCIe lane width calculation
ext4: don't allow r/w mounts if metadata blocks overlap the superblock
vfio/pci: Virtualize Maximum Read Request Size
vfio/pci: Virtualize Maximum Payload Size
vfio-pci: Virtualize PCIe & AF FLR
ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
ALSA: pcm: Avoid potential races between OSS ioctls and read/write
ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
ALSA: oss: consolidate kmalloc/memset 0 call to kzalloc
watchdog: f71808e_wdt: Fix WD_EN register read
thermal: imx: Fix race condition in imx_thermal_probe()
clk: bcm2835: De-assert/assert PLL reset signal when appropriate
clk: mvebu: armada-38x: add support for missing clocks
clk: mvebu: armada-38x: add support for 1866MHz variants
mmc: jz4740: Fix race condition in IRQ mask update
iommu/vt-d: Fix a potential memory leak
um: Use POSIX ucontext_t instead of struct ucontext
dmaengine: at_xdmac: fix rare residue corruption
IB/srp: Fix completion vector assignment algorithm
IB/srp: Fix srp_abort()
ALSA: pcm: Fix UAF at PCM release via PCM timer access
RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
ext4: fail ext4_iget for root directory if unallocated
ext4: don't update checksum of new initialized bitmaps
jbd2: if the journal is aborted then don't allow update of the log tail
random: use a tighter cap in credit_entropy_bits_safe()
thunderbolt: Resume control channel after hibernation image is created
ASoC: ssm2602: Replace reg_default_raw with reg_default
HID: core: Fix size as type u32
HID: Fix hid_report_len usage
powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
HID: i2c-hid: fix size check and type usage
usb: dwc3: pci: Properly cleanup resource
USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
regmap: Fix reversed bounds check in regmap_raw_write()
xen-netfront: Fix hang on device removal
ARM: dts: at91: sama5d4: fix pinctrl compatible string
ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
usb: musb: gadget: misplaced out of bounds check
mm, slab: reschedule cache_reap() on the same CPU
ipc/shm: fix use-after-free of shm file via remap_file_pages()
resource: fix integer overflow at reallocation
fs/reiserfs/journal.c: add missing resierfs_warning() arg
ubi: Reject MLC NAND
ubi: Fix error for write access
ubi: fastmap: Don't flush fastmap work on detach
ubifs: Check ubifs_wbuf_sync() return code
tty: make n_tty_read() always abort if hangup is in progress
x86/hweight: Don't clobber %rdi
x86/hweight: Get rid of the special calling convention
lan78xx: Correctly indicate invalid OTP
slip: Check if rstate is initialized before uncompressing
cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
hwmon: (ina2xx) Fix access to uninitialized mutex
rtl8187: Fix NULL pointer dereference in priv->conf_mutex
getname_kernel() needs to make sure that ->name != ->iname in long case
s390/ipl: ensure loadparm valid flag is set
s390/qdio: don't merge ERROR output buffers
s390/qdio: don't retry EQBS after CCQ 96
block/loop: fix deadlock after loop_set_status
Revert "perf tests: Decompress kernel module before objdump"
radeon: hide pointless #warning when compile testing
perf intel-pt: Fix timestamp following overflow
perf intel-pt: Fix error recovery from missing TIP packet
perf intel-pt: Fix sync_switch
perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
parisc: Fix out of array access in match_pci_device()
media: v4l2-compat-ioctl32: don't oops on overlay
f2fs: check cap_resource only for data blocks
Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"
f2fs: clear PageError on writepage
UPSTREAM: timer: Export destroy_hrtimer_on_stack()
BACKPORT: dm verity: add 'check_at_most_once' option to only validate hashes once
f2fs: call unlock_new_inode() before d_instantiate()
f2fs: refactor read path to allow multiple postprocessing steps
fscrypt: allow synchronous bio decryption
Change-Id: I45f4ac10734d92023b53118d83dcd6c83974a283
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
commit 3f05317d9889ab75c7190dcd39491d2a97921984 upstream.
syzbot reported a use-after-free of shm_file_data(file)->file->f_op in
shm_get_unmapped_area(), called via sys_remap_file_pages().
Unfortunately it couldn't generate a reproducer, but I found a bug which
I think caused it. When remap_file_pages() is passed a full System V
shared memory segment, the memory is first unmapped, then a new map is
created using the ->vm_file. Between these steps, the shm ID can be
removed and reused for a new shm segment. But, shm_mmap() only checks
whether the ID is currently valid before calling the underlying file's
->mmap(); it doesn't check whether it was reused. Thus it can use the
wrong underlying file, one that was already freed.
Fix this by making the "outer" shm file (the one that gets put in
->vm_file) hold a reference to the real shm file, and by making
__shm_open() require that the file associated with the shm ID matches
the one associated with the "outer" file.
Taking the reference to the real shm file is needed to fully solve the
problem, since otherwise sfd->file could point to a freed file, which
then could be reallocated for the reused shm ID, causing the wrong shm
segment to be mapped (and without the required permission checks).
Commit 1ac0b6dec656 ("ipc/shm: handle removed segments gracefully in
shm_mmap()") almost fixed this bug, but it didn't go far enough because
it didn't consider the case where the shm ID is reused.
The following program usually reproduces this bug:
#include <stdlib.h>
#include <sys/shm.h>
#include <sys/syscall.h>
#include <unistd.h>
int main()
{
int is_parent = (fork() != 0);
srand(getpid());
for (;;) {
int id = shmget(0xF00F, 4096, IPC_CREAT|0700);
if (is_parent) {
void *addr = shmat(id, NULL, 0);
usleep(rand() % 50);
while (!syscall(__NR_remap_file_pages, addr, 4096, 0, 0, 0));
} else {
usleep(rand() % 50);
shmctl(id, IPC_RMID, NULL);
}
}
}
It causes the following NULL pointer dereference due to a 'struct file'
being used while it's being freed. (I couldn't actually get a KASAN
use-after-free splat like in the syzbot report. But I think it's
possible with this bug; it would just take a more extraordinary race...)
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
PGD 0 P4D 0
Oops: 0000 [#1] SMP NOPTI
CPU: 9 PID: 258 Comm: syz_ipc Not tainted 4.16.0-05140-gf8cf2f16a7c95 #189
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-20171110_100015-anatol 04/01/2014
RIP: 0010:d_inode include/linux/dcache.h:519 [inline]
RIP: 0010:touch_atime+0x25/0xd0 fs/inode.c:1724
[...]
Call Trace:
file_accessed include/linux/fs.h:2063 [inline]
shmem_mmap+0x25/0x40 mm/shmem.c:2149
call_mmap include/linux/fs.h:1789 [inline]
shm_mmap+0x34/0x80 ipc/shm.c:465
call_mmap include/linux/fs.h:1789 [inline]
mmap_region+0x309/0x5b0 mm/mmap.c:1712
do_mmap+0x294/0x4a0 mm/mmap.c:1483
do_mmap_pgoff include/linux/mm.h:2235 [inline]
SYSC_remap_file_pages mm/mmap.c:2853 [inline]
SyS_remap_file_pages+0x232/0x310 mm/mmap.c:2769
do_syscall_64+0x64/0x1a0 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x42/0xb7
[ebiggers@google.com: add comment]
Link: http://lkml.kernel.org/r/20180410192850.235835-1-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20180409043039.28915-1-ebiggers3@gmail.com
Reported-by: syzbot+d11f321e7f1923157eac80aa990b446596f46439@syzkaller.appspotmail.com
Fixes: c8d78c1823 ("mm: replace remap_file_pages() syscall with emulation")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
refs/heads/tmp-28ec98b:
Linux 4.4.55
ext4: don't BUG when truncating encrypted inodes on the orphan list
dm: flush queued bios when process blocks to avoid deadlock
nfit, libnvdimm: fix interleave set cookie calculation
s390/kdump: Use "LINUX" ELF note name instead of "CORE"
KVM: s390: Fix guest migration for huge guests resulting in panic
mvsas: fix misleading indentation
serial: samsung: Continue to work if DMA request fails
USB: serial: io_ti: fix information leak in completion handler
USB: serial: io_ti: fix NULL-deref in interrupt callback
USB: iowarrior: fix NULL-deref in write
USB: iowarrior: fix NULL-deref at probe
USB: serial: omninet: fix reference leaks at open
USB: serial: safe_serial: fix information leak in completion handler
usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers
usb: host: xhci-dbg: HCIVERSION should be a binary number
usb: gadget: function: f_fs: pass companion descriptor along
usb: dwc3: gadget: make Set Endpoint Configuration macros safe
usb: gadget: dummy_hcd: clear usb_gadget region before registration
powerpc: Emulation support for load/store instructions on LE
tracing: Add #undef to fix compile error
MIPS: Netlogic: Fix CP0_EBASE redefinition warnings
MIPS: DEC: Avoid la pseudo-instruction in delay slots
mm: memcontrol: avoid unused function warning
cpmac: remove hopeless #warning
MIPS: ralink: Remove unused rt*_wdt_reset functions
MIPS: ralink: Cosmetic change to prom_init().
mtd: pmcmsp: use kstrndup instead of kmalloc+strncpy
MIPS: Update lemote2f_defconfig for CPU_FREQ_STAT change
MIPS: ip22: Fix ip28 build for modern gcc
MIPS: Update ip27_defconfig for SCSI_DH change
MIPS: ip27: Disable qlge driver in defconfig
MIPS: Update defconfigs for NF_CT_PROTO_DCCP/UDPLITE change
crypto: improve gcc optimization flags for serpent and wp512
USB: serial: digi_acceleport: fix OOB-event processing
USB: serial: digi_acceleport: fix OOB data sanity check
Linux 4.4.54
drivers: hv: Turn off write permission on the hypercall page
fat: fix using uninitialized fields of fat_inode/fsinfo_inode
libceph: use BUG() instead of BUG_ON(1)
drm/i915/dsi: Do not clear DPOUNIT_CLOCK_GATE_DISABLE from vlv_init_display_clock_gating
fakelb: fix schedule while atomic
drm/atomic: fix an error code in mode_fixup()
drm/ttm: Make sure BOs being swapped out are cacheable
drm/edid: Add EDID_QUIRK_FORCE_8BPC quirk for Rotel RSX-1058
drm/ast: Fix AST2400 POST failure without BMC FW or VBIOS
drm/ast: Call open_key before enable_mmio in POST code
drm/ast: Fix test for VGA enabled
drm/amdgpu: add more cases to DCE11 possible crtc mask setup
mac80211: flush delayed work when entering suspend
xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD
pwm: pca9685: Fix period change with same duty cycle
nlm: Ensure callback code also checks that the files match
target: Fix NULL dereference during LUN lookup + active I/O shutdown
ceph: remove req from unsafe list when unregistering it
ktest: Fix child exit code processing
IB/srp: Fix race conditions related to task management
IB/srp: Avoid that duplicate responses trigger a kernel bug
IB/IPoIB: Add destination address when re-queue packet
IB/ipoib: Fix deadlock between rmmod and set_mode
mnt: Tuck mounts under others instead of creating shadow/side mounts.
net: mvpp2: fix DMA address calculation in mvpp2_txq_inc_put()
s390: use correct input data address for setup_randomness
s390: make setup_randomness work
s390: TASK_SIZE for kernel threads
s390/dcssblk: fix device size calculation in dcssblk_direct_access()
s390/qdio: clear DSCI prior to scanning multiple input queues
Bluetooth: Add another AR3012 04ca:3018 device
KVM: VMX: use correct vmcs_read/write for guest segment selector/base
KVM: s390: Disable dirty log retrieval for UCONTROL guests
serial: 8250_pci: Add MKS Tenta SCOM-0800 and SCOM-0801 cards
tty: n_hdlc: get rid of racy n_hdlc.tbuf
TTY: n_hdlc, fix lockdep false positive
Linux 4.4.53
scsi: lpfc: Correct WQ creation for pagesize
MIPS: IP22: Fix build error due to binutils 2.25 uselessnes.
MIPS: IP22: Reformat inline assembler code to modern standards.
powerpc/xmon: Fix data-breakpoint
dmaengine: ipu: Make sure the interrupt routine checks all interrupts.
bcma: use (get|put)_device when probing/removing device driver
md linear: fix a race between linear_add() and linear_congested()
rtc: sun6i: Switch to the external oscillator
rtc: sun6i: Add some locking
NFSv4: fix getacl ERANGE for some ACL buffer sizes
NFSv4: fix getacl head length estimation
NFSv4: Fix memory and state leak in _nfs4_open_and_get_state
nfsd: special case truncates some more
nfsd: minor nfsd_setattr cleanup
rtlwifi: rtl8192c-common: Fix "BUG: KASAN:
rtlwifi: Fix alignment issues
gfs2: Add missing rcu locking for glock lookup
rdma_cm: fail iwarp accepts w/o connection params
RDMA/core: Fix incorrect structure packing for booleans
Drivers: hv: util: Backup: Fix a rescind processing issue
Drivers: hv: util: Fcopy: Fix a rescind processing issue
Drivers: hv: util: kvp: Fix a rescind processing issue
hv: init percpu_list in hv_synic_alloc()
hv: allocate synic pages for all present CPUs
usb: gadget: udc: fsl: Add missing complete function.
usb: host: xhci: plat: check hcc_params after add hcd
usb: musb: da8xx: Remove CPPI 3.0 quirk and methods
w1: ds2490: USB transfer buffers need to be DMAable
w1: don't leak refcount on slave attach failure in w1_attach_slave_device()
can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer
iio: pressure: mpl3115: do not rely on structure field ordering
iio: pressure: mpl115: do not rely on structure field ordering
arm/arm64: KVM: Enforce unconditional flush to PoC when mapping to stage-2
fuse: add missing FR_FORCE
crypto: testmgr - Pad aes_ccm_enc_tv_template vector
ath9k: use correct OTP register offsets for the AR9340 and AR9550
ath9k: fix race condition in enabling/disabling IRQs
ath5k: drop bogus warning on drv_set_key with unsupported cipher
target: Fix multi-session dynamic se_node_acl double free OOPs
target: Obtain se_node_acl->acl_kref during get_initiator_node_acl
samples/seccomp: fix 64-bit comparison macros
ext4: return EROFS if device is r/o and journal replay is needed
ext4: preserve the needs_recovery flag when the journal is aborted
ext4: fix inline data error paths
ext4: fix data corruption in data=journal mode
ext4: trim allocation requests to group size
ext4: do not polute the extents cache while shifting extents
ext4: Include forgotten start block on fallocate insert range
loop: fix LO_FLAGS_PARTSCAN hang
block/loop: fix race between I/O and set_status
jbd2: don't leak modified metadata buffers on an aborted journal
Fix: Disable sys_membarrier when nohz_full is enabled
sd: get disk reference in sd_check_events()
scsi: use 'scsi_device_from_queue()' for scsi_dh
scsi: aacraid: Reorder Adapter status check
scsi: storvsc: properly set residual data length on errors
scsi: storvsc: properly handle SRB_ERROR when sense message is present
scsi: storvsc: use tagged SRB requests if supported by the device
dm stats: fix a leaked s->histogram_boundaries array
dm cache: fix corruption seen when using cache > 2TB
ipc/shm: Fix shmat mmap nil-page protection
mm: do not access page->mapping directly on page_endio
mm: vmpressure: fix sending wrong events on underflow
mm/page_alloc: fix nodes for reclaim in fast path
iommu/vt-d: Tylersburg isoch identity map check is done too late.
iommu/vt-d: Fix some macros that are incorrectly specified in intel-iommu
regulator: Fix regulator_summary for deviceless consumers
staging: rtl: fix possible NULL pointer dereference
ALSA: hda - Fix micmute hotkey problem for a lenovo AIO machine
ALSA: hda - Add subwoofer support for Dell Inspiron 17 7000 Gaming
ALSA: seq: Fix link corruption by event error handling
ALSA: ctxfi: Fallback DMA mask to 32bit
ALSA: timer: Reject user params with too small ticks
ALSA: hda - fix Lewisburg audio issue
ALSA: hda/realtek - Cannot adjust speaker's volume on a Dell AIO
ARM: dts: at91: Enable DMA on sama5d2_xplained console
ARM: dts: at91: Enable DMA on sama5d4_xplained console
ARM: at91: define LPDDR types
media: fix dm1105.c build error
uvcvideo: Fix a wrong macro
am437x-vpfe: always assign bpp variable
MIPS: Handle microMIPS jumps in the same way as MIPS32/MIPS64 jumps
MIPS: Calculate microMIPS ra properly when unwinding the stack
MIPS: Fix is_jump_ins() handling of 16b microMIPS instructions
MIPS: Fix get_frame_info() handling of microMIPS function size
MIPS: Prevent unaligned accesses during stack unwinding
MIPS: Clear ISA bit correctly in get_frame_info()
MIPS: Lantiq: Keep ethernet enabled during boot
MIPS: OCTEON: Fix copy_from_user fault handling for large buffers
MIPS: BCM47XX: Fix button inversion for Asus WL-500W
MIPS: Fix special case in 64 bit IP checksumming.
samples: move mic/mpssd example code from Documentation
Linux 4.4.52
kvm: vmx: ensure VMCS is current while enabling PML
Revert "usb: chipidea: imx: enable CI_HDRC_SET_NON_ZERO_TTHA"
rtlwifi: rtl_usb: Fix for URB leaking when doing ifconfig up/down
block: fix double-free in the failure path of cgwb_bdi_init()
goldfish: Sanitize the broken interrupt handler
x86/platform/goldfish: Prevent unconditional loading
USB: serial: ark3116: fix register-accessor error handling
USB: serial: opticon: fix CTS retrieval at open
USB: serial: spcp8x5: fix modem-status handling
USB: serial: ftdi_sio: fix line-status over-reporting
USB: serial: ftdi_sio: fix extreme low-latency setting
USB: serial: ftdi_sio: fix modem-status error handling
USB: serial: cp210x: add new IDs for GE Bx50v3 boards
USB: serial: mos7840: fix another NULL-deref at open
tty: serial: msm: Fix module autoload
net: socket: fix recvmmsg not returning error from sock_error
ip: fix IP_CHECKSUM handling
irda: Fix lockdep annotations in hashbin_delete().
dccp: fix freeing skb too early for IPV6_RECVPKTINFO
packet: Do not call fanout_release from atomic contexts
packet: fix races in fanout_add()
net/llc: avoid BUG_ON() in skb_orphan()
blk-mq: really fix plug list flushing for nomerge queues
rtc: interface: ignore expired timers when enqueuing new timers
rtlwifi: rtl_usb: Fix missing entry in USB driver's private data
Linux 4.4.51
mmc: core: fix multi-bit bus width without high-speed mode
bcache: Make gc wakeup sane, remove set_task_state()
ntb_transport: Pick an unused queue
NTB: ntb_transport: fix debugfs_remove_recursive
printk: use rcuidle console tracepoint
ARM: 8658/1: uaccess: fix zeroing of 64-bit get_user()
futex: Move futex_init() to core_initcall
drm/dp/mst: fix kernel oops when turning off secondary monitor
drm/radeon: Use mode h/vdisplay fields to hide out of bounds HW cursor
Input: elan_i2c - add ELAN0605 to the ACPI table
Fix missing sanity check in /dev/sg
scsi: don't BUG_ON() empty DMA transfers
fuse: fix use after free issue in fuse_dev_do_read()
siano: make it work again with CONFIG_VMAP_STACK
vfs: fix uninitialized flags in splice_to_pipe()
Linux 4.4.50
l2tp: do not use udp_ioctl()
ping: fix a null pointer dereference
packet: round up linear to header len
net: introduce device min_header_len
sit: fix a double free on error path
sctp: avoid BUG_ON on sctp_wait_for_sndbuf
mlx4: Invoke softirqs after napi_reschedule
macvtap: read vnet_hdr_size once
tun: read vnet_hdr_sz once
tcp: avoid infinite loop in tcp_splice_read()
ipv6: tcp: add a missing tcp_v6_restore_cb()
ip6_gre: fix ip6gre_err() invalid reads
netlabel: out of bound access in cipso_v4_validate()
ipv4: keep skb->dst around in presence of IP options
net: use a work queue to defer net_disable_timestamp() work
tcp: fix 0 divide in __tcp_select_window()
ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim()
ipv6: fix ip6_tnl_parse_tlv_enc_lim()
can: Fix kernel panic at security_sock_rcv_skb
Conflicts:
drivers/scsi/sd.c
drivers/usb/gadget/function/f_fs.c
drivers/usb/host/xhci-plat.c
CRs-Fixed: 2023471
Change-Id: I396051a8de30271af77b3890d4b19787faa1c31e
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
commit 95e91b831f87ac8e1f8ed50c14d709089b4e01b8 upstream.
The issue is described here, with a nice testcase:
https://bugzilla.kernel.org/show_bug.cgi?id=192931
The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and
the address rounded down to 0. For the regular mmap case, the
protection mentioned above is that the kernel gets to generate the
address -- arch_get_unmapped_area() will always check for MAP_FIXED and
return that address. So by the time we do security_mmap_addr(0) things
get funky for shmat().
The testcase itself shows that while a regular user crashes, root will
not have a problem attaching a nil-page. There are two possible fixes
to this. The first, and which this patch does, is to simply allow root
to crash as well -- this is also regular mmap behavior, ie when hacking
up the testcase and adding mmap(... |MAP_FIXED). While this approach
is the safer option, the second alternative is to ignore SHM_RND if the
rounded address is 0, thus only having MAP_SHARED flags. This makes the
behavior of shmat() identical to the mmap() case. The downside of this
is obviously user visible, but does make sense in that it maintains
semantics after the round-down wrt 0 address and mmap.
Passes shm related ltp tests.
Link: http://lkml.kernel.org/r/1486050195-18629-1-git-send-email-dave@stgolabs.net
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reported-by: Gareth Evans <gareth.evans@contextis.co.uk>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
While compiling for usermode linux for x86 architecture, observed
compilation issues with probable usage of uninitialized variables.
This change initializes the variables.
Signed-off-by: Jeevan Shriram <jshriram@codeaurora.org>
commit 1ac0b6dec656f3f78d1c3dd216fad84cb4d0a01e upstream.
remap_file_pages(2) emulation can reach file which represents removed
IPC ID as long as a memory segment is mapped. It breaks expectations of
IPC subsystem.
Test case (rewritten to be more human readable, originally autogenerated
by syzkaller[1]):
#define _GNU_SOURCE
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/shm.h>
#define PAGE_SIZE 4096
int main()
{
int id;
void *p;
id = shmget(IPC_PRIVATE, 3 * PAGE_SIZE, 0);
p = shmat(id, NULL, 0);
shmctl(id, IPC_RMID, NULL);
remap_file_pages(p, 3 * PAGE_SIZE, 0, 7, 0);
return 0;
}
The patch changes shm_mmap() and code around shm_lock() to propagate
locking error back to caller of shm_mmap().
[1] http://github.com/google/syzkaller
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As reported by Dmitry Vyukov, we really shouldn't do ipc_addid() before
having initialized the IPC object state. Yes, we initialize the IPC
object in a locked state, but with all the lockless RCU lookup work,
that IPC object lock no longer means that the state cannot be seen.
We already did this for the IPC semaphore code (see commit e8577d1f03:
"ipc/sem.c: fully initialize sem_array before making it visible") but we
clearly forgot about msg and shm.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Considering Linus' past rants about the (ab)use of BUG in the kernel, I
took a look at how we deal with such calls in ipc. Given that any errors
or corruption in ipc code are most likely contained within the set of
processes participating in the broken mechanisms, there aren't really many
strong fatal system failure scenarios that would require a BUG call.
Also, if something is seriously wrong, ipc might not be the place for such
a BUG either.
1. For example, recently, a customer hit one of these BUG_ONs in shm
after failing shm_lock(). A busted ID imho does not merit a BUG_ON,
and WARN would have been better.
2. MSG_COPY functionality of posix msgrcv(2) for checkpoint/restore.
I don't see how we can hit this anyway -- at least it should be IS_ERR.
The 'copy' arg from do_msgrcv is always set by calling prepare_copy()
first and foremost. We could also probably drop this check altogether.
Either way, it does not merit a BUG_ON.
3. No ->fault() callback for the fs getting the corresponding page --
seems selfish to make the system unusable.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The shm implementation internally uses shmem or hugetlbfs inodes for shm
segments. As these inodes are never directly exposed to userspace and
only accessed through the shm operations which are already hooked by
security modules, mark the inodes with the S_PRIVATE flag so that inode
security initialization and permission checking is skipped.
This was motivated by the following lockdep warning:
======================================================
[ INFO: possible circular locking dependency detected ]
4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G W
-------------------------------------------------------
httpd/1597 is trying to acquire lock:
(&ids->rwsem){+++++.}, at: shm_close+0x34/0x130
but task is already holding lock:
(&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&mm->mmap_sem){++++++}:
lock_acquire+0xc7/0x270
__might_fault+0x7a/0xa0
filldir+0x9e/0x130
xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
xfs_readdir+0x1b4/0x330 [xfs]
xfs_file_readdir+0x2b/0x30 [xfs]
iterate_dir+0x97/0x130
SyS_getdents+0x91/0x120
entry_SYSCALL_64_fastpath+0x12/0x76
-> #2 (&xfs_dir_ilock_class){++++.+}:
lock_acquire+0xc7/0x270
down_read_nested+0x57/0xa0
xfs_ilock+0x167/0x350 [xfs]
xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
xfs_attr_get+0xbd/0x190 [xfs]
xfs_xattr_get+0x3d/0x70 [xfs]
generic_getxattr+0x4f/0x70
inode_doinit_with_dentry+0x162/0x670
sb_finish_set_opts+0xd9/0x230
selinux_set_mnt_opts+0x35c/0x660
superblock_doinit+0x77/0xf0
delayed_superblock_init+0x10/0x20
iterate_supers+0xb3/0x110
selinux_complete_init+0x2f/0x40
security_load_policy+0x103/0x600
sel_write_load+0xc1/0x750
__vfs_write+0x37/0x100
vfs_write+0xa9/0x1a0
SyS_write+0x58/0xd0
entry_SYSCALL_64_fastpath+0x12/0x76
...
Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Reported-by: Morten Stevens <mstevens@fedoraproject.org>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
... to ipc_obtain_object_idr, which is more meaningful and makes the code
slightly easier to follow.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Upon every shm_lock call, we BUG_ON if an error was returned, indicating
racing either in idr or in shm_destroy. Move this logic into the locking.
[akpm@linux-foundation.org: simplify code]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull fourth vfs update from Al Viro:
"d_inode() annotations from David Howells (sat in for-next since before
the beginning of merge window) + four assorted fixes"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
RCU pathwalk breakage when running into a symlink overmounting something
fix I_DIO_WAKEUP definition
direct-io: only inc/dec inode->i_dio_count for file systems
fs/9p: fix readdir()
VFS: assorted d_backing_inode() annotations
VFS: fs/inode.c helpers: d_inode() annotations
VFS: fs/cachefiles: d_backing_inode() annotations
VFS: fs library helpers: d_inode() annotations
VFS: assorted weird filesystems: d_inode() annotations
VFS: normal filesystems (and lustre): d_inode() annotations
VFS: security/: d_inode() annotations
VFS: security/: d_backing_inode() annotations
VFS: net/: d_inode() annotations
VFS: net/unix: d_backing_inode() annotations
VFS: kernel/: d_inode() annotations
VFS: audit: d_backing_inode() annotations
VFS: Fix up some ->d_inode accesses in the chelsio driver
VFS: Cachefiles should perform fs modifications on the top layer only
VFS: AF_UNIX sockets should call mknod on the top layer only
The seq_printf return value, because it's frequently misused,
will eventually be converted to void.
See: commit 1f33c41c03 ("seq_file: Rename seq_overflow() to
seq_has_overflowed() and make public")
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andrew Morton noted
http://lkml.kernel.org/r/20141104142027.a7a0d010772d84560b445f59@linux-foundation.org
that the shmdt uses inode->i_size outside of i_mutex being held.
There is one more case in shm.c in shm_destroy(). This converts
both users over to use i_size_read().
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is a highly-contrived scenario. But, a single shmdt() call can be
induced in to unmapping memory from mulitple shm segments. Example code
is here:
http://www.sr71.net/~dave/intel/shmfun.c
The fix is pretty simple: Record the 'struct file' for the first VMA we
encounter and then stick to it. Decline to unmap anything not from the
same file and thus the same segment.
I found this by inspection and the odds of anyone hitting this in practice
are pretty darn small.
Lightly tested, but it's a pretty small patch.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
do_shmat() is the only user of ->start_stack (proc just reports its
value), and this check looks ugly and wrong.
The reason for this check is not clear at all, and it wrongly assumes that
the stack can only grow down.
But the main problem is that in general mm->start_stack has nothing to do
with stack_vma->vm_start. Not only the application can switch to another
stack and even unmap this area, setup_arg_pages() expands the stack
without updating mm->start_stack during exec(). This means that in the
likely case "addr > start_stack - size - PAGE_SIZE * 5" is simply
impossible after find_vma_intersection() == F, or the stack can't grow
anyway because of RLIMIT_STACK.
Many thanks to Hugh for his explanations.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If shm_rmid_force (the default state) is not set then the shmids are only
marked as orphaned and does not require any add, delete, or locking of the
tree structure.
Seperate the sysctl on and off case, and only obtain the read lock. The
newly added list head can be deleted under the read lock because we are
only called with current and will only change the semids allocated by this
task and not manipulate the list.
This commit assumes that up_read includes a sufficient memory barrier for
the writes to be seen my others that later obtain a write lock.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jack Miller <millerjo@us.ibm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is small set of patches our team has had kicking around for a few
versions internally that fixes tasks getting hung on shm_exit when there
are many threads hammering it at once.
Anton wrote a simple test to cause the issue:
http://ozlabs.org/~anton/junkcode/bust_shm_exit.c
Before applying this patchset, this test code will cause either hanging
tracebacks or pthread out of memory errors.
After this patchset, it will still produce output like:
root@somehost:~# ./bust_shm_exit 1024 160
...
INFO: rcu_sched detected stalls on CPUs/tasks: {} (detected by 116, t=2111 jiffies, g=241, c=240, q=7113)
INFO: Stall ended before state dump start
...
But the task will continue to run along happily, so we consider this an
improvement over hanging, even if it's a bit noisy.
This patch (of 3):
exit_shm obtains the ipc_ns shm rwsem for write and holds it while it
walks every shared memory segment in the namespace. Thus the amount of
work is related to the number of shm segments in the namespace not the
number of segments that might need to be cleaned.
In addition, this occurs after the task has been notified the thread has
exited, so the number of tasks waiting for the ns shm rwsem can grow
without bound until memory is exausted.
Add a list to the task struct of all shmids allocated by this task. Init
the list head in copy_process. Use the ns->rwsem for locking. Add
segments after id is added, remove before removing from id.
On unshare of NEW_IPCNS orphan any ids as if the task had exited, similar
to handling of semaphore undo.
I chose a define for the init sequence since its a simple list init,
otherwise it would require a function call to avoid include loops between
the semaphore code and the task struct. Converting the list_del to
list_del_init for the unshare cases would remove the exit followed by
init, but I left it blow up if not inited.
Signed-off-by: Milton Miller <miltonm@bga.com>
Signed-off-by: Jack Miller <millerjo@us.ibm.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Anton Blanchard <anton@samba.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SHMMAX is the upper limit for the size of a shared memory segment, counted
in bytes. The actual allocation is that size, rounded up to the next full
page.
Add a check that prevents the creation of segments where the rounded up
size causes an integer overflow.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
shm_tot counts the total number of pages used by shm segments.
If SHMALL is ULONG_MAX (or nearly ULONG_MAX), then the number can
overflow. Subsequent calls to shmctl(,SHM_INFO,) would return wrong
values for shm_tot.
The patch adds a detection for overflows.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The increase of SHMMAX/SHMALL is a 4 patch series.
The change itself is trivial, the only problem are interger overflows.
The overflows are not new, but if we make huge values the default, then
the code should be free from overflows.
SHMMAX:
- shmmem_file_setup places a hard limit on the segment size:
MAX_LFS_FILESIZE.
On 32-bit, the limit is > 1 TB, i.e. 4 GB-1 byte segments are
possible. Rounded up to full pages the actual allocated size
is 0. --> must be fixed, patch 3
- shmat:
- find_vma_intersection does not handle overflows properly.
--> must be fixed, patch 1
- the rest is fine, do_mmap_pgoff limits mappings to TASK_SIZE
and checks for overflows (i.e.: map 2 GB, starting from
addr=2.5GB fails).
SHMALL:
- after creating 8192 segments size (1L<<63)-1, shm_tot overflows and
returns 0. --> must be fixed, patch 2.
Userspace:
- Obviously, there could be overflows in userspace. There is nothing
we can do, only use values smaller than ULONG_MAX.
I ended with "ULONG_MAX - 1L<<24":
- TASK_SIZE cannot be used because it is the size of the current
task. Could be 4G if it's a 32-bit task on a 64-bit kernel.
- The maximum size is not standardized across archs:
I found TASK_MAX_SIZE, TASK_SIZE_MAX and TASK_SIZE_64.
- Just in case some arch revives a 4G/4G split, nearly
ULONG_MAX is a valid segment size.
- Using "0" as a magic value for infinity is even worse, because
right now 0 means 0, i.e. fail all allocations.
This patch (of 4):
find_vma_intersection() does not work as intended if addr+size overflows.
The patch adds a manual check before the call to find_vma_intersection.
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Acked-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use #include <linux/uaccess.h> instead of <asm/uaccess.h>
Use #include <linux/types.h> instead of <asm/types.h>
Signed-off-by: Paul McQuade <paulmcquad@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
There is no need to recreate the very same ipc_ops structure on every
kernel entry for msgget/semget/shmget. Just declare it static and be
done with it. While at it, constify it as we don't modify the structure
at runtime.
Found in the PaX patch, written by the PaX Team.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: PaX Team <pageexec@freemail.hu>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
IPC commenting style is all over the place, *specially* in util.c. This
patch orders things a bit.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The ipc code does not adhere the typical linux coding style.
This patch fixes lots of simple whitespace errors.
- mostly autogenerated by
scripts/checkpatch.pl -f --fix \
--types=pointer_location,spacing,space_before_tab
- one manual fixup (keep structure members tab-aligned)
- removal of additional space_before_tab that were not found by --fix
Tested with some of my msg and sem test apps.
Andrew: Could you include it in -mm and move it towards Linus' tree?
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Suggested-by: Li Bin <huawei.libin@huawei.com>
Cc: Joe Perches <joe@perches.com>
Acked-by: Rafael Aquini <aquini@redhat.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
After the locking semantics for the SysV IPC API got improved, a couple
of IPC_RMID race windows were opened because we ended up dropping the
'kern_ipc_perm.deleted' check performed way down in ipc_lock(). The
spotted races got sorted out by re-introducing the old test within the
racy critical sections.
This patch introduces ipc_valid_object() to consolidate the way we cope
with IPC_RMID races by using the same abstraction across the API
implementation.
Signed-off-by: Rafael Aquini <aquini@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 2caacaa82a ("ipc,shm: shorten critical region for shmctl")
restructured the ipc shm to shorten critical region, but introduced a
path where the return value could be -EPERM, even if the operation
actually was performed.
Before the commit, the err return value was reset by the return value
from security_shm_shmctl() after the if (!ns_capable(...)) statement.
Now, we still exit the if statement with err set to -EPERM, and in the
case of SHM_UNLOCK, it is not reset at all, and used as the return value
from shmctl.
To fix this, we only set err when errors occur, leaving the fallthrough
case alone.
Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org> [3.12.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When IPC_RMID races with other shm operations there's potential for
use-after-free of the shm object's associated file (shm_file).
Here's the race before this patch:
TASK 1 TASK 2
------ ------
shm_rmid()
ipc_lock_object()
shmctl()
shp = shm_obtain_object_check()
shm_destroy()
shum_unlock()
fput(shp->shm_file)
ipc_lock_object()
shmem_lock(shp->shm_file)
<OOPS>
The oops is caused because shm_destroy() calls fput() after dropping the
ipc_lock. fput() clears the file's f_inode, f_path.dentry, and
f_path.mnt, which causes various NULL pointer references in task 2. I
reliably see the oops in task 2 if with shmlock, shmu
This patch fixes the races by:
1) set shm_file=NULL in shm_destroy() while holding ipc_object_lock().
2) modify at risk operations to check shm_file while holding
ipc_object_lock().
Example workloads, which each trigger oops...
Workload 1:
while true; do
id=$(shmget 1 4096)
shm_rmid $id &
shmlock $id &
wait
done
The oops stack shows accessing NULL f_inode due to racing fput:
_raw_spin_lock
shmem_lock
SyS_shmctl
Workload 2:
while true; do
id=$(shmget 1 4096)
shmat $id 4096 &
shm_rmid $id &
wait
done
The oops stack is similar to workload 1 due to NULL f_inode:
touch_atime
shmem_mmap
shm_mmap
mmap_region
do_mmap_pgoff
do_shmat
SyS_shmat
Workload 3:
while true; do
id=$(shmget 1 4096)
shmlock $id
shm_rmid $id &
shmunlock $id &
wait
done
The oops stack shows second fput tripping on an NULL f_inode. The
first fput() completed via from shm_destroy(), but a racing thread did
a get_file() and queued this fput():
locks_remove_flock
__fput
____fput
task_work_run
do_notify_resume
int_signal
Fixes: c2c737a046 ("ipc,shm: shorten critical region for shmat")
Fixes: 2caacaa82a ("ipc,shm: shorten critical region for shmctl")
Signed-off-by: Greg Thelen <gthelen@google.com>
Cc: Davidlohr Bueso <davidlohr@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: <stable@vger.kernel.org> # 3.10.17+ 3.11.6+
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Currently, IPC mechanisms do security and auditing related checks under
RCU. However, since security modules can free the security structure,
for example, through selinux_[sem,msg_queue,shm]_free_security(), we can
race if the structure is freed before other tasks are done with it,
creating a use-after-free condition. Manfred illustrates this nicely,
for instance with shared mem and selinux:
-> do_shmat calls rcu_read_lock()
-> do_shmat calls shm_object_check().
Checks that the object is still valid - but doesn't acquire any locks.
Then it returns.
-> do_shmat calls security_shm_shmat (e.g. selinux_shm_shmat)
-> selinux_shm_shmat calls ipc_has_perm()
-> ipc_has_perm accesses ipc_perms->security
shm_close()
-> shm_close acquires rw_mutex & shm_lock
-> shm_close calls shm_destroy
-> shm_destroy calls security_shm_free (e.g. selinux_shm_free_security)
-> selinux_shm_free_security calls ipc_free_security(&shp->shm_perm)
-> ipc_free_security calls kfree(ipc_perms->security)
This patch delays the freeing of the security structures after all RCU
readers are done. Furthermore it aligns the security life cycle with
that of the rest of IPC - freeing them based on the reference counter.
For situations where we need not free security, the current behavior is
kept. Linus states:
"... the old behavior was suspect for another reason too: having the
security blob go away from under a user sounds like it could cause
various other problems anyway, so I think the old code was at least
_prone_ to bugs even if it didn't have catastrophic behavior."
I have tested this patch with IPC testcases from LTP on both my
quad-core laptop and on a 64 core NUMA server. In both cases selinux is
enabled, and tests pass for both voluntary and forced preemption models.
While the mentioned races are theoretical (at least no one as reported
them), I wanted to make sure that this new logic doesn't break anything
we weren't aware of.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Acked-by: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This function was replaced by a the lockless shm_obtain_object_check(),
and no longer has any users.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When !CONFIG_MMU there's a chance we can derefence a NULL pointer when the
VM area isn't found - check the return value of find_vma().
Also, remove the redundant -EINVAL return: retval is set to the proper
return code and *only* changed to 0, when we actually unmap the segments.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Since in some situations the lock can be shared for readers, we shouldn't
be calling it a mutex, rename it to rwsem.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to other system calls, acquire the kern_ipc_perm lock after doing
the initial permission and security checks.
[sasha.levin@oracle.com: dont leave do_shmat with rcu lock held]
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Clean up some of the messy do_shmat() spaghetti code, getting rid of
out_free and out_put_dentry labels. This makes shortening the critical
region of this function in the next patch a little easier to do and read.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With the *_INFO, *_STAT, IPC_RMID and IPC_SET commands already optimized,
deal with the remaining SHM_LOCK and SHM_UNLOCK commands. Take the
shm_perm lock after doing the initial auditing and security checks. The
rest of the logic remains unchanged.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
While the INFO cmd doesn't take the ipc lock, the STAT commands do acquire
it unnecessarily. We can do the permissions and security checks only
holding the rcu lock.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to semctl and msgctl, when calling msgctl, the *_INFO and *_STAT
commands can be performed without acquiring the ipc object.
Add a shmctl_nolock() function and move the logic of *_INFO and *_STAT out
of msgctl(). Since we are just moving functionality, this change still
takes the lock and it will be properly lockless in the next patch.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of holding the ipc lock for the entire function, use the
ipcctl_pre_down_nolock and only acquire the lock for specific commands:
RMID and SET.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This is the third and final patchset that deals with reducing the amount
of contention we impose on the ipc lock (kern_ipc_perm.lock). These
changes mostly deal with shared memory, previous work has already been
done for semaphores and message queues:
http://lkml.org/lkml/2013/3/20/546 (sems)
http://lkml.org/lkml/2013/5/15/584 (mqueues)
With these patches applied, a custom shm microbenchmark stressing shmctl
doing IPC_STAT with 4 threads a million times, reduces the execution
time by 50%. A similar run, this time with IPC_SET, reduces the
execution time from 3 mins and 35 secs to 27 seconds.
Patches 1-8: replaces blindly taking the ipc lock for a smarter
combination of rcu and ipc_obtain_object, only acquiring the spinlock
when updating.
Patch 9: renames the ids rw_mutex to rwsem, which is what it already was.
Patch 10: is a trivial mqueue leftover cleanup
Patch 11: adds a brief lock scheme description, requested by Andrew.
This patch:
Add shm_obtain_object() and shm_obtain_object_check(), which will allow us
to get the ipc object without acquiring the lock. Just as with other
forms of ipc, these functions are basically wrappers around
ipc_obtain_object*().
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This function currently acquires both the rw_mutex and the rcu lock on
successful lookups, leaving the callers to explicitly unlock them,
creating another two level locking situation.
Make the callers (including those that still use ipcctl_pre_down())
explicitly lock and unlock the rwsem and rcu lock.
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patchset continues the work that began in the sysv ipc semaphore
scaling series, see
https://lkml.org/lkml/2013/3/20/546
Just like semaphores used to be, sysv shared memory and msg queues also
abuse the ipc lock, unnecessarily holding it for operations such as
permission and security checks.
This patchset mostly deals with mqueues, and while shared mem can be
done in a very similar way, I want to get these patches out in the open
first. It also does some pending cleanups, mostly focused on the two
level locking we have in ipc code, taking care of ipc_addid() and
ipcctl_pre_down_nolock() - yes there are still functions that need to be
updated as well.
This patch:
Make all callers explicitly take and release the RCU read lock.
This addresses the two level locking seen in newary(), newseg() and
newqueue(). For the last two, explicitly unlock the ipc object and the
rcu lock, instead of calling the custom shm_unlock and msg_unlock
functions. The next patch will deal with the open coded locking for
->perm.lock
Signed-off-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The current kernel returns -EINVAL unless a given mmap length is
"almost" hugepage aligned. This is because in sys_mmap_pgoff() the
given length is passed to vm_mmap_pgoff() as it is without being aligned
with hugepage boundary.
This is a regression introduced in commit 40716e2924 ("hugetlbfs: fix
alignment of huge page requests"), where alignment code is pushed into
hugetlb_file_setup() and the variable len in caller side is not changed.
To fix this, this patch partially reverts that commit, and adds
alignment code in caller side. And it also introduces hstate_sizelog()
in order to get proper hstate to specified hugepage size.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881
[akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: <iceman_dvd@yahoo.com>
Cc: Steven Truelove <steven.truelove@utoronto.ca>
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>