Commit graph

421 commits

Author SHA1 Message Date
Greg Kroah-Hartman
6e37ae0e7a This is the 4.4.134 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlsOO14ACgkQONu9yGCS
 aT4ulhAAhMVYSRa/cOFm0BHxSL/59WmJTa3Na8TJqkTrJy+LRluBiKCywyiMZknp
 4rIffv4jcxcFNCpqYTjNTSStGLWCCkBLNSzxuzFv5M89Jdx4Gz1Ww1hzMESP3gxK
 puHUewSJQm7qtVOiC2l4YcW3Q6nFK0kqbCWpSkHoGVfZoX9JS2P1V8n+KFZpUH1a
 UyhVW48ainUpXfhSKJZ5xABiWYM2hcSq52RW1edNZvwuKwulZ+2EME26HgGCK7ff
 WHzGHECE6Lem+iunR26J/QtbTo8LKEyU0F039X21E7FIxf33S0xyPx+MGjJfWBOo
 Q6A23mAEWwEhlMomNKzdd/iUzSVlWSzKe8LJa7GI5G6BxftN8Z0TGTnKzIDkw++M
 T6RfK03CP6c9rQ756d0fTPxdZh6ae9EN8WSot/Sbbc9SvGSfy6o4I8Y/uJygShmF
 j13JfMweC+t7/6fyUqc5dcgY0Xy7LUFiWqfPxQj6axDiT82Mx2AvQaczrPUAKr1K
 KQsetmyhHC+Cpy7ILrhUGYjEWlvQm11ZiFoX8BkocFLFWk736QA63iB7mOUpCOQR
 SKLK00dF163GJdQC6nb4wCtyBxnCg4pSoP/72Z1foPtaSd3ccJ4CLsIE6GY5sP/I
 sDlPnIlnzEDfDPIxtVfKC8e1JINP6awXwtoJJo6MnuCuP3LDb58=
 =ogZQ
 -----END PGP SIGNATURE-----

Merge 4.4.134 into android-4.4

Changes in 4.4.134
	MIPS: ptrace: Expose FIR register through FP regset
	MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRs
	KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"
	affs_lookup(): close a race with affs_remove_link()
	aio: fix io_destroy(2) vs. lookup_ioctx() race
	ALSA: timer: Fix pause event notification
	mmc: sdhci-iproc: fix 32bit writes for TRANSFER_MODE register
	libata: Blacklist some Sandisk SSDs for NCQ
	libata: blacklist Micron 500IT SSD with MU01 firmware
	xen-swiotlb: fix the check condition for xen_swiotlb_free_coherent
	Revert "ipc/shm: Fix shmat mmap nil-page protection"
	ipc/shm: fix shmat() nil address after round-down when remapping
	kasan: fix memory hotplug during boot
	kernel/sys.c: fix potential Spectre v1 issue
	kernel/signal.c: avoid undefined behaviour in kill_something_info
	xfs: remove racy hasattr check from attr ops
	do d_instantiate/unlock_new_inode combinations safely
	firewire-ohci: work around oversized DMA reads on JMicron controllers
	NFSv4: always set NFS_LOCK_LOST when a lock is lost.
	ALSA: hda - Use IS_REACHABLE() for dependency on input
	ASoC: au1x: Fix timeout tests in au1xac97c_ac97_read()
	kvm: x86: fix KVM_XEN_HVM_CONFIG ioctl
	tracing/hrtimer: Fix tracing bugs by taking all clock bases and modes into account
	PCI: Add function 1 DMA alias quirk for Marvell 9128
	tools lib traceevent: Simplify pointer print logic and fix %pF
	perf callchain: Fix attr.sample_max_stack setting
	tools lib traceevent: Fix get_field_str() for dynamic strings
	dm thin: fix documentation relative to low water mark threshold
	nfs: Do not convert nfs_idmap_cache_timeout to jiffies
	watchdog: sp5100_tco: Fix watchdog disable bit
	kconfig: Don't leak main menus during parsing
	kconfig: Fix automatic menu creation mem leak
	kconfig: Fix expr_free() E_NOT leak
	mac80211_hwsim: fix possible memory leak in hwsim_new_radio_nl()
	ipmi/powernv: Fix error return code in ipmi_powernv_probe()
	Btrfs: set plug for fsync
	btrfs: Fix out of bounds access in btrfs_search_slot
	Btrfs: fix scrub to repair raid6 corruption
	scsi: fas216: fix sense buffer initialization
	HID: roccat: prevent an out of bounds read in kovaplus_profile_activated()
	jffs2: Fix use-after-free bug in jffs2_iget()'s error handling path
	powerpc/numa: Use ibm,max-associativity-domains to discover possible nodes
	powerpc/numa: Ensure nodes initialized for hotplug
	RDMA/mlx5: Avoid memory leak in case of XRCD dealloc failure
	ntb_transport: Fix bug with max_mw_size parameter
	ocfs2: return -EROFS to mount.ocfs2 if inode block is invalid
	ocfs2/acl: use 'ip_xattr_sem' to protect getting extended attribute
	ocfs2: return error when we attempt to access a dirty bh in jbd2
	mm/mempolicy: fix the check of nodemask from user
	mm/mempolicy: add nodes_empty check in SYSC_migrate_pages
	asm-generic: provide generic_pmdp_establish()
	mm: pin address_space before dereferencing it while isolating an LRU page
	IB/ipoib: Fix for potential no-carrier state
	x86/power: Fix swsusp_arch_resume prototype
	firmware: dmi_scan: Fix handling of empty DMI strings
	ACPI: processor_perflib: Do not send _PPC change notification if not ready
	bpf: fix selftests/bpf test_kmod.sh failure when CONFIG_BPF_JIT_ALWAYS_ON=y
	MIPS: TXx9: use IS_BUILTIN() for CONFIG_LEDS_CLASS
	xen-netfront: Fix race between device setup and open
	xen/grant-table: Use put_page instead of free_page
	RDS: IB: Fix null pointer issue
	arm64: spinlock: Fix theoretical trylock() A-B-A with LSE atomics
	proc: fix /proc/*/map_files lookup
	cifs: silence compiler warnings showing up with gcc-8.0.0
	bcache: properly set task state in bch_writeback_thread()
	bcache: fix for allocator and register thread race
	bcache: fix for data collapse after re-attaching an attached device
	bcache: return attach error when no cache set exist
	tools/libbpf: handle issues with bpf ELF objects containing .eh_frames
	locking/qspinlock: Ensure node->count is updated before initialising node
	irqchip/gic-v3: Change pr_debug message to pr_devel
	scsi: ufs: Enable quirk to ignore sending WRITE_SAME command
	scsi: bnx2fc: Fix check in SCSI completion handler for timed out request
	scsi: sym53c8xx_2: iterator underflow in sym_getsync()
	scsi: mptfusion: Add bounds check in mptctl_hp_targetinfo()
	scsi: qla2xxx: Avoid triggering undefined behavior in qla2x00_mbx_completion()
	ARC: Fix malformed ARC_EMUL_UNALIGNED default
	usb: gadget: f_uac2: fix bFirstInterface in composite gadget
	usb: gadget: fsl_udc_core: fix ep valid checks
	usb: dwc2: Fix dwc2_hsotg_core_init_disconnected()
	selftests: memfd: add config fragment for fuse
	scsi: storvsc: Increase cmd_per_lun for higher speed devices
	scsi: aacraid: fix shutdown crash when init fails
	scsi: qla4xxx: skip error recovery in case of register disconnect.
	ARM: OMAP2+: timer: fix a kmemleak caused in omap_get_timer_dt
	ARM: OMAP3: Fix prm wake interrupt for resume
	ARM: OMAP1: clock: Fix debugfs_create_*() usage
	NFC: llcp: Limit size of SDP URI
	mac80211: round IEEE80211_TX_STATUS_HEADROOM up to multiple of 4
	md raid10: fix NULL deference in handle_write_completed()
	drm/exynos: fix comparison to bitshift when dealing with a mask
	usb: musb: fix enumeration after resume
	locking/xchg/alpha: Add unconditional memory barrier to cmpxchg()
	md: raid5: avoid string overflow warning
	kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE
	powerpc/bpf/jit: Fix 32-bit JIT for seccomp_data access
	s390/cio: fix return code after missing interrupt
	s390/cio: clear timer when terminating driver I/O
	ARM: OMAP: Fix dmtimer init for omap1
	smsc75xx: fix smsc75xx_set_features()
	regulatory: add NUL to request alpha2
	locking/xchg/alpha: Fix xchg() and cmpxchg() memory ordering bugs
	x86/topology: Update the 'cpu cores' field in /proc/cpuinfo correctly across CPU hotplug operations
	media: dmxdev: fix error code for invalid ioctls
	md/raid1: fix NULL pointer dereference
	batman-adv: fix packet checksum in receive path
	batman-adv: invalidate checksum on fragment reassembly
	netfilter: ebtables: convert BUG_ONs to WARN_ONs
	nvme-pci: Fix nvme queue cleanup if IRQ setup fails
	clocksource/drivers/fsl_ftm_timer: Fix error return checking
	r8152: fix tx packets accounting
	virtio-gpu: fix ioctl and expose the fixed status to userspace.
	dmaengine: rcar-dmac: fix max_chunk_size for R-Car Gen3
	bcache: fix kcrashes with fio in RAID5 backend dev
	sit: fix IFLA_MTU ignored on NEWLINK
	gianfar: Fix Rx byte accounting for ndev stats
	net/tcp/illinois: replace broken algorithm reference link
	xen/pirq: fix error path cleanup when binding MSIs
	Btrfs: send, fix issuing write op when processing hole in no data mode
	selftests/powerpc: Skip the subpage_prot tests if the syscall is unavailable
	KVM: PPC: Book3S HV: Fix VRMA initialization with 2MB or 1GB memory backing
	watchdog: f71808e_wdt: Fix magic close handling
	e1000e: Fix check_for_link return value with autoneg off
	e1000e: allocate ring descriptors with dma_zalloc_coherent
	usb: musb: call pm_runtime_{get,put}_sync before reading vbus registers
	scsi: mpt3sas: Do not mark fw_event workqueue as WQ_MEM_RECLAIM
	scsi: sd: Keep disk read-only when re-reading partition
	fbdev: Fixing arbitrary kernel leak in case FBIOGETCMAP_SPARC in sbusfb_ioctl_helper().
	xen: xenbus: use put_device() instead of kfree()
	USB: OHCI: Fix NULL dereference in HCDs using HCD_LOCAL_MEM
	netfilter: ebtables: fix erroneous reject of last rule
	bnxt_en: Check valid VNIC ID in bnxt_hwrm_vnic_set_tpa().
	workqueue: use put_device() instead of kfree()
	ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu
	sunvnet: does not support GSO for sctp
	net: Fix vlan untag for bridge and vlan_dev with reorder_hdr off
	batman-adv: fix header size check in batadv_dbg_arp()
	vti4: Don't count header length twice on tunnel setup
	vti4: Don't override MTU passed on link creation via IFLA_MTU
	perf/cgroup: Fix child event counting bug
	RDMA/ucma: Correct option size check using optlen
	mm/mempolicy.c: avoid use uninitialized preferred_node
	selftests: ftrace: Add probe event argument syntax testcase
	selftests: ftrace: Add a testcase for string type with kprobe_event
	selftests: ftrace: Add a testcase for probepoint
	batman-adv: fix multicast-via-unicast transmission with AP isolation
	batman-adv: fix packet loss for broadcasted DHCP packets to a server
	ARM: 8748/1: mm: Define vdso_start, vdso_end as array
	net: qmi_wwan: add BroadMobi BM806U 2020:2033
	net/usb/qmi_wwan.c: Add USB id for lt4120 modem
	net-usb: add qmi_wwan if on lte modem wistron neweb d18q1
	llc: properly handle dev_queue_xmit() return value
	mm/kmemleak.c: wait for scan completion before disabling free
	net: Fix untag for vlan packets without ethernet header
	net: mvneta: fix enable of all initialized RXQs
	sh: fix debug trap failure to process signals before return to user
	x86/pgtable: Don't set huge PUD/PMD on non-leaf entries
	fs/proc/proc_sysctl.c: fix potential page fault while unregistering sysctl table
	swap: divide-by-zero when zero length swap file on ssd
	sr: get/drop reference to device in revalidate and check_events
	Force log to disk before reading the AGF during a fstrim
	cpufreq: CPPC: Initialize shared perf capabilities of CPUs
	scsi: aacraid: Insure command thread is not recursively stopped
	dp83640: Ensure against premature access to PHY registers after reset
	mm/ksm: fix interaction with THP
	mm: fix races between address_space dereference and free in page_evicatable
	Btrfs: bail out on error during replay_dir_deletes
	Btrfs: fix NULL pointer dereference in log_dir_items
	btrfs: Fix possible softlock on single core machines
	ocfs2/dlm: don't handle migrate lockres if already in shutdown
	sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning
	KVM: VMX: raise internal error for exception during invalid protected mode state
	fscache: Fix hanging wait on page discarded by writeback
	sparc64: Make atomic_xchg() an inline function rather than a macro.
	rtc: snvs: Fix usage of snvs_rtc_enable
	net: bgmac: Fix endian access in bgmac_dma_tx_ring_free()
	Bluetooth: btusb: Add USB ID 7392:a611 for Edimax EW-7611ULB
	btrfs: tests/qgroup: Fix wrong tree backref level
	Btrfs: fix copy_items() return value when logging an inode
	btrfs: fix lockdep splat in btrfs_alloc_subvolume_writers
	xen/acpi: off by one in read_acpi_id()
	ACPI: acpi_pad: Fix memory leak in power saving threads
	powerpc/mpic: Check if cpu_possible() in mpic_physmask()
	m68k: set dma and coherent masks for platform FEC ethernets
	parisc/pci: Switch LBA PCI bus from Hard Fail to Soft Fail mode
	hwmon: (nct6775) Fix writing pwmX_mode
	rtc: hctosys: Ensure system time doesn't overflow time_t
	powerpc/perf: Prevent kernel address leak to userspace via BHRB buffer
	powerpc/perf: Fix kernel address leak via sampling registers
	tools/thermal: tmon: fix for segfault
	selftests: Print the test we're running to /dev/kmsg
	net/mlx5: Protect from command bit overflow
	ath10k: Fix kernel panic while using worker (ath10k_sta_rc_update_wk)
	ima: Fix Kconfig to select TPM 2.0 CRB interface
	ima: Fallback to the builtin hash algorithm
	virtio-net: Fix operstate for virtio when no VIRTIO_NET_F_STATUS
	arm: dts: socfpga: fix GIC PPI warning
	usb: dwc3: Update DWC_usb31 GTXFIFOSIZ reg fields
	cpufreq: cppc_cpufreq: Fix cppc_cpufreq_init() failure path
	clk: Don't show the incorrect clock phase
	zorro: Set up z->dev.dma_mask for the DMA API
	bcache: quit dc->writeback_thread when BCACHE_DEV_DETACHING is set
	ACPICA: Events: add a return on failure from acpi_hw_register_read
	ACPICA: acpi: acpica: fix acpi operand cache leak in nseval.c
	i2c: mv64xxx: Apply errata delay only in standard mode
	KVM: lapic: stop advertising DIRECTED_EOI when in-kernel IOAPIC is in use
	xhci: zero usb device slot_id member when disabling and freeing a xhci slot
	MIPS: ath79: Fix AR724X_PLL_REG_PCIE_CONFIG offset
	PCI: Restore config space on runtime resume despite being unbound
	ipmi_ssif: Fix kernel panic at msg_done_handler
	usb: dwc2: Fix interval type issue
	usb: gadget: ffs: Let setup() return USB_GADGET_DELAYED_STATUS
	usb: gadget: ffs: Execute copy_to_user() with USER_DS set
	powerpc: Add missing prototype for arch_irq_work_raise()
	ASoC: topology: create TLV data for dapm widgets
	perf/core: Fix perf_output_read_group()
	hwmon: (pmbus/max8688) Accept negative page register values
	hwmon: (pmbus/adm1275) Accept negative page register values
	cdrom: do not call check_disk_change() inside cdrom_open()
	gfs2: Fix fallocate chunk size
	usb: gadget: udc: change comparison to bitshift when dealing with a mask
	usb: gadget: composite: fix incorrect handling of OS desc requests
	x86/devicetree: Initialize device tree before using it
	x86/devicetree: Fix device IRQ settings in DT
	ALSA: vmaster: Propagate slave error
	media: cx23885: Override 888 ImpactVCBe crystal frequency
	media: cx23885: Set subdev host data to clk_freq pointer
	media: s3c-camif: fix out-of-bounds array access
	dmaengine: pl330: fix a race condition in case of threaded irqs
	media: em28xx: USB bulk packet size fix
	clk: rockchip: Prevent calculating mmc phase if clock rate is zero
	enic: enable rq before updating rq descriptors
	hwrng: stm32 - add reset during probe
	staging: rtl8192u: return -ENOMEM on failed allocation of priv->oldaddr
	rtc: tx4939: avoid unintended sign extension on a 24 bit shift
	serial: xuartps: Fix out-of-bounds access through DT alias
	serial: samsung: Fix out-of-bounds access through serial port index
	serial: mxs-auart: Fix out-of-bounds access through serial port index
	serial: imx: Fix out-of-bounds access through serial port index
	serial: fsl_lpuart: Fix out-of-bounds access through DT alias
	serial: arc_uart: Fix out-of-bounds access through DT alias
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9220
	udf: Provide saner default for invalid uid / gid
	media: cx25821: prevent out-of-bounds read on array card
	clk: samsung: s3c2410: Fix PLL rates
	clk: samsung: exynos5260: Fix PLL rates
	clk: samsung: exynos5433: Fix PLL rates
	clk: samsung: exynos5250: Fix PLL rates
	clk: samsung: exynos3250: Fix PLL rates
	crypto: sunxi-ss - Add MODULE_ALIAS to sun4i-ss
	audit: return on memory error to avoid null pointer dereference
	MIPS: Octeon: Fix logging messages with spurious periods after newlines
	drm/rockchip: Respect page offset for PRIME mmap calls
	x86/apic: Set up through-local-APIC mode on the boot CPU if 'noapic' specified
	perf tests: Use arch__compare_symbol_names to compare symbols
	perf report: Fix memory corruption in --branch-history mode --branch-history
	selftests/net: fixes psock_fanout eBPF test case
	netlabel: If PF_INET6, check sk_buff ip header version
	scsi: lpfc: Fix issue_lip if link is disabled
	scsi: lpfc: Fix soft lockup in lpfc worker thread during LIP testing
	scsi: lpfc: Fix frequency of Release WQE CQEs
	regulator: of: Add a missing 'of_node_put()' in an error handling path of 'of_regulator_match()'
	ASoC: samsung: i2s: Ensure the RCLK rate is properly determined
	Bluetooth: btusb: Add device ID for RTL8822BE
	kdb: make "mdr" command repeat
	s390/ftrace: use expoline for indirect branches
	Linux 4.4.134

Change-Id: Iababaf9b89bc8d0437b95e1368d8b0a9126a178c
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-05-30 13:25:24 +02:00
Al Viro
03bb758894 do d_instantiate/unlock_new_inode combinations safely
commit 1e2e547a93a00ebc21582c06ca3c6cfea2a309ee upstream.

For anything NFS-exported we do _not_ want to unlock new inode
before it has grown an alias; original set of fixes got the
ordering right, but missed the nasty complication in case of
lockdep being enabled - unlock_new_inode() does
	lockdep_annotate_inode_mutex_key(inode)
which can only be done before anyone gets a chance to touch
->i_mutex.  Unfortunately, flipping the order and doing
unlock_new_inode() before d_instantiate() opens a window when
mkdir can race with open-by-fhandle on a guessed fhandle, leading
to multiple aliases for a directory inode and all the breakage
that follows from that.

	Correct solution: a new primitive (d_instantiate_new())
combining these two in the right order - lockdep annotate, then
d_instantiate(), then the rest of unlock_new_inode().  All
combinations of d_instantiate() with unlock_new_inode() should
be converted to that.

Cc: stable@kernel.org	# 2.6.29 and later
Tested-by: Mike Marshall <hubcap@omnibond.com>
Reviewed-by: Andreas Dilger <adilger@dilger.ca>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30 07:48:52 +02:00
Greg Kroah-Hartman
79e7124a51 This is the 4.4.123 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlqzaAQACgkQONu9yGCS
 aT6/iw//SWWvNvIYVs9SUcAk1lLRdxIPEJ2mR4CELU8S9WblInX8MS/w3+qGUGVt
 9LzQUfxqbwpX0luel6QnyvME66qQI+VQ0CzA7Qtuzoy2UNlr5woW112nOzkBOuKt
 +KjHg3cUwPijljEZnhPq7eEv6TrHrb/ZsU4jjTLMrlL1u/TYfJQGrU+JAFpTbHto
 9IyVu/F80E/Vln2GHwYidHUbWG70lYIY75J/CXvklbfrxuRpsrOhQla52YspzI2p
 B9Q5p6Ct7FIO1SEBdVdBYLMMBbrt/+dbQzHMRWyUtTudiCSgeQ+8JEWRo6Pu+dGs
 1cW9uCJ+Q7AolQWicG/H80YBZczkB/p77PRCua2vVpx8e/FYSujP3E5T8Mnl7k81
 mmSoNRF8SqYXSwWDCqd+c/ck4zPRKPM++eNpkor+M0QyT9MOfh9oq4vIW4HoZQEm
 j+mZWUNihLO+O7znNGLoQLWn2mzDLjcrfnM6DOaKc5FJht/7OG8yyaobINYoUCmA
 +1cQP0RsRrWo87zpcWoHKRSZ6Iw+YtEu3i7q2bqEdtb2Jc/88nQ5/eYcBLYDaPtw
 DZ8GXL8kxXCP2J2GumwHIBTY68yZPVBknABGo5odqZwfkSgjK5fL17MzWX10Occp
 0U6r+IluHkRCSxcCuyS8w10+TPn/Rw26wJCstPYargfiw/twILA=
 =H2Hj
 -----END PGP SIGNATURE-----

Merge 4.4.123 into android-4.4

Changes in 4.4.123
	blkcg: fix double free of new_blkg in blkcg_init_queue
	Input: tsc2007 - check for presence and power down tsc2007 during probe
	staging: speakup: Replace BUG_ON() with WARN_ON().
	staging: wilc1000: add check for kmalloc allocation failure.
	HID: reject input outside logical range only if null state is set
	drm: qxl: Don't alloc fbdev if emulation is not supported
	ath10k: fix a warning during channel switch with multiple vaps
	PCI/MSI: Stop disabling MSI/MSI-X in pci_device_shutdown()
	selinux: check for address length in selinux_socket_bind()
	perf sort: Fix segfault with basic block 'cycles' sort dimension
	i40e: Acquire NVM lock before reads on all devices
	i40e: fix ethtool to get EEPROM data from X722 interface
	perf tools: Make perf_event__synthesize_mmap_events() scale
	drivers: net: xgene: Fix hardware checksum setting
	drm: Defer disabling the vblank IRQ until the next interrupt (for instant-off)
	ath10k: disallow DFS simulation if DFS channel is not enabled
	perf probe: Return errno when not hitting any event
	HID: clamp input to logical range if no null state
	net/8021q: create device with all possible features in wanted_features
	ARM: dts: Adjust moxart IRQ controller and flags
	batman-adv: handle race condition for claims between gateways
	of: fix of_device_get_modalias returned length when truncating buffers
	solo6x10: release vb2 buffers in solo_stop_streaming()
	scsi: ipr: Fix missed EH wakeup
	media: i2c/soc_camera: fix ov6650 sensor getting wrong clock
	timers, sched_clock: Update timeout for clock wrap
	sysrq: Reset the watchdog timers while displaying high-resolution timers
	Input: qt1070 - add OF device ID table
	sched: act_csum: don't mangle TCP and UDP GSO packets
	ASoC: rcar: ssi: don't set SSICR.CKDV = 000 with SSIWSR.CONT
	spi: omap2-mcspi: poll OMAP2_MCSPI_CHSTAT_RXS for PIO transfer
	tcp: sysctl: Fix a race to avoid unexpected 0 window from space
	dmaengine: imx-sdma: add 1ms delay to ensure SDMA channel is stopped
	driver: (adm1275) set the m,b and R coefficients correctly for power
	mm: Fix false-positive VM_BUG_ON() in page_cache_{get,add}_speculative()
	blk-throttle: make sure expire time isn't too big
	f2fs: relax node version check for victim data in gc
	bonding: refine bond_fold_stats() wrap detection
	braille-console: Fix value returned by _braille_console_setup
	drm/vmwgfx: Fixes to vmwgfx_fb
	vxlan: vxlan dev should inherit lowerdev's gso_max_size
	NFC: nfcmrvl: Include unaligned.h instead of access_ok.h
	NFC: nfcmrvl: double free on error path
	ARM: dts: r8a7790: Correct parent of SSI[0-9] clocks
	ARM: dts: r8a7791: Correct parent of SSI[0-9] clocks
	powerpc: Avoid taking a data miss on every userspace instruction miss
	net/faraday: Add missing include of of.h
	ARM: dts: koelsch: Correct clock frequency of X2 DU clock input
	reiserfs: Make cancel_old_flush() reliable
	ALSA: firewire-digi00x: handle all MIDI messages on streaming packets
	fm10k: correctly check if interface is removed
	scsi: ses: don't get power status of SES device slot on probe
	apparmor: Make path_max parameter readonly
	iommu/iova: Fix underflow bug in __alloc_and_insert_iova_range
	video: ARM CLCD: fix dma allocation size
	drm/radeon: Fail fb creation from imported dma-bufs.
	drm/amdgpu: Fail fb creation from imported dma-bufs. (v2)
	coresight: Fixes coresight DT parse to get correct output port ID.
	MIPS: BPF: Quit clobbering callee saved registers in JIT code.
	MIPS: BPF: Fix multiple problems in JIT skb access helpers.
	MIPS: r2-on-r6-emu: Fix BLEZL and BGTZL identification
	MIPS: r2-on-r6-emu: Clear BLTZALL and BGEZALL debugfs counters
	regulator: isl9305: fix array size
	md/raid6: Fix anomily when recovering a single device in RAID6.
	usb: dwc2: Make sure we disconnect the gadget state
	usb: gadget: dummy_hcd: Fix wrong power status bit clear/reset in dummy_hub_control()
	drivers/perf: arm_pmu: handle no platform_device
	perf inject: Copy events when reordering events in pipe mode
	perf session: Don't rely on evlist in pipe mode
	scsi: sg: check for valid direction before starting the request
	scsi: sg: close race condition in sg_remove_sfp_usercontext()
	kprobes/x86: Fix kprobe-booster not to boost far call instructions
	kprobes/x86: Set kprobes pages read-only
	pwm: tegra: Increase precision in PWM rate calculation
	wil6210: fix memory access violation in wil_memcpy_from/toio_32
	drm/edid: set ELD connector type in drm_edid_to_eld()
	video/hdmi: Allow "empty" HDMI infoframes
	HID: elo: clear BTN_LEFT mapping
	ARM: dts: exynos: Correct Trats2 panel reset line
	sched: Stop switched_to_rt() from sending IPIs to offline CPUs
	sched: Stop resched_cpu() from sending IPIs to offline CPUs
	test_firmware: fix setting old custom fw path back on exit
	net: xfrm: allow clearing socket xfrm policies.
	mtd: nand: fix interpretation of NAND_CMD_NONE in nand_command[_lp]()
	ARM: dts: am335x-pepper: Fix the audio CODEC's reset pin
	ARM: dts: omap3-n900: Fix the audio CODEC's reset pin
	ath10k: update tdls teardown state to target
	cpufreq: Fix governor module removal race
	clk: qcom: msm8916: fix mnd_width for codec_digcodec
	ath10k: fix invalid STS_CAP_OFFSET_MASK
	tools/usbip: fixes build with musl libc toolchain
	spi: sun6i: disable/unprepare clocks on remove
	scsi: core: scsi_get_device_flags_keyed(): Always return device flags
	scsi: devinfo: apply to HP XP the same flags as Hitachi VSP
	scsi: dh: add new rdac devices
	media: cpia2: Fix a couple off by one bugs
	veth: set peer GSO values
	drm/amdkfd: Fix memory leaks in kfd topology
	agp/intel: Flush all chipset writes after updating the GGTT
	mac80211_hwsim: enforce PS_MANUAL_POLL to be set after PS_ENABLED
	mac80211: remove BUG() when interface type is invalid
	ASoC: nuc900: Fix a loop timeout test
	ipvlan: add L2 check for packets arriving via virtual devices
	rcutorture/configinit: Fix build directory error message
	ima: relax requiring a file signature for new files with zero length
	selftests/x86/entry_from_vm86: Exit with 1 if we fail
	selftests/x86: Add tests for User-Mode Instruction Prevention
	selftests/x86: Add tests for the STR and SLDT instructions
	selftests/x86/entry_from_vm86: Add test cases for POPF
	x86/vm86/32: Fix POPF emulation
	x86/mm: Fix vmalloc_fault to use pXd_large
	ALSA: pcm: Fix UAF in snd_pcm_oss_get_formats()
	ALSA: hda - Revert power_save option default value
	ALSA: seq: Fix possible UAF in snd_seq_check_queue()
	ALSA: seq: Clear client entry before deleting else at closing
	drm/amdgpu/dce: Don't turn off DP sink when disconnected
	fs: Teach path_connected to handle nfs filesystems with multiple roots.
	lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
	fs/aio: Add explicit RCU grace period when freeing kioctx
	fs/aio: Use RCU accessors for kioctx_table->table[]
	irqchip/gic-v3-its: Ensure nr_ites >= nr_lpis
	scsi: sg: fix SG_DXFER_FROM_DEV transfers
	scsi: sg: fix static checker warning in sg_is_valid_dxfer
	scsi: sg: only check for dxfer_len greater than 256M
	ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux
	btrfs: alloc_chunk: fix DUP stripe size handling
	btrfs: Fix use-after-free when cleaning up fs_devs with a single stale device
	USB: gadget: udc: Add missing platform_device_put() on error in bdc_pci_probe()
	usb: gadget: bdc: 64-bit pointer capability check
	bpf: fix incorrect sign extension in check_alu_op()
	Linux 4.4.123

Change-Id: Ieb89411248f93522dde29edb8581f8ece22e33a7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-03-22 09:57:28 +01:00
Al Viro
8dc6893e8f lock_parent() needs to recheck if dentry got __dentry_kill'ed under it
commit 3b821409632ab778d46e807516b457dfa72736ed upstream.

In case when dentry passed to lock_parent() is protected from freeing only
by the fact that it's on a shrink list and trylock of parent fails, we
could get hit by __dentry_kill() (and subsequent dentry_kill(parent))
between unlocking dentry and locking presumed parent.  We need to recheck
that dentry is alive once we lock both it and parent *and* postpone
rcu_read_unlock() until after that point.  Otherwise we could return
a pointer to struct dentry that already is rcu-scheduled for freeing, with
->d_lock held on it; caller's subsequent attempt to unlock it can end
up with memory corruption.

Cc: stable@vger.kernel.org # 3.12+, counting backports
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22 09:23:31 +01:00
Greg Kroah-Hartman
9f764bbe06 This is the 4.4.80 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmHzogACgkQONu9yGCS
 aT72Kg/9Ea02hrf7SCaEmReH0CNBsZiWBp0u/4b6QtXt3TrPDXK0oteIB4SUIVi/
 zOzjU5SkssMLL9RoRQob81DLFJlL0b9ME5nLXxAACe2P74DaRSxA3DDmrYILgerH
 Gnv4k9xjbVMXMjdk6qAZ/SahCFfYPfnPCRO/zPeb3+6EZk8UQpaaB/GNxVCsGFTZ
 AfThsAHYzfFOg2fYdK0T09eDtAFqAokwGY6O8uaigkJt3u5mbMXcgxSp4o322OcG
 V3jxCUPzSk/78QtoSqQErXDCj/30451oLVByMBuRpBJAilsDf6VaURuz1dVfKFW8
 PdkLiy397sir696HwPU0HwHz++kRnZK2u2z//TRDE5wmgsC9VSq9fkggZdmNBol5
 N4ekCWjhYyyJzxf9hTxK/fA4t4KRFtOcdRiEkJj9RDIhT9jxsxPMr3TGJ25LJaUH
 8Qae+nNlYVe7lmaojckGa+AjIMm5HRB7LZnf4VQr1E8kvWpWpwA/0YtnduzPsXhH
 6xqT0rL/1/Z1Jz63/zPAtZ9OSL/ne0hJs+xOuUhKHGwH3oWBKrgmxAH8CAxYq0x9
 Y6ALkDweS3e+vVt+4BcHpUz8JTNTlspMcebt4VvjqvmERpKwmVsl7tEY242Uw4LQ
 wMF50vA9Cc0bVkVS7w2Ns/dn6XEWYpqS4a/MninjaBOMbtMia78=
 =l+tE
 -----END PGP SIGNATURE-----

Merge 4.4.80 into android-4.4

Changes in 4.4.80
	af_key: Add lock to key dump
	pstore: Make spinlock per zone instead of global
	net: reduce skb_warn_bad_offload() noise
	powerpc/pseries: Fix of_node_put() underflow during reconfig remove
	crypto: authencesn - Fix digest_null crash
	md/raid5: add thread_group worker async_tx_issue_pending_all
	drm/vmwgfx: Fix gcc-7.1.1 warning
	drm/nouveau/bar/gf100: fix access to upper half of BAR2
	KVM: PPC: Book3S HV: Context-switch EBB registers properly
	KVM: PPC: Book3S HV: Restore critical SPRs to host values on guest exit
	KVM: PPC: Book3S HV: Reload HTM registers explicitly
	KVM: PPC: Book3S HV: Save/restore host values of debug registers
	Revert "powerpc/numa: Fix percpu allocations to be NUMA aware"
	Staging: comedi: comedi_fops: Avoid orphaned proc entry
	drm/rcar: Nuke preclose hook
	drm: rcar-du: Perform initialization/cleanup at probe/remove time
	drm: rcar-du: Simplify and fix probe error handling
	perf intel-pt: Fix ip compression
	perf intel-pt: Fix last_ip usage
	perf intel-pt: Use FUP always when scanning for an IP
	perf intel-pt: Ensure never to set 'last_ip' when packet 'count' is zero
	xfs: don't BUG() on mixed direct and mapped I/O
	nfc: fdp: fix NULL pointer dereference
	net: phy: Do not perform software reset for Generic PHY
	isdn: Fix a sleep-in-atomic bug
	isdn/i4l: fix buffer overflow
	ath10k: fix null deref on wmi-tlv when trying spectral scan
	wil6210: fix deadlock when using fw_no_recovery option
	mailbox: always wait in mbox_send_message for blocking Tx mode
	mailbox: skip complete wait event if timer expired
	mailbox: handle empty message in tx_tick
	mpt3sas: Don't overreach ioc->reply_post[] during initialization
	kaweth: fix firmware download
	kaweth: fix oops upon failed memory allocation
	sched/cgroup: Move sched_online_group() back into css_online() to fix crash
	PM / Domains: defer dev_pm_domain_set() until genpd->attach_dev succeeds if present
	RDMA/uverbs: Fix the check for port number
	libnvdimm, btt: fix btt_rw_page not returning errors
	ipmi/watchdog: fix watchdog timeout set on reboot
	dentry name snapshots
	v4l: s5c73m3: fix negation operator
	Make file credentials available to the seqfile interfaces
	/proc/iomem: only expose physical resource addresses to privileged users
	vlan: Propagate MAC address to VLANs
	pstore: Allow prz to control need for locking
	pstore: Correctly initialize spinlock and flags
	pstore: Use dynamic spinlock initializer
	net: skb_needs_check() accepts CHECKSUM_NONE for tx
	sched/cputime: Fix prev steal time accouting during CPU hotplug
	xen/blkback: don't free be structure too early
	xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
	tpm: fix a kernel memory leak in tpm-sysfs.c
	tpm: Replace device number bitmap with IDR
	x86/mce/AMD: Make the init code more robust
	r8169: add support for RTL8168 series add-on card.
	ARM: dts: n900: Mark eMMC slot with no-sdio and no-sd flags
	ipv6: Should use consistent conditional judgement for ip6 fragment between __ip6_append_data and ip6_finish_output
	net/mlx4: Remove BUG_ON from ICM allocation routine
	drm/msm: Ensure that the hardware write pointer is valid
	drm/msm: Verify that MSM_SUBMIT_BO_FLAGS are set
	vfio-pci: use 32-bit comparisons for register address for gcc-4.5
	irqchip/keystone: Fix "scheduling while atomic" on rt
	ASoC: tlv320aic3x: Mark the RESET register as volatile
	spi: dw: Make debugfs name unique between instances
	ASoC: nau8825: fix invalid configuration in Pre-Scalar of FLL
	irqchip/mxs: Enable SKIP_SET_WAKE and MASK_ON_SUSPEND
	openrisc: Add _text symbol to fix ksym build error
	dmaengine: ioatdma: Add Skylake PCI Dev ID
	dmaengine: ioatdma: workaround SKX ioatdma version
	dmaengine: ti-dma-crossbar: Add some 'of_node_put()' in error path.
	ARM64: zynqmp: Fix W=1 dtc 1.4 warnings
	ARM64: zynqmp: Fix i2c node's compatible string
	ARM: s3c2410_defconfig: Fix invalid values for NF_CT_PROTO_*
	ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
	usb: gadget: Fix copy/pasted error message
	Btrfs: adjust outstanding_extents counter properly when dio write is split
	tools lib traceevent: Fix prev/next_prio for deadline tasks
	xfrm: Don't use sk_family for socket policy lookups
	perf tools: Install tools/lib/traceevent plugins with install-bin
	perf symbols: Robustify reading of build-id from sysfs
	video: fbdev: cobalt_lcdfb: Handle return NULL error from devm_ioremap
	vfio-pci: Handle error from pci_iomap
	arm64: mm: fix show_pte KERN_CONT fallout
	nvmem: imx-ocotp: Fix wrong register size
	sh_eth: enable RX descriptor word 0 shift on SH7734
	ALSA: usb-audio: test EP_FLAG_RUNNING at urb completion
	HID: ignore Petzl USB headlamp
	scsi: fnic: Avoid sending reset to firmware when another reset is in progress
	scsi: snic: Return error code on memory allocation failure
	ASoC: dpcm: Avoid putting stream state to STOP when FE stream is paused
	Linux 4.4.80

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-08-07 14:29:16 -07:00
Al Viro
407669f2c9 dentry name snapshots
commit 49d31c2f389acfe83417083e1208422b4091cd9e upstream.

take_dentry_name_snapshot() takes a safe snapshot of dentry name;
if the name is a short one, it gets copied into caller-supplied
structure, otherwise an extra reference to external name is grabbed
(those are never modified).  In either case the pointer to stable
string is stored into the same structure.

dentry must be held by the caller of take_dentry_name_snapshot(),
but may be freely dropped afterwards - the snapshot will stay
until destroyed by release_dentry_name_snapshot().

Intended use:
	struct name_snapshot s;

	take_dentry_name_snapshot(&s, dentry);
	...
	access s.name
	...
	release_dentry_name_snapshot(&s);

Replaces fsnotify_oldname_...(), gets used in fsnotify to obtain the name
to pass down with event.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-08-06 19:19:42 -07:00
Greg Kroah-Hartman
59ff2e15be This is the 4.4.78 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllxlOsACgkQONu9yGCS
 aT4naw//VrWIuIf523IP6egcS+iprNM1lt8HZIzTQ0b2x0qv82UFA9FSs3e97dbG
 LJAUEmHW+oBbm6AekAUrNFF62qaMM5HYxVgGYeiniA1dRvLCDG/OxzxPc0hv6Kmm
 VsbZBlPQEj2B5tqUOkpvQNcqbKpIVglBsK4tjeHE/mRziUkIoPXWKS6Tt7VKGtBa
 gIGY7VASyKhPtxGCdbKzHsO/IGi3cmpwAyhqQDRBR/5Dxy3p9xlQ4gTpobeL+x8z
 A7WHqVNbgfdz21LBii5xE8+GUwiRjlYhxeFJTjhM6wo2/XCtw1FJgc7EMmsbVtbZ
 xYu1/tkYaZGoxOQ7sH5jNgVjE8IcyVimDa5eJ7p7fbh3AzsyXzAJIRc8cS7cL40F
 jLDWrYDnm5t7ziITrAf0uoMmRZLHqS2Bv5sqaoCxR3D51r6LVaNdavD6hYN1CLRA
 fjlDvnvwxbLPI4YPr7PZNu4oSJiawKx9jEBOTFSYu9L1XLvdvdcVU6ULAdpShnLn
 80a+YJmYsNg54im7sxFUw6z87AScznzirIpXEJoUO+Hs0SN9Rq/BQx8cKvVeaMEI
 z53c4Hci45Go/ozZVqCTxGbkQ6tKWIKBSo6kl3xchwms4lmijpWuwbAkDUzECNvv
 0RgyQvNx4d2cRJ8hIVcdVFzp+h+kgNqVQQ2nDld7/1QSZHLPhFo=
 =vrgw
 -----END PGP SIGNATURE-----

Merge 4.4.78 into android-4.4

Changes in 4.4.78
	net_sched: fix error recovery at qdisc creation
	net: sched: Fix one possible panic when no destroy callback
	net/phy: micrel: configure intterupts after autoneg workaround
	ipv6: avoid unregistering inet6_dev for loopback
	net: dp83640: Avoid NULL pointer dereference.
	tcp: reset sk_rx_dst in tcp_disconnect()
	net: prevent sign extension in dev_get_stats()
	bpf: prevent leaking pointer via xadd on unpriviledged
	net: handle NAPI_GRO_FREE_STOLEN_HEAD case also in napi_frags_finish()
	ipv6: dad: don't remove dynamic addresses if link is down
	net: ipv6: Compare lwstate in detecting duplicate nexthops
	vrf: fix bug_on triggered by rx when destroying a vrf
	rds: tcp: use sock_create_lite() to create the accept socket
	brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx()
	cfg80211: Define nla_policy for NL80211_ATTR_LOCAL_MESH_POWER_MODE
	cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES
	cfg80211: Check if PMKID attribute is of expected size
	irqchip/gic-v3: Fix out-of-bound access in gic_set_affinity
	parisc: Report SIGSEGV instead of SIGBUS when running out of stack
	parisc: use compat_sys_keyctl()
	parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs
	parisc/mm: Ensure IRQs are off in switch_mm()
	tools/lib/lockdep: Reduce MAX_LOCK_DEPTH to avoid overflowing lock_chain/: Depth
	kernel/extable.c: mark core_kernel_text notrace
	mm/list_lru.c: fix list_lru_count_node() to be race free
	fs/dcache.c: fix spin lockup issue on nlru->lock
	checkpatch: silence perl 5.26.0 unescaped left brace warnings
	binfmt_elf: use ELF_ET_DYN_BASE only for PIE
	arm: move ELF_ET_DYN_BASE to 4MB
	arm64: move ELF_ET_DYN_BASE to 4GB / 4MB
	powerpc: move ELF_ET_DYN_BASE to 4GB / 4MB
	s390: reduce ELF_ET_DYN_BASE
	exec: Limit arg stack to at most 75% of _STK_LIM
	vt: fix unchecked __put_user() in tioclinux ioctls
	mnt: In umount propagation reparent in a separate pass
	mnt: In propgate_umount handle visiting mounts in any order
	mnt: Make propagate_umount less slow for overlapping mount propagation trees
	selftests/capabilities: Fix the test_execve test
	tpm: Get rid of chip->pdev
	tpm: Provide strong locking for device removal
	Add "shutdown" to "struct class".
	tpm: Issue a TPM2_Shutdown for TPM2 devices.
	mm: fix overflow check in expand_upwards()
	crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD
	crypto: atmel - only treat EBUSY as transient if backlog
	crypto: sha1-ssse3 - Disable avx2
	crypto: caam - fix signals handling
	sched/topology: Fix overlapping sched_group_mask
	sched/topology: Optimize build_group_mask()
	PM / wakeirq: Convert to SRCU
	PM / QoS: return -EINVAL for bogus strings
	tracing: Use SOFTIRQ_OFFSET for softirq dectection for more accurate results
	KVM: x86: disable MPX if host did not enable MPX XSAVE features
	kvm: vmx: Do not disable intercepts for BNDCFGS
	kvm: x86: Guest BNDCFGS requires guest MPX support
	kvm: vmx: Check value written to IA32_BNDCFGS
	kvm: vmx: allow host to access guest MSR_IA32_BNDCFGS
	Linux 4.4.78

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-21 09:14:57 +02:00
Sahitya Tummala
68b0f5d85b fs/dcache.c: fix spin lockup issue on nlru->lock
commit b17c070fb624cf10162cf92ea5e1ec25cd8ac176 upstream.

__list_lru_walk_one() acquires nlru spin lock (nlru->lock) for longer
duration if there are more number of items in the lru list.  As per the
current code, it can hold the spin lock for upto maximum UINT_MAX
entries at a time.  So if there are more number of items in the lru
list, then "BUG: spinlock lockup suspected" is observed in the below
path:

  spin_bug+0x90
  do_raw_spin_lock+0xfc
  _raw_spin_lock+0x28
  list_lru_add+0x28
  dput+0x1c8
  path_put+0x20
  terminate_walk+0x3c
  path_lookupat+0x100
  filename_lookup+0x6c
  user_path_at_empty+0x54
  SyS_faccessat+0xd0
  el0_svc_naked+0x24

This nlru->lock is acquired by another CPU in this path -

  d_lru_shrink_move+0x34
  dentry_lru_isolate_shrink+0x48
  __list_lru_walk_one.isra.10+0x94
  list_lru_walk_node+0x40
  shrink_dcache_sb+0x60
  do_remount_sb+0xbc
  do_emergency_remount+0xb0
  process_one_work+0x228
  worker_thread+0x2e0
  kthread+0xf4
  ret_from_fork+0x10

Fix this lockup by reducing the number of entries to be shrinked from
the lru list to 1024 at once.  Also, add cond_resched() before
processing the lru list again.

Link: http://marc.info/?t=149722864900001&r=1&w=2
Link: http://lkml.kernel.org/r/1498707575-2472-1-git-send-email-stummala@codeaurora.org
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Suggested-by: Jan Kara <jack@suse.cz>
Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Alexander Polakov <apolyakov@beget.ru>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-21 07:44:56 +02:00
Dmitry Shmidt
e9a82a4cbe This is the 4.4.45 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAliJpBoACgkQONu9yGCS
 aT54KRAAm2BjHOgU3FlM/mTal6ZVNIPKS/Xy9W0YXdQ+9URDKWNb0fwuqWAsf7LP
 n6ozLIB2n8FNlMWro7VHVNXKiUtw3BSRcjNamMm61XQcR1g0xY4iW6uhtpoTblAG
 PdeK3WAUfROxJEAxciFSTqfPKgSDQeaQRDSG10KTP5qIAPQM0T0/VU+20K0w7Cbf
 UZEJaGDOZS0XIRvNOak2DvQQxeXzwfvY5JTdx/MBOHw6e1MPfndeuhRFDJrIeOZC
 hKaG1ipkMQANcftHWTmJQ0gZEZMgVokqDtyQO3hqyrqLgVChM24j6mD7KvguCfPQ
 +ixC5oDQzBMQnp2uienP6FbDg1BZjHxO2R8z0vscXk++QtB3Mjxk8LBKZqeA636k
 E1fuGCrRf6Ec/0d7loMqOOO4KCUxOu+0JuhmlvmQDtrtGvQa5Qqd5WEF8ecOm6Y+
 5yKI11P5yiFANEkz4ysfTlyEltvIxp4Psu0YBrnVM6x5vNYEnr9wuGdikL21FI6F
 kS2FRB9+u2H4n2qNz7PGMt0tPub/F34W7RvD/zII4wqRrFz3wtw3UufAGgiT6X2n
 EIye5DErGfDcpHJ13kKYd7kCXl1u1y8tsBISRqYxl1sqshIZis0ktsb3ZtE5NMXF
 Qbh72lvpUU78E452ER1XDmk6keb98zUWbOtlBfbqJZ4iVpQ4GGY=
 =lShl
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.45' into android-4.4.y

This is the 4.4.45 stable release
2017-01-26 13:42:20 -08:00
Eric W. Biederman
4a6716f165 mnt: Protect the mountpoint hashtable with mount_lock
commit 3895dbf8985f656675b5bde610723a29cbce3fa7 upstream.

Protecting the mountpoint hashtable with namespace_sem was sufficient
until a call to umount_mnt was added to mntput_no_expire.  At which
point it became possible for multiple calls of put_mountpoint on
the same hash chain to happen on the same time.

Kristen Johansen <kjlx@templeofstupid.com> reported:
> This can cause a panic when simultaneous callers of put_mountpoint
> attempt to free the same mountpoint.  This occurs because some callers
> hold the mount_hash_lock, while others hold the namespace lock.  Some
> even hold both.
>
> In this submitter's case, the panic manifested itself as a GP fault in
> put_mountpoint() when it called hlist_del() and attempted to dereference
> a m_hash.pprev that had been poisioned by another thread.

Al Viro observed that the simple fix is to switch from using the namespace_sem
to the mount_lock to protect the mountpoint hash table.

I have taken Al's suggested patch moved put_mountpoint in pivot_root
(instead of taking mount_lock an additional time), and have replaced
new_mountpoint with get_mountpoint a function that does the hash table
lookup and addition under the mount_lock.   The introduction of get_mounptoint
ensures that only the mount_lock is needed to manipulate the mountpoint
hashtable.

d_set_mounted is modified to only set DCACHE_MOUNTED if it is not
already set.  This allows get_mountpoint to use the setting of
DCACHE_MOUNTED to ensure adding a struct mountpoint for a dentry
happens exactly once.

Fixes: ce07d891a0 ("mnt: Honor MNT_LOCKED when detaching mounts")
Reported-by: Krister Johansen <kjlx@templeofstupid.com>
Suggested-by: Al Viro <viro@ZenIV.linux.org.uk>
Acked-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-19 20:17:21 +01:00
Dmitry Shmidt
aa349c0a96 This is the 4.4.19 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXuIDJAAoJEDjbvchgkmk+i10QALySg/PFXDJ6AwUskGbetHBz
 RnsJ8WzjtzBR5vAyaru2vkD/GhFmM3ziG8guQK3uWGhhfpB+CPJjDmYIY1O5Djma
 CviyB6UsEIuf2zN7U70WSmjJ/FyD7XRqjGnEX9u5YGS4WQTFPnPttE4HE82ErEEW
 IocnBGFZriGye9D/2O6OjTDgIusLsZ6WKawK0OyeKiUrTUsmhLBtW0nfMHd/snNw
 4Aas0j6g5tjYrNBUyKqmkYhi7S2kFyZ7QH1vqrXxUHu4CNslTa6i1VTkQ+uVxbuF
 Vw9DLP6KEmB/Q5KyIVFMmEv6E5vvgymv7rrQ4c7pu6vqmHzbdtaWxZFM18EnIXOk
 qe8/9wzF4ahw+h/0ddmjpjmWi/SRYG8PmobgTWmIqJl+SNq4VK2G/GRkWce45EDi
 lMO6UI4qUd8vMw1OJOdKwp8C/D+l5V1qrVlQTVba8IJsH2fKFw9aSKAGwpppawfl
 CiESwHhSINGfhGzDyYS/keo1JM0KDyGc3EYQG5DaSzNZu4jqkhNPjBlQEOJug3/I
 6LDrWQo4+qC6vJJ836NyRvakv1WDL8AsHmTOuiW8h8LzcGsaxac9L7HMRgwItXAs
 aWTXg2eBoJXkBQalglvhSzGqBJl2ytlu0Efxg97zEL1huZuYDdzf9tO7hqMujZhc
 k+SnQTS6JXVuDe46uDyb
 =JLSE
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.19' into android-4.4.y

This is the 4.4.19 stable release
2016-08-22 14:09:08 -07:00
Wei Fang
92f71339bc fs/dcache.c: avoid soft-lockup in dput()
commit 47be61845c775643f1aa4d2a54343549f943c94c upstream.

We triggered soft-lockup under stress test which
open/access/write/close one file concurrently on more than
five different CPUs:

WARN: soft lockup - CPU#0 stuck for 11s! [who:30631]
...
[<ffffffc0003986f8>] dput+0x100/0x298
[<ffffffc00038c2dc>] terminate_walk+0x4c/0x60
[<ffffffc00038f56c>] path_lookupat+0x5cc/0x7a8
[<ffffffc00038f780>] filename_lookup+0x38/0xf0
[<ffffffc000391180>] user_path_at_empty+0x78/0xd0
[<ffffffc0003911f4>] user_path_at+0x1c/0x28
[<ffffffc00037d4fc>] SyS_faccessat+0xb4/0x230

->d_lock trylock may failed many times because of concurrently
operations, and dput() may execute a long time.

Fix this by replacing cpu_relax() with cond_resched().
dput() used to be sleepable, so make it sleepable again
should be safe.

Signed-off-by: Wei Fang <fangwei1@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16 09:30:50 +02:00
Dmitry Shmidt
b558f17a13 This is the 4.4.16 stable release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJXmOXmAAoJEDjbvchgkmk+QYIP/1S8oBZsvjfDzvH8t63HyLeH
 i43MFlYoFAqUIZc002XpluSvZ8uHoG+r7R8Hq3wmv48wxe3M6OBnMdBVTht6mPw+
 t5OLTZr40lWaJm2EIi4aekueMIrCgmL+Et+IFYv7ZVBuYLteVcfny+zdq4EqGmgj
 /a19+L/sTTr4SHtJIhHxWhiVJ9fVMgQk/N3VgQmIiNF2+lVbiFI7QQiDPLbFl0KK
 CM4ETO22HxHCYilGpzhpSMsHCxv12VqNaXNLAsPAepGGW7PqvUmrEWAqgwsbOfRc
 GxTLNk0dUgJqMrfEpQ8ZOMlgzvCAYG2jZuNSuT+nuzrWSUP+WOGRi9TTTxp1CYuZ
 PHlhNTH7ZnqosxJUUZS2d9N5ygpqD48Rhlfl824YzOWCy94VeUnedkVLb20uJwPF
 Y5aQ5WjktBC9why5e4OgGQERvx/U9KTk8E1zRfZZPc2oft9My0YxuemjjKAKZiYN
 ne4WhXbgOJTQkAoZwh2xqny3bWyEaoSrWpQ3R7bBJ9SIRLEOdCKzKpduDbAnbMP7
 QWgQOQC/6qA1mKqjrqF4KPA1Quo9PcUK2Ajh523ewMGCowgY90vyejAgh4Q8g0GC
 fKlx+jJDoKVDbQ8v4hc9PPHMsNNIKT9a1ptwVS3lE+bq1D5Ffm57A4/uOTMYHVab
 gKqu8h1CA0MCVBsH3nNA
 =nY8S
 -----END PGP SIGNATURE-----

Merge tag 'v4.4.16' into android-4.4.y

This is the 4.4.16 stable release

Change-Id: Ibaf7b7e03695e1acebc654a2ca1a4bfcc48fcea4
2016-08-01 15:57:55 -07:00
Al Viro
2b11d80e1a fix d_walk()/non-delayed __d_free() race
commit 3d56c25e3bb0726a5c5e16fc2d9e38f8ed763085 upstream.

Ascend-to-parent logics in d_walk() depends on all encountered child
dentries not getting freed without an RCU delay.  Unfortunately, in
quite a few cases it is not true, with hard-to-hit oopsable race as
the result.

Fortunately, the fix is simiple; right now the rule is "if it ever
been hashed, freeing must be delayed" and changing it to "if it
ever had a parent, freeing must be delayed" closes that hole and
covers all cases the old rule used to cover.  Moreover, pipes and
sockets remain _not_ covered, so we do not introduce RCU delay in
the cases which are the reason for having that delay conditional
in the first place.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-06-24 10:18:21 -07:00
Miklos Szeredi
c452dfc332 fs: add file_dentry()
commit d101a125954eae1d397adda94ca6319485a50493 upstream.

This series fixes bugs in nfs and ext4 due to 4bacc9c923 ("overlayfs:
Make f_path always point to the overlay and f_inode to the underlay").

Regular files opened on overlayfs will result in the file being opened on
the underlying filesystem, while f_path points to the overlayfs
mount/dentry.

This confuses filesystems which get the dentry from struct file and assume
it's theirs.

Add a new helper, file_dentry() [*], to get the filesystem's own dentry
from the file.  This checks file->f_path.dentry->d_flags against
DCACHE_OP_REAL, and returns file->f_path.dentry if DCACHE_OP_REAL is not
set (this is the common, non-overlayfs case).

In the uncommon case it will call into overlayfs's ->d_real() to get the
underlying dentry, matching file_inode(file).

The reason we need to check against the inode is that if the file is copied
up while being open, d_real() would return the upper dentry, while the open
file comes from the lower dentry.

[*] If possible, it's better simply to use file_inode() instead.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:42:12 +09:00
Guenter Roeck
cdaa7dcb05 fs: Export d_absolute_path
The 0-day build bot reports the following build error, seen if SDCARD_FS
is built as module.

ERROR: "d_absolute_path" undefined!

Fixes: 84a1b7d3d3 ("Included sdcardfs source code for kernel 3.0")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
2016-03-24 10:32:35 -07:00
Al Viro
53729dbbd2 use ->d_seq to get coherency between ->d_inode and ->d_flags
commit a528aca7f359f4b0b1d72ae406097e491a5ba9ea upstream.

Games with ordering and barriers are way too brittle.  Just
bump ->d_seq before and after updating ->d_inode and ->d_flags
type bits, so that verifying ->d_seq would guarantee they are
coherent.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-09 15:34:49 -08:00
Eric W. Biederman
a03e283bf5 dcache: Reduce the scope of i_lock in d_splice_alias
i_lock is only needed until __d_find_any_alias calls dget on the alias
dentry.  After that the reference to new ensures that dentry_kill and
d_delete will not remove the inode from the dentry, and remove the
dentry from the inode->d_entry list.

The inode i_lock came to be held over the the __d_move calls in
d_splice_alias through a series of introduction of locks with
increasing smaller scope.  First it was the dcache_lock, then
it was the dcache_inode_lock, and finally inode->i_lock.

Furthermore inode->i_lock is not held over any other calls
to d_move or __d_move so it can not provide any meaningful
rename protection.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-21 02:34:37 -04:00
Eric W. Biederman
cde93be45a dcache: Handle escaped paths in prepend_path
A rename can result in a dentry that by walking up d_parent
will never reach it's mnt_root.  For lack of a better term
I call this an escaped path.

prepend_path is called by four different functions __d_path,
d_absolute_path, d_path, and getcwd.

__d_path only wants to see paths are connected to the root it passes
in.  So __d_path needs prepend_path to return an error.

d_absolute_path similarly wants to see paths that are connected to
some root.  Escaped paths are not connected to any mnt_root so
d_absolute_path needs prepend_path to return an error greater
than 1.  So escaped paths will be treated like paths on lazily
unmounted mounts.

getcwd needs to prepend "(unreachable)" so getcwd also needs
prepend_path to return an error.

d_path is the interesting hold out.  d_path just wants to print
something, and does not care about the weird cases.  Which raises
the question what should be printed?

Given that <escaped_path>/<anything> should result in -ENOENT I
believe it is desirable for escaped paths to be printed as empty
paths.  As there are not really any meaninful path components when
considered from the perspective of a mount tree.

So tweak prepend_path to return an empty path with an new error
code of 3 when it encounters an escaped path.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-08-21 02:34:36 -04:00
Mel Gorman
4248b0da46 fs, file table: reinit files_stat.max_files after deferred memory initialisation
Dave Hansen reported the following;

	My laptop has been behaving strangely with 4.2-rc2.  Once I log
	in to my X session, I start getting all kinds of strange errors
	from applications and see this in my dmesg:

        	VFS: file-max limit 8192 reached

The problem is that the file-max is calculated before memory is fully
initialised and miscalculates how much memory the kernel is using.  This
patch recalculates file-max after deferred memory initialisation.  Note
that using memory hotplug infrastructure would not have avoided this
problem as the value is not recalculated after memory hot-add.

4.1:             files_stat.max_files = 6582781
4.2-rc2:         files_stat.max_files = 8192
4.2-rc2 patched: files_stat.max_files = 6562467

Small differences with the patch applied and 4.1 but not enough to matter.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Cc: Nicolai Stange <nicstange@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Alex Ng <alexng@microsoft.com>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-08-07 04:39:40 +03:00
Al Viro
75a6f82a0d freeing unlinked file indefinitely delayed
Normally opening a file, unlinking it and then closing will have
the inode freed upon close() (provided that it's not otherwise busy and
has no remaining links, of course).  However, there's one case where that
does *not* happen.  Namely, if you open it by fhandle with cold dcache,
then unlink() and close().

	In normal case you get d_delete() in unlink(2) notice that dentry
is busy and unhash it; on the final dput() it will be forcibly evicted from
dcache, triggering iput() and inode removal.  In this case, though, we end
up with *two* dentries - disconnected (created by open-by-fhandle) and
regular one (used by unlink()).  The latter will have its reference to inode
dropped just fine, but the former will not - it's considered hashed (it
is on the ->s_anon list), so it will stay around until the memory pressure
will finally do it in.  As the result, we have the final iput() delayed
indefinitely.  It's trivial to reproduce -

void flush_dcache(void)
{
        system("mount -o remount,rw /");
}

static char buf[20 * 1024 * 1024];

main()
{
        int fd;
        union {
                struct file_handle f;
                char buf[MAX_HANDLE_SZ];
        } x;
        int m;

        x.f.handle_bytes = sizeof(x);
        chdir("/root");
        mkdir("foo", 0700);
        fd = open("foo/bar", O_CREAT | O_RDWR, 0600);
        close(fd);
        name_to_handle_at(AT_FDCWD, "foo/bar", &x.f, &m, 0);
        flush_dcache();
        fd = open_by_handle_at(AT_FDCWD, &x.f, O_RDWR);
        unlink("foo/bar");
        write(fd, buf, sizeof(buf));
        system("df .");			/* 20Mb eaten */
        close(fd);
        system("df .");			/* should've freed those 20Mb */
        flush_dcache();
        system("df .");			/* should be the same as #2 */
}

will spit out something like
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 303843      1131 100% /
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 303843      1131 100% /
Filesystem     1K-blocks   Used Available Use% Mounted on
/dev/root         322023 283282     21692  93% /
- inode gets freed only when dentry is finally evicted (here we trigger
than by remount; normally it would've happened in response to memory
pressure hell knows when).

Cc: stable@vger.kernel.org # v2.6.38+; earlier ones need s/kill_it/unhash_it/
Acked-by: J. Bruce Fields <bfields@fieldses.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-07-12 11:27:04 -04:00
Linus Torvalds
1dc51b8288 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "Assorted VFS fixes and related cleanups (IMO the most interesting in
  that part are f_path-related things and Eric's descriptor-related
  stuff).  UFS regression fixes (it got broken last cycle).  9P fixes.
  fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups".  The
  file_table locking rule change by Eric Dumazet is a rather big and
  fundamental update even if the patch isn't huge.   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
  9p: cope with bogus responses from server in p9_client_{read,write}
  p9_client_write(): avoid double p9_free_req()
  9p: forgetting to cancel request on interrupted zero-copy RPC
  dax: bdev_direct_access() may sleep
  block: Add support for DAX reads/writes to block devices
  dax: Use copy_from_iter_nocache
  dax: Add block size note to documentation
  fs/file.c: __fget() and dup2() atomicity rules
  fs/file.c: don't acquire files->file_lock in fd_install()
  fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
  vfs: avoid creation of inode number 0 in get_next_ino
  namei: make set_root_rcu() return void
  make simple_positive() public
  ufs: use dir_pages instead of ufs_dir_pages()
  pagemap.h: move dir_pages() over there
  remove the pointless include of lglock.h
  fs: cleanup slight list_entry abuse
  xfs: Correctly lock inode when removing suid and file capabilities
  fs: Call security_ops->inode_killpriv on truncate
  fs: Provide function telling whether file_remove_privs() will do anything
  ...
2015-07-04 19:36:06 -07:00
Linus Torvalds
0cbee99269 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace updates from Eric Biederman:
 "Long ago and far away when user namespaces where young it was realized
  that allowing fresh mounts of proc and sysfs with only user namespace
  permissions could violate the basic rule that only root gets to decide
  if proc or sysfs should be mounted at all.

  Some hacks were put in place to reduce the worst of the damage could
  be done, and the common sense rule was adopted that fresh mounts of
  proc and sysfs should allow no more than bind mounts of proc and
  sysfs.  Unfortunately that rule has not been fully enforced.

  There are two kinds of gaps in that enforcement.  Only filesystems
  mounted on empty directories of proc and sysfs should be ignored but
  the test for empty directories was insufficient.  So in my tree
  directories on proc, sysctl and sysfs that will always be empty are
  created specially.  Every other technique is imperfect as an ordinary
  directory can have entries added even after a readdir returns and
  shows that the directory is empty.  Special creation of directories
  for mount points makes the code in the kernel a smidge clearer about
  it's purpose.  I asked container developers from the various container
  projects to help test this and no holes were found in the set of mount
  points on proc and sysfs that are created specially.

  This set of changes also starts enforcing the mount flags of fresh
  mounts of proc and sysfs are consistent with the existing mount of
  proc and sysfs.  I expected this to be the boring part of the work but
  unfortunately unprivileged userspace winds up mounting fresh copies of
  proc and sysfs with noexec and nosuid clear when root set those flags
  on the previous mount of proc and sysfs.  So for now only the atime,
  read-only and nodev attributes which userspace happens to keep
  consistent are enforced.  Dealing with the noexec and nosuid
  attributes remains for another time.

  This set of changes also addresses an issue with how open file
  descriptors from /proc/<pid>/ns/* are displayed.  Recently readlink of
  /proc/<pid>/fd has been triggering a WARN_ON that has not been
  meaningful since it was added (as all of the code in the kernel was
  converted) and is not now actively wrong.

  There is also a short list of issues that have not been fixed yet that
  I will mention briefly.

  It is possible to rename a directory from below to above a bind mount.
  At which point any directory pointers below the renamed directory can
  be walked up to the root directory of the filesystem.  With user
  namespaces enabled a bind mount of the bind mount can be created
  allowing the user to pick a directory whose children they can rename
  to outside of the bind mount.  This is challenging to fix and doubly
  so because all obvious solutions must touch code that is in the
  performance part of pathname resolution.

  As mentioned above there is also a question of how to ensure that
  developers by accident or with purpose do not introduce exectuable
  files on sysfs and proc and in doing so introduce security regressions
  in the current userspace that will not be immediately obvious and as
  such are likely to require breaking userspace in painful ways once
  they are recognized"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  vfs: Remove incorrect debugging WARN in prepend_path
  mnt: Update fs_fully_visible to test for permanently empty directories
  sysfs: Create mountpoints with sysfs_create_mount_point
  sysfs: Add support for permanently empty directories to serve as mount points.
  kernfs: Add support for always empty directories.
  proc: Allow creating permanently empty directories that serve as mount points
  sysctl: Allow creating permanently empty directories that serve as mountpoints.
  fs: Add helper functions for permanently empty directories.
  vfs: Ignore unlocked mounts in fs_fully_visible
  mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
  mnt: Refactor the logic for mounting sysfs and proc in a user namespace
2015-07-03 15:20:57 -07:00
Eric W. Biederman
93e3bce628 vfs: Remove incorrect debugging WARN in prepend_path
The warning message in prepend_path is unclear and outdated.  It was
added as a warning that the mechanism for generating names of pseudo
files had been removed from prepend_path and d_dname should be used
instead.  Unfortunately the warning reads like a general warning,
making it unclear what to do with it.

Remove the warning.  The transition it was added to warn about is long
over, and I added code several years ago which in rare cases causes
the warning to fire on legitimate code, and the warning is now firing
and scaring people for no good reason.

Cc: stable@vger.kernel.org
Reported-by: Ivan Delalande <colona@arista.com>
Reported-by: Omar Sandoval <osandov@osandov.com>
Fixes: f48cfddc67 ("vfs: In d_path don't call d_dname on a mount point")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2015-07-01 10:36:51 -05:00
Linus Torvalds
43224b96af Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner:
 "A rather largish update for everything time and timer related:

   - Cache footprint optimizations for both hrtimers and timer wheel

   - Lower the NOHZ impact on systems which have NOHZ or timer migration
     disabled at runtime.

   - Optimize run time overhead of hrtimer interrupt by making the clock
     offset updates smarter

   - hrtimer cleanups and removal of restrictions to tackle some
     problems in sched/perf

   - Some more leap second tweaks

   - Another round of changes addressing the 2038 problem

   - First step to change the internals of clock event devices by
     introducing the necessary infrastructure

   - Allow constant folding for usecs/msecs_to_jiffies()

   - The usual pile of clockevent/clocksource driver updates

  The hrtimer changes contain updates to sched, perf and x86 as they
  depend on them plus changes all over the tree to cleanup API changes
  and redundant code, which got copied all over the place.  The y2038
  changes touch s390 to remove the last non 2038 safe code related to
  boot/persistant clock"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (114 commits)
  clocksource: Increase dependencies of timer-stm32 to limit build wreckage
  timer: Minimize nohz off overhead
  timer: Reduce timer migration overhead if disabled
  timer: Stats: Simplify the flags handling
  timer: Replace timer base by a cpu index
  timer: Use hlist for the timer wheel hash buckets
  timer: Remove FIFO "guarantee"
  timers: Sanitize catchup_timer_jiffies() usage
  hrtimer: Allow hrtimer::function() to free the timer
  seqcount: Introduce raw_write_seqcount_barrier()
  seqcount: Rename write_seqcount_barrier()
  hrtimer: Fix hrtimer_is_queued() hole
  hrtimer: Remove HRTIMER_STATE_MIGRATE
  selftest: Timers: Avoid signal deadlock in leap-a-day
  timekeeping: Copy the shadow-timekeeper over the real timekeeper last
  clockevents: Check state instead of mode in suspend/resume path
  selftests: timers: Add leap-second timer edge testing to leap-a-day.c
  ntp: Do leapsecond adjustment in adjtimex read path
  time: Prevent early expiry of hrtimers[CLOCK_REALTIME] at the leap second edge
  ntp: Introduce and use SECS_PER_DAY macro instead of 86400
  ...
2015-06-22 18:57:44 -07:00
David Howells
4bacc9c923 overlayfs: Make f_path always point to the overlay and f_inode to the underlay
Make file->f_path always point to the overlay dentry so that the path in
/proc/pid/fd is correct and to ensure that label-based LSMs have access to the
overlay as well as the underlay (path-based LSMs probably don't need it).

Using my union testsuite to set things up, before the patch I see:

	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
	...
	lr-x------. 1 root root 64 Jun  5 14:38 5 -> /a/foo107
	[root@andromeda union-testsuite]# stat /mnt/a/foo107
	...
	Device: 23h/35d Inode: 13381       Links: 1
	...
	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
	...
	Device: 23h/35d Inode: 13381       Links: 1
	...

After the patch:

	[root@andromeda union-testsuite]# bash 5</mnt/a/foo107
	[root@andromeda union-testsuite]# ls -l /proc/$$/fd/
	...
	lr-x------. 1 root root 64 Jun  5 14:22 5 -> /mnt/a/foo107
	[root@andromeda union-testsuite]# stat /mnt/a/foo107
	...
	Device: 23h/35d Inode: 40346       Links: 1
	...
	[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
	...
	Device: 23h/35d Inode: 40346       Links: 1
	...

Note the change in where /proc/$$/fd/5 points to in the ls command.  It was
pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
(which is correct).

The inode accessed, however, is the lower layer.  The union layer is on device
25h/37d and the upper layer on 24h/36d.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-06-19 03:19:32 -04:00
Peter Zijlstra
a7c6f571ff seqcount: Rename write_seqcount_barrier()
I'll shortly be introducing another seqcount primitive that's useful
to provide ordering semantics and would like to use the
write_seqcount_barrier() name for that.

Seeing how there's only one user of the current primitive, lets rename
it to invalidate, as that appears what its doing.

While there, employ lockdep_assert_held() instead of
assert_spin_locked() to not generate debug code for regular kernels.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: ktkhai@parallels.com
Cc: rostedt@goodmis.org
Cc: juri.lelli@gmail.com
Cc: pang.xunlei@linaro.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: wanpeng.li@linux.intel.com
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: umgwanakikbuti@gmail.com
Link: http://lkml.kernel.org/r/20150611124743.279926217@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-06-19 00:09:56 +02:00
Al Viro
2159184ea0 d_walk() might skip too much
when we find that a child has died while we'd been trying to ascend,
we should go into the first live sibling itself, rather than its sibling.

Off-by-one in question had been introduced in "deal with deadlock in
d_walk()" and the fix needs to be backported to all branches this one
has been backported to.

Cc: stable@vger.kernel.org # 3.2 and later
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-05-28 23:45:30 -04:00
David Howells
4bf46a2726 VFS: Impose ordering on accesses of d_inode and d_flags
Impose ordering on accesses of d_inode and d_flags to avoid the need to do
this:

	if (!dentry->d_inode || d_is_negative(dentry)) {

when this:

	if (d_is_negative(dentry)) {

should suffice.

This check is especially problematic if a dentry can have its type field set
to something other than DENTRY_MISS_TYPE when d_inode is NULL (as in
unionmount).

What we really need to do is stick a write barrier between setting d_inode and
setting d_flags and a read barrier between reading d_flags and reading
d_inode.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-15 15:05:28 -04:00
J. Bruce Fields
3d330dc175 dcache: return -ESTALE not -EBUSY on distributed fs race
On a distributed filesystem it's possible for lookup to discover that a
directory it just found is already cached elsewhere in the directory
heirarchy.  The dcache won't let us keep the directory in both places,
so we have to move the dentry to the new location from the place we
previously had it cached.

If the parent has changed, then this requires all the same locks as we'd
need to do a cross-directory rename.  But we're already in lookup
holding one parent's i_mutex, so it's too late to acquire those locks in
the right order.

The (unreliable) solution in __d_unalias is to trylock() the required
locks and return -EBUSY if it fails.

I see no particular reason for returning -EBUSY, and -ESTALE is already
the result of some other lookup races on NFS.  I think -ESTALE is the
more helpful error return.  It also allows us to take advantage of the
logic Jeff Layton added in c6a9428401 "vfs: fix renameat to retry on
ESTALE errors" and ancestors, which hopefully resolves some of these
errors before they're returned to userspace.

I can reproduce these cases using NFS with:

	ssh root@$client '
		mount -olookupcache=pos '$server':'$export' /mnt/
		mkdir /mnt/TO
		mkdir /mnt/DIR
		touch /mnt/DIR/test.txt
		while true; do
			strace -e open cat /mnt/DIR/test.txt 2>&1 | grep EBUSY
		done
	'
	ssh root@$server '
		while true; do
			mv $export/DIR $export/TO/DIR
			mv $export/TO/DIR $export/DIR
		done
	'

It also helps to add some other concurrent use of the directory on the
client (e.g., "ls /mnt/TO").  And you can replace the server-side mv's
by client-side mv's that are repeatedly killed.  (If the client is
interrupted while waiting for the RENAME response then it's left with a
dentry that has to go under one parent or the other, but it doesn't yet
know which.)

Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-04-11 22:24:33 -04:00
David Howells
44bdb5e5f6 VFS: Split DCACHE_FILE_TYPE into regular and special types
Split DCACHE_FILE_TYPE into DCACHE_REGULAR_TYPE (dentries representing regular
files) and DCACHE_SPECIAL_TYPE (representing blockdev, chardev, FIFO and
socket files).

d_is_reg() and d_is_special() are added to detect these subtypes and
d_is_file() is left as the union of the two.

This allows a number of places that use S_ISREG(dentry->d_inode->i_mode) to
use d_is_reg(dentry) instead.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:38 -05:00
David Howells
df1a085af1 VFS: Add a fallthrough flag for marking virtual dentries
Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is
a virtual dentry that covers another one in a lower layer that should be used
instead.  This may be recorded on medium if directory integration is stored
there.

The flag can be set with d_set_fallthru() and tested with d_is_fallthru().

Original-author: Valerie Aurora <vaurora@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-02-22 11:38:38 -05:00
Linus Torvalds
50652963ea Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc VFS updates from Al Viro:
 "This cycle a lot of stuff sits on topical branches, so I'll be sending
  more or less one pull request per branch.

  This is the first pile; more to follow in a few.  In this one are
  several misc commits from early in the cycle (before I went for
  separate branches), plus the rework of mntput/dput ordering on umount,
  switching to use of fs_pin instead of convoluted games in
  namespace_unlock()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  switch the IO-triggering parts of umount to fs_pin
  new fs_pin killing logics
  allow attaching fs_pin to a group not associated with some superblock
  get rid of the second argument of acct_kill()
  take count and rcu_head out of fs_pin
  dcache: let the dentry count go down to zero without taking d_lock
  pull bumping refcount into ->kill()
  kill pin_put()
  mode_t whack-a-mole: chelsio
  file->f_path.dentry is pinned down for as long as the file is open...
  get rid of lustre_dump_dentry()
  gut proc_register() a bit
  kill d_validate()
  ncpfs: get rid of d_validate() nonsense
  selinuxfs: don't open-code d_genocide()
2015-02-17 14:56:45 -08:00
Andrey Ryabinin
df4c0e36f1 fs: dcache: manually unpoison dname after allocation to shut up kasan's reports
We need to manually unpoison rounded up allocation size for dname to avoid
kasan's reports in dentry_string_cmp().  When CONFIG_DCACHE_WORD_ACCESS=y
dentry_string_cmp may access few bytes beyound requested in kmalloc()
size.

dentry_string_cmp() relates on that fact that dentry allocated using
kmalloc and kmalloc internally round up allocation size.  So this is not a
bug, but this makes kasan to complain about such accesses.  To avoid such
reports we mark rounded up allocation size in shadow as accessible.

Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Konstantin Serebryany <kcc@google.com>
Cc: Dmitry Chernenkov <dmitryc@google.com>
Signed-off-by: Andrey Konovalov <adech.fo@gmail.com>
Cc: Yuri Gribov <tetra2005@gmail.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-13 21:21:41 -08:00
Vladimir Davydov
3f97b16320 list_lru: add helpers to isolate items
Currently, the isolate callback passed to the list_lru_walk family of
functions is supposed to just delete an item from the list upon returning
LRU_REMOVED or LRU_REMOVED_RETRY, while nr_items counter is fixed by
__list_lru_walk_one after the callback returns.  Since the callback is
allowed to drop the lock after removing an item (it has to return
LRU_REMOVED_RETRY then), the nr_items can be less than the actual number
of elements on the list even if we check them under the lock.  This makes
it difficult to move items from one list_lru_one to another, which is
required for per-memcg list_lru reparenting - we can't just splice the
lists, we have to move entries one by one.

This patch therefore introduces helpers that must be used by callback
functions to isolate items instead of raw list_del/list_move.  These are
list_lru_isolate and list_lru_isolate_move.  They not only remove the
entry from the list, but also fix the nr_items counter, making sure
nr_items always reflects the actual number of elements on the list if
checked under the appropriate lock.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:10 -08:00
Vladimir Davydov
503c358cf1 list_lru: introduce list_lru_shrink_{count,walk}
Kmem accounting of memcg is unusable now, because it lacks slab shrinker
support.  That means when we hit the limit we will get ENOMEM w/o any
chance to recover.  What we should do then is to call shrink_slab, which
would reclaim old inode/dentry caches from this cgroup.  This is what
this patch set is intended to do.

Basically, it does two things.  First, it introduces the notion of
per-memcg slab shrinker.  A shrinker that wants to reclaim objects per
cgroup should mark itself as SHRINKER_MEMCG_AWARE.  Then it will be
passed the memory cgroup to scan from in shrink_control->memcg.  For
such shrinkers shrink_slab iterates over the whole cgroup subtree under
the target cgroup and calls the shrinker for each kmem-active memory
cgroup.

Secondly, this patch set makes the list_lru structure per-memcg.  It's
done transparently to list_lru users - everything they have to do is to
tell list_lru_init that they want memcg-aware list_lru.  Then the
list_lru will automatically distribute objects among per-memcg lists
basing on which cgroup the object is accounted to.  This way to make FS
shrinkers (icache, dcache) memcg-aware we only need to make them use
memcg-aware list_lru, and this is what this patch set does.

As before, this patch set only enables per-memcg kmem reclaim when the
pressure goes from memory.limit, not from memory.kmem.limit.  Handling
memory.kmem.limit is going to be tricky due to GFP_NOFS allocations, and
it is still unclear whether we will have this knob in the unified
hierarchy.

This patch (of 9):

NUMA aware slab shrinkers use the list_lru structure to distribute
objects coming from different NUMA nodes to different lists.  Whenever
such a shrinker needs to count or scan objects from a particular node,
it issues commands like this:

        count = list_lru_count_node(lru, sc->nid);
        freed = list_lru_walk_node(lru, sc->nid, isolate_func,
                                   isolate_arg, &sc->nr_to_scan);

where sc is an instance of the shrink_control structure passed to it
from vmscan.

To simplify this, let's add special list_lru functions to be used by
shrinkers, list_lru_shrink_count() and list_lru_shrink_walk(), which
consolidate the nid and nr_to_scan arguments in the shrink_control
structure.

This will also allow us to avoid patching shrinkers that use list_lru
when we make shrink_slab() per-memcg - all we will have to do is extend
the shrink_control structure to include the target memcg and make
list_lru_shrink_{count,walk} handle this appropriately.

Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Suggested-by: Dave Chinner <david@fromorbit.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Greg Thelen <gthelen@google.com>
Cc: Glauber Costa <glommer@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-02-12 18:54:08 -08:00
Linus Torvalds
360f54796e dcache: let the dentry count go down to zero without taking d_lock
We can be more aggressive about this, if we are clever and careful. This is subtle.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:29 -05:00
Al Viro
d6cb125b99 kill d_validate()
no users left

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-01-25 23:16:26 -05:00
Al Viro
ba00410b81 Merge branch 'iov_iter' into for-next 2014-12-08 20:39:29 -05:00
Yan, Zheng
4a7795d35e vfs: fix reference leak in d_prune_aliases()
In "d_prune_alias(): just lock the parent and call __dentry_kill()" the old
dget + d_drop + dput has been replaced with lock_parent + __dentry_kill;
unfortunately, dput() does more than just killing dentry - it also drops the
reference to parent.  New variant leaks that reference and needs dput(parent)
after killing the child off.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:07:20 -05:00
Mikulas Patocka
08d4f77222 dcache: fix kmemcheck warning in switch_names
This patch fixes kmemcheck warning in switch_names. The function
switch_names swaps inline names of two dentries. It swaps full arrays
d_iname, no matter how many bytes are really used by the strings. Reading
data beyond string ends results in kmemcheck warning.

We fix the bug by marking both arrays as fully initialized.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v3.15
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:01:26 -05:00
Al Viro
b5ae6b15bd merge d_materialise_unique() into d_splice_alias()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:01:19 -05:00
Al Viro
427c77d461 d_add_ci() should just accept a hashed exact match if it finds one
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-19 13:00:10 -05:00
Al Viro
ca5358ef75 deal with deadlock in d_walk()
... by not hitting rename_retry for reasons other than rename having
happened.  In other words, do _not_ restart when finding that
between unlocking the child and locking the parent the former got
into __dentry_kill().  Skip the killed siblings instead...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-03 15:22:16 -05:00
Al Viro
946e51f2bf move d_rcu from overlapping d_child to overlapping d_alias
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-11-03 15:20:29 -05:00
Al Viro
51486b900e fix inode leaks on d_splice_alias() failure exits
d_splice_alias() callers expect it to either stash the inode reference
into a new alias, or drop the inode reference.  That makes it possible
to just return d_splice_alias() result from ->lookup() instance, without
any extra housekeeping required.

Unfortunately, that should include the failure exits.  If d_splice_alias()
returns an error, it leaves the dentry it has been given negative and
thus it *must* drop the inode reference.  Easily fixed, but it goes way
back and will need backporting.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-23 22:30:18 -04:00
Al Viro
810bb17267 take dname_external() into fs/dcache.c
never used outside and it's too low-level for legitimate uses outside
of fs/dcache.c anyway

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-12 17:09:05 -04:00
Daeseok Youn
b8314f9303 dcache: Fix no spaces at the start of a line in dcache.c
Fixed coding style in dcache.c

Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:39:02 -04:00
Al Viro
2926620145 dcache.c: call ->d_prune() regardless of d_unhashed()
the only in-tree instance checks d_unhashed() anyway,
out-of-tree code can preserve the current behaviour by
adding such check if they want it and we get an ability
to use it in cases where we *want* to be notified of
killing being inevitable before ->d_lock is dropped,
whether it's unhashed or not.  In particular, autofs
would benefit from that.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:59 -04:00
Al Viro
29355c3904 d_prune_alias(): just lock the parent and call __dentry_kill()
The only reason for games with ->d_prune() was __d_drop(), which
was needed only to force dput() into killing the sucker off.

Note that lock_parent() can be called under ->i_lock and won't
drop it, so dentry is safe from somebody managing to kill it
under us - it won't happen while we are holding ->i_lock.

__dentry_kill() is called only with ->d_lockref.count being 0
(here and when picked from shrink list) or 1 (dput() and dropping
the ancestors in shrink_dentry_list()), so it will never be called
twice - the first thing it's doing is making ->d_lockref.count
negative and once that happens, nothing will increment it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2014-10-09 02:38:59 -04:00