Commit graph

44220 commits

Author SHA1 Message Date
Greg Kroah-Hartman
cf682cc8e3 This is the 4.4.186 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl00DyQACgkQONu9yGCS
 aT7NRhAAr1yyk+Rs9H80NW2K733VFLGbT/nsmCEPwi+oS6/AiN+4U0pgi+4YCHSU
 waBXC7BBBNp/tm86zAH5fQmvdWNeCy3hTS0SXbP5BkpjNlpuTr0KM6hHT0ZiQtTE
 i8H9jqG2j0vvyylbFv0B0T4WX+7B8F4U3wH0888MrxgJGrbZYIw2G0C910zH66A5
 XSi9Lsp6xv52Q4zgea0oiKplqTBvTMDQZDxUzp4Dgd3byXu6UVqKUzI67OjkPvIO
 umQ04beAx3jkuQwcab0kqd8i+hj7/9skT9G0wwhDJLSQr7hgi06V+YWql9Y+L6GD
 H4BWHzbWssGbZIGYdPRMiqojSjjTzvLmBZyEHbsjHdNmYGyLqX/R0MPbuoeOFfBD
 eP7oQIoEwRiHH9Ys1RNQsikBqdkege1gG1kRvrAeK1YDCUpX7xWLkwDfvzWerAD5
 jjW9xZ3AYGiIRoZ2Uz8NqWash3KenHnYLulST6xlQ2yiLSadA9C869Asyl7WCtrR
 XFQd/ZJwKahQiiaItu6ZlStqfrJaJ6T0dWwTficQHdWozP8KD2m83xIXo+9OQEc/
 bcvLNpYe0dWy41ZJR2j6bqc+mpb8c+VoSmoyL2amsqIiGkBVoiQYmZ31qHNvEVeg
 QwF7949xYp7CfanJ8hgNAc31VgZSuC5nzMwuDwybCoATxDLHMRo=
 =cWZi
 -----END PGP SIGNATURE-----

Merge 4.4.186 into android-4.4-p

Changes in 4.4.186
	Input: elantech - enable middle button support on 2 ThinkPads
	samples, bpf: fix to change the buffer size for read()
	mac80211: mesh: fix RCU warning
	mwifiex: Fix possible buffer overflows at parsing bss descriptor
	dt-bindings: can: mcp251x: add mcp25625 support
	can: mcp251x: add support for mcp25625
	Input: imx_keypad - make sure keyboard can always wake up system
	ARM: davinci: da850-evm: call regulator_has_full_constraints()
	ARM: davinci: da8xx: specify dma_coherent_mask for lcdc
	md: fix for divide error in status_resync
	bnx2x: Check if transceiver implements DDM before access
	udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
	x86/ptrace: Fix possible spectre-v1 in ptrace_get_debugreg()
	x86/tls: Fix possible spectre-v1 in do_get_thread_area()
	mwifiex: Abort at too short BSS descriptor element
	mwifiex: Fix heap overflow in mwifiex_uap_parse_tail_ies()
	fscrypt: don't set policy for a dead directory
	mwifiex: Don't abort on small, spec-compliant vendor IEs
	USB: serial: ftdi_sio: add ID for isodebug v1
	USB: serial: option: add support for GosunCn ME3630 RNDIS mode
	usb: gadget: ether: Fix race between gether_disconnect and rx_submit
	usb: renesas_usbhs: add a workaround for a race condition of workqueue
	staging: comedi: dt282x: fix a null pointer deref on interrupt
	staging: comedi: amplc_pci230: fix null pointer deref on interrupt
	carl9170: fix misuse of device driver API
	VMCI: Fix integer overflow in VMCI handle arrays
	MIPS: Remove superfluous check for __linux__
	e1000e: start network tx queue only when link is up
	perf/core: Fix perf_sample_regs_user() mm check
	ARM: omap2: remove incorrect __init annotation
	be2net: fix link failure after ethtool offline test
	ppp: mppe: Add softdep to arc4
	sis900: fix TX completion
	dm verity: use message limit for data block corruption message
	kvm: x86: avoid warning on repeated KVM_SET_TSS_ADDR
	ARC: hide unused function unw_hdr_alloc
	s390: fix stfle zero padding
	s390/qdio: (re-)initialize tiqdio list entries
	s390/qdio: don't touch the dsci in tiqdio_add_input_queues()
	KVM: x86: protect KVM_CREATE_PIT/KVM_CREATE_PIT2 with kvm->lock
	Linux 4.4.186

Change-Id: Ib318f4f4b7c21ffdccc7807d8bb26c0b46557129
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-07-22 15:54:20 +02:00
Hongjie Fang
da612a5ae7 fscrypt: don't set policy for a dead directory
commit 5858bdad4d0d0fc18bf29f34c3ac836e0b59441f upstream.

The directory may have been removed when entering
fscrypt_ioctl_set_policy().  If so, the empty_dir() check will return
error for ext4 file system.

ext4_rmdir() sets i_size = 0, then ext4_empty_dir() reports an error
because 'inode->i_size < EXT4_DIR_REC_LEN(1) + EXT4_DIR_REC_LEN(2)'.  If
the fs is mounted with errors=panic, it will trigger a panic issue.

Add the check IS_DEADDIR() to fix this problem.

Fixes: 9bd8212f98 ("ext4 crypto: add encryption policy and password salt support")
Cc: <stable@vger.kernel.org> # v4.1+
Signed-off-by: Hongjie Fang <hongjiefang@asrmicro.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21 09:07:10 +02:00
Steven J. Magnani
63d4f19662 udf: Fix incorrect final NOT_ALLOCATED (hole) extent length
commit fa33cdbf3eceb0206a4f844fe91aeebcf6ff2b7a upstream.

In some cases, using the 'truncate' command to extend a UDF file results
in a mismatch between the length of the file's extents (specifically, due
to incorrect length of the final NOT_ALLOCATED extent) and the information
(file) length. The discrepancy can prevent other operating systems
(i.e., Windows 10) from opening the file.

Two particular errors have been observed when extending a file:

1. The final extent is larger than it should be, having been rounded up
   to a multiple of the block size.

B. The final extent is not shorter than it should be, due to not having
   been updated when the file's information length was increased.

[JK: simplified udf_do_extend_final_block(), fixed up some types]

Fixes: 2c948b3f86 ("udf: Avoid IO in udf_clear_inode")
CC: stable@vger.kernel.org
Signed-off-by: Steven J. Magnani <steve@digidescorp.com>
Link: https://lore.kernel.org/r/1561948775-5878-1-git-send-email-steve@digidescorp.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-21 09:07:08 +02:00
Greg Kroah-Hartman
60a02c0d81 This is the 4.4.185 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0lmj0ACgkQONu9yGCS
 aT67kRAAgoQ2/imz/GJ7bBKhtql6XneFD7fPDnggxHNRGA4VvkV+KcpHIinWZtXp
 QV5sq8u092SlvQHYolLcVyK3OsMzfzjsK0HMXJRxEa4UsI38hEDBqpgDaUIYiKrb
 l1NUsQLy8foFobbKsTkdiN3+RBdjkTFWZ/SxDXg1T6EicbntvpP/E0QXQ4J+H9TQ
 Wdvx0wMS1m8RkBQzoUxyqz10vCDP67e2NPIBotaKOQS3iYa1he6nU5Y7mLpJOw7s
 mEPkwnT9+DnDENhL7YHo37JPIgSKz+kLdnw6xLzKSIEOl7MUylu7ocbxcJlAHyxC
 LkLRA6E3vCBu540ajHhyjIfN4IPln78qnV1ciGmyTE+YNWtvyPqZVERDXqh9thM9
 4lPUm20HAyhopmxdYfoAq933Ki8IH/mTc3vXpcXbVnAOp2uZfJ0nhVOyhqov9B+t
 p6ct9t9/1ARPITmFTGNWFvTTRT+OoPLDi6ND1o+8ukXRmn9+sl0/VpWsJNpdIAiM
 ss93JfdVTFkR84Oc0zL4Cg55q01BvJYiGtj2oeU5cBECXvXAEvR3ro17b17fmCFh
 jk5xcckickpSK1g8iXNsC8EAAL0tIE+vQemxmJDPCcV3rM83addo9Lwqyme6Wa/3
 F702sfcOSvRvoEfd+9WZcKV01GfTEFu4jK7onrvL16Xn60Wq1iM=
 =Qr7D
 -----END PGP SIGNATURE-----

Merge 4.4.185 into android-4.4-p

Changes in 4.4.185
	fs/binfmt_flat.c: make load_flat_shared_library() work
	mm/page_idle.c: fix oops because end_pfn is larger than max_pfn
	scsi: vmw_pscsi: Fix use-after-free in pvscsi_queue_lck()
	tracing: Silence GCC 9 array bounds warning
	gcc-9: silence 'address-of-packed-member' warning
	usb: chipidea: udc: workaround for endpoint conflict issue
	Input: uinput - add compat ioctl number translation for UI_*_FF_UPLOAD
	apparmor: enforce nullbyte at end of tag string
	parport: Fix mem leak in parport_register_dev_model
	parisc: Fix compiler warnings in float emulation code
	IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
	MIPS: uprobes: remove set but not used variable 'epc'
	net: hns: Fix loopback test failed at copper ports
	sparc: perf: fix updated event period in response to PERF_EVENT_IOC_PERIOD
	scripts/checkstack.pl: Fix arm64 wrong or unknown architecture
	scsi: ufs: Check that space was properly alloced in copy_query_response
	s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
	hwmon: (pmbus/core) Treat parameters as paged if on multiple pages
	Btrfs: fix race between readahead and device replace/removal
	btrfs: start readahead also in seed devices
	can: flexcan: fix timeout when set small bitrate
	can: purge socket error queue on sock destruct
	ARM: imx: cpuidle-imx6sx: Restrict the SW2ISO increase to i.MX6SX
	Bluetooth: Align minimum encryption key size for LE and BR/EDR connections
	Bluetooth: Fix regression with minimum encryption key size alignment
	SMB3: retry on STATUS_INSUFFICIENT_RESOURCES instead of failing write
	cfg80211: fix memory leak of wiphy device name
	mac80211: drop robust management frames from unknown TA
	perf ui helpline: Use strlcpy() as a shorter form of strncpy() + explicit set nul
	perf help: Remove needless use of strncpy()
	9p/rdma: do not disconnect on down_interruptible EAGAIN
	9p: acl: fix uninitialized iattr access
	9p/rdma: remove useless check in cm_event_handler
	9p: p9dirent_read: check network-provided name length
	net/9p: include trans_common.h to fix missing prototype warning.
	KVM: X86: Fix scan ioapic use-before-initialization
	ovl: modify ovl_permission() to do checks on two inodes
	x86/speculation: Allow guests to use SSBD even if host does not
	cpu/speculation: Warn on unsupported mitigations= parameter
	sctp: change to hold sk after auth shkey is created successfully
	tipc: change to use register_pernet_device
	tipc: check msg->req data len in tipc_nl_compat_bearer_disable
	team: Always enable vlan tx offload
	ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
	bonding: Always enable vlan tx offload
	net: check before dereferencing netdev_ops during busy poll
	Bluetooth: Fix faulty expression for minimum encryption key size check
	um: Compile with modern headers
	ASoC : cs4265 : readable register too low
	spi: bitbang: Fix NULL pointer dereference in spi_unregister_master
	ASoC: max98090: remove 24-bit format support if RJ is 0
	usb: gadget: fusb300_udc: Fix memory leak of fusb300->ep[i]
	usb: gadget: udc: lpc32xx: allocate descriptor with GFP_ATOMIC
	scsi: hpsa: correct ioaccel2 chaining
	ARC: Assume multiplier is always present
	ARC: fix build warning in elf.h
	MIPS: math-emu: do not use bools for arithmetic
	mfd: omap-usb-tll: Fix register offsets
	swiotlb: Make linux/swiotlb.h standalone includible
	bug.h: work around GCC PR82365 in BUG()
	MIPS: Workaround GCC __builtin_unreachable reordering bug
	ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME
	crypto: user - prevent operating on larval algorithms
	ALSA: seq: fix incorrect order of dest_client/dest_ports arguments
	ALSA: firewire-lib/fireworks: fix miss detection of received MIDI messages
	ALSA: usb-audio: fix sign unintended sign extension on left shifts
	lib/mpi: Fix karactx leak in mpi_powm
	btrfs: Ensure replaced device doesn't have pending chunk allocation
	tty: rocket: fix incorrect forward declaration of 'rp_init()'
	ARC: handle gcc generated __builtin_trap for older compiler
	arm64, vdso: Define vdso_{start,end} as array
	KVM: x86: degrade WARN to pr_warn_ratelimited
	dmaengine: imx-sdma: remove BD_INTR for channel0
	Linux 4.4.185

Change-Id: If1b1ee0b61d5f6d6fb162dc446c621d6baebfab9
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-07-10 12:57:28 +02:00
Nikolay Borisov
986543fcf5 btrfs: Ensure replaced device doesn't have pending chunk allocation
commit debd1c065d2037919a7da67baf55cc683fee09f0 upstream.

Recent FITRIM work, namely bbbf7243d62d ("btrfs: combine device update
operations during transaction commit") combined the way certain
operations are recoded in a transaction. As a result an ASSERT was added
in dev_replace_finish to ensure the new code works correctly.
Unfortunately I got reports that it's possible to trigger the assert,
meaning that during a device replace it's possible to have an unfinished
chunk allocation on the source device.

This is supposed to be prevented by the fact that a transaction is
committed before finishing the replace oepration and alter acquiring the
chunk mutex. This is not sufficient since by the time the transaction is
committed and the chunk mutex acquired it's possible to allocate a chunk
depending on the workload being executed on the replaced device. This
bug has been present ever since device replace was introduced but there
was never code which checks for it.

The correct way to fix is to ensure that there is no pending device
modification operation when the chunk mutex is acquire and if there is
repeat transaction commit. Unfortunately it's not possible to just
exclude the source device from btrfs_fs_devices::dev_alloc_list since
this causes ENOSPC to be hit in transaction commit.

Fixing that in another way would need to add special cases to handle the
last writes and forbid new ones. The looped transaction fix is more
obvious, and can be easily backported. The runtime of dev-replace is
long so there's no noticeable delay caused by that.

Reported-by: David Sterba <dsterba@suse.com>
Fixes: 391cd9df81 ("Btrfs: fix unprotected alloc list insertion during the finishing procedure of replace")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:56:44 +02:00
Vivek Goyal
b24be4acd1 ovl: modify ovl_permission() to do checks on two inodes
commit c0ca3d70e8d3cf81e2255a217f7ca402f5ed0862 upstream.

Right now ovl_permission() calls __inode_permission(realinode), to do
permission checks on real inode and no checks are done on overlay inode.

Modify it to do checks both on overlay inode as well as underlying inode.
Checks on overlay inode will be done with the creds of calling task while
checks on underlying inode will be done with the creds of mounter.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
[ Srivatsa: 4.4.y backport:
  - Skipped the hunk modifying non-existent function ovl_get_acl()
  - Adjusted the error path
  - Included linux/cred.h to get prototype for revert_creds() ]
Signed-off-by: Srivatsa S. Bhat (VMware) <srivatsa@csail.mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:56:36 +02:00
Dominique Martinet
1275a5cf02 9p: acl: fix uninitialized iattr access
[ Upstream commit e02a53d92e197706cad1627bd84705d4aa20a145 ]

iattr is passed to v9fs_vfs_setattr_dotl which does send various
values from iattr over the wire, even if it tells the server to
only look at iattr.ia_valid fields this could leak some stack data.

Link: http://lkml.kernel.org/r/1536339057-21974-2-git-send-email-asmadeus@codewreck.org
Addresses-Coverity-ID: 1195601 ("Uninitalized scalar variable")
Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-07-10 09:56:35 +02:00
Steve French
b79e58acda SMB3: retry on STATUS_INSUFFICIENT_RESOURCES instead of failing write
commit 8d526d62db907e786fd88948c75d1833d82bd80e upstream.

Some servers such as Windows 10 will return STATUS_INSUFFICIENT_RESOURCES
as the number of simultaneous SMB3 requests grows (even though the client
has sufficient credits).  Return EAGAIN on STATUS_INSUFFICIENT_RESOURCES
so that we can retry writes which fail with this status code.

This (for example) fixes large file copies to Windows 10 on fast networks.

Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:56:34 +02:00
Naohiro Aota
d38cac353b btrfs: start readahead also in seed devices
commit c4e0540d0ad49c8ceab06cceed1de27c4fe29f6e upstream.

Currently, btrfs does not consult seed devices to start readahead. As a
result, if readahead zone is added to the seed devices, btrfs_reada_wait()
indefinitely wait for the reada_ctl to finish.

You can reproduce the hung by modifying btrfs/163 to have larger initial
file size (e.g. xfs_io pwrite 4M instead of current 256K).

Fixes: 7414a03fbf ("btrfs: initial readahead code and prototypes")
Cc: stable@vger.kernel.org # 3.2+: ce7791ffee1e: Btrfs: fix race between readahead and device replace/removal
Cc: stable@vger.kernel.org # 3.2+
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:56:33 +02:00
Filipe Manana
95c8238695 Btrfs: fix race between readahead and device replace/removal
commit ce7791ffee1e1ee9f97193b817c7dd1fa6746aad upstream.

The list of devices is protected by the device_list_mutex and the device
replace code, in its finishing phase correctly takes that mutex before
removing the source device from that list. However the readahead code was
iterating that list without acquiring the respective mutex leading to
crashes later on due to invalid memory accesses:

[125671.831036] general protection fault: 0000 [#1] PREEMPT SMP
[125671.832129] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq acpi_cpufreq tpm_tis tpm ppdev evdev parport_pc psmouse sg parport
processor ser
[125671.834973] CPU: 10 PID: 19603 Comm: kworker/u32:19 Tainted: G        W       4.6.0-rc7-btrfs-next-29+ #1
[125671.834973] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[125671.834973] Workqueue: btrfs-readahead btrfs_readahead_helper [btrfs]
[125671.834973] task: ffff8801ac520540 ti: ffff8801ac918000 task.ti: ffff8801ac918000
[125671.834973] RIP: 0010:[<ffffffff81270479>]  [<ffffffff81270479>] __radix_tree_lookup+0x6a/0x105
[125671.834973] RSP: 0018:ffff8801ac91bc28  EFLAGS: 00010206
[125671.834973] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6a RCX: 0000000000000000
[125671.834973] RDX: 0000000000000000 RSI: 00000000000c1bff RDI: ffff88002ebd62a8
[125671.834973] RBP: ffff8801ac91bc70 R08: 0000000000000001 R09: 0000000000000000
[125671.834973] R10: ffff8801ac91bc70 R11: 0000000000000000 R12: ffff88002ebd62a8
[125671.834973] R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000c1bff
[125671.834973] FS:  0000000000000000(0000) GS:ffff88023fd40000(0000) knlGS:0000000000000000
[125671.834973] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[125671.834973] CR2: 000000000073cae4 CR3: 00000000b7723000 CR4: 00000000000006e0
[125671.834973] Stack:
[125671.834973]  0000000000000000 ffff8801422d5600 ffff8802286bbc00 0000000000000000
[125671.834973]  0000000000000001 ffff8802286bbc00 00000000000c1bff 0000000000000000
[125671.834973]  ffff88002e639eb8 ffff8801ac91bc80 ffffffff81270541 ffff8801ac91bcb0
[125671.834973] Call Trace:
[125671.834973]  [<ffffffff81270541>] radix_tree_lookup+0xd/0xf
[125671.834973]  [<ffffffffa04ae6a6>] reada_peer_zones_set_lock+0x3e/0x60 [btrfs]
[125671.834973]  [<ffffffffa04ae8b9>] reada_pick_zone+0x29/0x103 [btrfs]
[125671.834973]  [<ffffffffa04af42f>] reada_start_machine_worker+0x129/0x2d3 [btrfs]
[125671.834973]  [<ffffffffa04880be>] btrfs_scrubparity_helper+0x185/0x3aa [btrfs]
[125671.834973]  [<ffffffffa0488341>] btrfs_readahead_helper+0xe/0x10 [btrfs]
[125671.834973]  [<ffffffff81069691>] process_one_work+0x271/0x4e9
[125671.834973]  [<ffffffff81069dda>] worker_thread+0x1eb/0x2c9
[125671.834973]  [<ffffffff81069bef>] ? rescuer_thread+0x2b3/0x2b3
[125671.834973]  [<ffffffff8106f403>] kthread+0xd4/0xdc
[125671.834973]  [<ffffffff8149e242>] ret_from_fork+0x22/0x40
[125671.834973]  [<ffffffff8106f32f>] ? kthread_stop+0x286/0x286

So fix this by taking the device_list_mutex in the readahead code. We
can't use here the lighter approach of using a rcu_read_lock() and
rcu_read_unlock() pair together with a list_for_each_entry_rcu() call
because we end up doing calls to sleeping functions (kzalloc()) in the
respective code path.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:56:33 +02:00
Jann Horn
fb7adf69e0 fs/binfmt_flat.c: make load_flat_shared_library() work
commit 867bfa4a5fcee66f2b25639acae718e8b28b25a5 upstream.

load_flat_shared_library() is broken: It only calls load_flat_file() if
prepare_binprm() returns zero, but prepare_binprm() returns the number of
bytes read - so this only happens if the file is empty.

Instead, call into load_flat_file() if the number of bytes read is
non-negative. (Even if the number of bytes is zero - in that case,
load_flat_file() will see nullbytes and return a nice -ENOEXEC.)

In addition, remove the code related to bprm creds and stop using
prepare_binprm() - this code is loading a library, not a main executable,
and it only actually uses the members "buf", "file" and "filename" of the
linux_binprm struct. Instead, call kernel_read() directly.

Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com
Fixes: 287980e49ffc ("remove lots of IS_ERR_VALUE abuses")
Signed-off-by: Jann Horn <jannh@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Kees Cook <keescook@chromium.org>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Greg Ungerer <gerg@linux-m68k.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:56:30 +02:00
Greg Kroah-Hartman
032bab8306 This is the 4.4.183 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl0NyDMACgkQONu9yGCS
 aT5MEA//XPA94FjtAzSiaMZlf79PAusz1FJIazheoOuhgn+dkLyrmZtO4BoADglK
 3G8M7ArAYYGg4grZgIkUdqjeTAlWu5XQm+pS8N7b2fJFiKM/p/1t4s1RitQwk2xm
 giCcqvCA+5D9FIwPsktAlRjhtLK7chmQAvxHDv8CYI60GPkMxSTLrx+7rMwj31wP
 4MR/Ll8MJAxXWXF7PPJnfDgMCIp5T8FEV+Uu3UUQc5kB0YNQofi9sFyhbCdiHSgN
 g9HIGuUKt5wV+bwzGcshRR86sumfsh0ayuEToZQHTUSWBgPvmh4Q9hyn/cFkOmAq
 tSFB9wC+DIjgGwvrT1573i1pwZBhetz8L8sgo4QGilqQl4QTbGv3PR5dpdxm4VFK
 BbzhWGmZGGd/J8KSIjRpwEEE2FoyUi/kalcYxqcIu8N+FuQVX3HvUdDkFSmwyoLB
 7/xMZb+xNkkmsuw+4JTGQ2CMQKcY4XbLIpa1cVbvDY9z923DGMJCqmMfb2FU4Rn1
 KRUXSV2Dt231IHVbXjqt7jLnMdTIq9Z8gMjH/yoX0Z4AFphvq2pk/xB3W7AkWQE3
 Q4iqSzPzT8Hij/3OJBVQTcwUajdmwjmMf+6ZvKjnECUcYv2nOuyuU6hXMDQuhH/v
 ZEB9WyH/kjUaRwjNensR+NhEmsKRKRW6eWaZ3CSJvzlUZKWMBsU=
 =8C1+
 -----END PGP SIGNATURE-----

Merge 4.4.183 into android-4.4-p

Changes in 4.4.183
	fs/fat/file.c: issue flush after the writeback of FAT
	sysctl: return -EINVAL if val violates minmax
	ipc: prevent lockup on alloc_msg and free_msg
	hugetlbfs: on restore reserve error path retain subpool reservation
	mm/cma.c: fix crash on CMA allocation if bitmap allocation fails
	mm/cma_debug.c: fix the break condition in cma_maxchunk_get()
	kernel/sys.c: prctl: fix false positive in validate_prctl_map()
	mfd: intel-lpss: Set the device in reset state when init
	mfd: twl6040: Fix device init errors for ACCCTL register
	perf/x86/intel: Allow PEBS multi-entry in watermark mode
	drm/bridge: adv7511: Fix low refresh rate selection
	ntp: Allow TAI-UTC offset to be set to zero
	f2fs: fix to avoid panic in do_recover_data()
	f2fs: fix to do sanity check on valid block count of segment
	iommu/vt-d: Set intel_iommu_gfx_mapped correctly
	ALSA: hda - Register irq handler after the chip initialization
	nvmem: core: fix read buffer in place
	fuse: retrieve: cap requested size to negotiated max_write
	nfsd: allow fh_want_write to be called twice
	x86/PCI: Fix PCI IRQ routing table memory leak
	platform/chrome: cros_ec_proto: check for NULL transfer function
	soc: mediatek: pwrap: Zero initialize rdata in pwrap_init_cipher
	clk: rockchip: Turn on "aclk_dmac1" for suspend on rk3288
	ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ahb" clock to SDMA
	ARM: dts: imx6sx: Specify IMX6SX_CLK_IPG as "ipg" clock to SDMA
	ARM: dts: imx6qdl: Specify IMX6QDL_CLK_IPG as "ipg" clock to SDMA
	PCI: rpadlpar: Fix leaked device_node references in add/remove paths
	PCI: rcar: Fix a potential NULL pointer dereference
	video: hgafb: fix potential NULL pointer dereference
	video: imsttfb: fix potential NULL pointer dereferences
	PCI: xilinx: Check for __get_free_pages() failure
	gpio: gpio-omap: add check for off wake capable gpios
	dmaengine: idma64: Use actual device for DMA transfers
	pwm: tiehrpwm: Update shadow register for disabling PWMs
	ARM: dts: exynos: Always enable necessary APIO_1V8 and ABB_1V8 regulators on Arndale Octa
	pwm: Fix deadlock warning when removing PWM device
	ARM: exynos: Fix undefined instruction during Exynos5422 resume
	futex: Fix futex lock the wrong page
	Revert "Bluetooth: Align minimum encryption key size for LE and BR/EDR connections"
	ALSA: seq: Cover unsubscribe_port() in list_mutex
	libata: Extend quirks for the ST1000LM024 drives with NOLPM quirk
	mm/list_lru.c: fix memory leak in __memcg_init_list_lru_node
	fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
	signal/ptrace: Don't leak unitialized kernel memory with PTRACE_PEEK_SIGINFO
	ptrace: restore smp_rmb() in __ptrace_may_access()
	i2c: acorn: fix i2c warning
	bcache: fix stack corruption by PRECEDING_KEY()
	cgroup: Use css_tryget() instead of css_tryget_online() in task_get_css()
	ASoC: cs42xx8: Add regcache mask dirty
	Drivers: misc: fix out-of-bounds access in function param_set_kgdbts_var
	scsi: lpfc: add check for loss of ndlp when sending RRQ
	scsi: bnx2fc: fix incorrect cast to u64 on shift operation
	usbnet: ipheth: fix racing condition
	KVM: x86/pmu: do not mask the value that is written to fixed PMUs
	KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION
	drm/vmwgfx: integer underflow in vmw_cmd_dx_set_shader() leading to an invalid read
	drm/vmwgfx: NULL pointer dereference from vmw_cmd_dx_view_define()
	USB: Fix chipmunk-like voice when using Logitech C270 for recording audio.
	USB: usb-storage: Add new ID to ums-realtek
	USB: serial: pl2303: add Allied Telesis VT-Kit3
	USB: serial: option: add support for Simcom SIM7500/SIM7600 RNDIS mode
	USB: serial: option: add Telit 0x1260 and 0x1261 compositions
	ax25: fix inconsistent lock state in ax25_destroy_timer
	be2net: Fix number of Rx queues used for flow hashing
	ipv6: flowlabel: fl6_sock_lookup() must use atomic_inc_not_zero
	lapb: fixed leak of control-blocks.
	neigh: fix use-after-free read in pneigh_get_next
	sunhv: Fix device naming inconsistency between sunhv_console and sunhv_reg
	mISDN: make sure device name is NUL terminated
	x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
	perf/ring_buffer: Fix exposing a temporarily decreased data_head
	perf/ring_buffer: Add ordering to rb->nest increment
	gpio: fix gpio-adp5588 build errors
	net: tulip: de4x5: Drop redundant MODULE_DEVICE_TABLE()
	i2c: dev: fix potential memory leak in i2cdev_ioctl_rdwr
	configfs: Fix use-after-free when accessing sd->s_dentry
	ia64: fix build errors by exporting paddr_to_nid()
	KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
	net: sh_eth: fix mdio access in sh_eth_close() for R-Car Gen2 and RZ/A1 SoCs
	scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()
	scsi: libsas: delete sas port if expander discover failed
	Revert "crypto: crypto4xx - properly set IV after de- and encrypt"
	coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
	Abort file_remove_privs() for non-reg. files
	Linux 4.4.183

Change-Id: I26e2772a587b1dcf557adede5bcff66962f72432
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-22 09:45:38 +02:00
Alexander Lochmann
df7ba8162c Abort file_remove_privs() for non-reg. files
commit f69e749a49353d96af1a293f56b5b56de59c668a upstream.

file_remove_privs() might be called for non-regular files, e.g.
blkdev inode. There is no reason to do its job on things
like blkdev inodes, pipes, or cdevs. Hence, abort if
file does not refer to a regular inode.

AV: more to the point, for devices there might be any number of
inodes refering to given device.  Which one to strip the permissions
from, even if that made any sense in the first place?  All of them
will be observed with contents modified, after all.

Found by LockDoc (Alexander Lochmann, Horst Schirmeier and Olaf
Spinczyk)

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Alexander Lochmann <alexander.lochmann@tu-dortmund.de>
Signed-off-by: Horst Schirmeier <horst.schirmeier@tu-dortmund.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-22 08:18:27 +02:00
Andrea Arcangeli
8f6345a11c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.

The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
[mhocko@suse.com: stable 4.4 backport
 - drop infiniband part because of missing 5f9794dc94f59
 - drop userfaultfd_event_wait_completion hunk because of
   missing 9cd75c3cd4c3d]
 - handle binder_update_page_range because of missing 720c241924046
 - handle mlx5_ib_disassociate_ucontext - akaher@vmware.com
]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-22 08:18:27 +02:00
Sahitya Tummala
432030b25b configfs: Fix use-after-free when accessing sd->s_dentry
[ Upstream commit f6122ed2a4f9c9c1c073ddf6308d1b2ac10e0781 ]

In the vfs_statx() context, during path lookup, the dentry gets
added to sd->s_dentry via configfs_attach_attr(). In the end,
vfs_statx() kills the dentry by calling path_put(), which invokes
configfs_d_iput(). Ideally, this dentry must be removed from
sd->s_dentry but it doesn't if the sd->s_count >= 3. As a result,
sd->s_dentry is holding reference to a stale dentry pointer whose
memory is already freed up. This results in use-after-free issue,
when this stale sd->s_dentry is accessed later in
configfs_readdir() path.

This issue can be easily reproduced, by running the LTP test case -
sh fs_racer_file_list.sh /config
(https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/fs/racer/fs_racer_file_list.sh)

Fixes: 76ae281f63 ('configfs: fix race between dentry put and lookup')
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22 08:18:26 +02:00
Wengang Wang
0b871fc866 fs/ocfs2: fix race in ocfs2_dentry_attach_lock()
commit be99ca2716972a712cde46092c54dee5e6192bf8 upstream.

ocfs2_dentry_attach_lock() can be executed in parallel threads against the
same dentry.  Make that race safe.  The race is like this:

            thread A                               thread B

(A1) enter ocfs2_dentry_attach_lock,
seeing dentry->d_fsdata is NULL,
and no alias found by
ocfs2_find_local_alias, so kmalloc
a new ocfs2_dentry_lock structure
to local variable "dl", dl1

               .....

                                    (B1) enter ocfs2_dentry_attach_lock,
                                    seeing dentry->d_fsdata is NULL,
                                    and no alias found by
                                    ocfs2_find_local_alias so kmalloc
                                    a new ocfs2_dentry_lock structure
                                    to local variable "dl", dl2.

                                                   ......

(A2) set dentry->d_fsdata with dl1,
call ocfs2_dentry_lock() and increase
dl1->dl_lockres.l_ro_holders to 1 on
success.
              ......

                                    (B2) set dentry->d_fsdata with dl2
                                    call ocfs2_dentry_lock() and increase
				    dl2->dl_lockres.l_ro_holders to 1 on
				    success.

                                                  ......

(A3) call ocfs2_dentry_unlock()
and decrease
dl2->dl_lockres.l_ro_holders to 0
on success.
             ....

                                    (B3) call ocfs2_dentry_unlock(),
                                    decreasing
				    dl2->dl_lockres.l_ro_holders, but
				    see it's zero now, panic

Link: http://lkml.kernel.org/r/20190529174636.22364-1-wen.gang.wang@oracle.com
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Reported-by: Daniel Sobe <daniel.sobe@nxp.com>
Tested-by: Daniel Sobe <daniel.sobe@nxp.com>
Reviewed-by: Changwei Ge <gechangwei@live.cn>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-22 08:18:22 +02:00
J. Bruce Fields
198a54f07f nfsd: allow fh_want_write to be called twice
[ Upstream commit 0b8f62625dc309651d0efcb6a6247c933acd8b45 ]

A fuzzer recently triggered lockdep warnings about potential sb_writers
deadlocks caused by fh_want_write().

Looks like we aren't careful to pair each fh_want_write() with an
fh_drop_write().

It's not normally a problem since fh_put() will call fh_drop_write() for
us.  And was OK for NFSv3 where we'd do one operation that might call
fh_want_write(), and then put the filehandle.

But an NFSv4 protocol fuzzer can do weird things like call unlink twice
in a compound, and then we get into trouble.

I'm a little worried about this approach of just leaving everything to
fh_put().  But I think there are probably a lot of
fh_want_write()/fh_drop_write() imbalances so for now I think we need it
to be more forgiving.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22 08:18:20 +02:00
Kirill Smelkov
e6779b264d fuse: retrieve: cap requested size to negotiated max_write
[ Upstream commit 7640682e67b33cab8628729afec8ca92b851394f ]

FUSE filesystem server and kernel client negotiate during initialization
phase, what should be the maximum write size the client will ever issue.
Correspondingly the filesystem server then queues sys_read calls to read
requests with buffer capacity large enough to carry request header + that
max_write bytes. A filesystem server is free to set its max_write in
anywhere in the range between [1*page, fc->max_pages*page]. In particular
go-fuse[2] sets max_write by default as 64K, wheres default fc->max_pages
corresponds to 128K. Libfuse also allows users to configure max_write, but
by default presets it to possible maximum.

If max_write is < fc->max_pages*page, and in NOTIFY_RETRIEVE handler we
allow to retrieve more than max_write bytes, corresponding prepared
NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
filesystem server, in full correspondence with server/client contract, will
be only queuing sys_read with ~max_write buffer capacity, and
fuse_dev_do_read throws away requests that cannot fit into server request
buffer. In turn the filesystem server could get stuck waiting indefinitely
for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK which is
understood by clients as that NOTIFY_REPLY was queued and will be sent
back.

Cap requested size to negotiate max_write to avoid the problem.  This
aligns with the way NOTIFY_RETRIEVE handler works, which already
unconditionally caps requested retrieve size to fuse_conn->max_pages.  This
way it should not hurt NOTIFY_RETRIEVE semantic if we return less data than
was originally requested.

Please see [1] for context where the problem of stuck filesystem was hit
for real, how the situation was traced and for more involving patch that
did not make it into the tree.

[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
[2] https://github.com/hanwen/go-fuse

Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Cc: Han-Wen Nienhuys <hanwen@google.com>
Cc: Jakob Unterwurzacher <jakobunt@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22 08:18:20 +02:00
Chao Yu
9e4ed17b94 f2fs: fix to do sanity check on valid block count of segment
[ Upstream commit e95bcdb2fefa129f37bd9035af1d234ca92ee4ef ]

As Jungyeon reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=203233

- Overview
When mounting the attached crafted image and running program, following errors are reported.
Additionally, it hangs on sync after running program.

The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y

- Reproduces
cc poc_13.c
mkdir test
mount -t f2fs tmp.img test
cp a.out test
cd test
sudo ./a.out
sync

- Kernel messages
 F2FS-fs (sdb): Bitmap was wrongly set, blk:4608
 kernel BUG at fs/f2fs/segment.c:2102!
 RIP: 0010:update_sit_entry+0x394/0x410
 Call Trace:
  f2fs_allocate_data_block+0x16f/0x660
  do_write_page+0x62/0x170
  f2fs_do_write_node_page+0x33/0xa0
  __write_node_page+0x270/0x4e0
  f2fs_sync_node_pages+0x5df/0x670
  f2fs_write_checkpoint+0x372/0x1400
  f2fs_sync_fs+0xa3/0x130
  f2fs_do_sync_file+0x1a6/0x810
  do_fsync+0x33/0x60
  __x64_sys_fsync+0xb/0x10
  do_syscall_64+0x43/0xf0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

sit.vblocks and sum valid block count in sit.valid_map may be
inconsistent, segment w/ zero vblocks will be treated as free
segment, while allocating in free segment, we may allocate a
free block, if its bitmap is valid previously, it can cause
kernel crash due to bitmap verification failure.

Anyway, to avoid further serious metadata inconsistence and
corruption, it is necessary and worth to detect SIT
inconsistence. So let's enable check_block_count() to verify
vblocks and valid_map all the time rather than do it only
CONFIG_F2FS_CHECK_FS is enabled.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22 08:18:19 +02:00
Chao Yu
534ef92237 f2fs: fix to avoid panic in do_recover_data()
[ Upstream commit 22d61e286e2d9097dae36f75ed48801056b77cac ]

As Jungyeon reported in bugzilla:

https://bugzilla.kernel.org/show_bug.cgi?id=203227

- Overview
When mounting the attached crafted image, following errors are reported.
Additionally, it hangs on sync after trying to mount it.

The image is intentionally fuzzed from a normal f2fs image for testing.
Compile options for F2FS are as follows.
CONFIG_F2FS_FS=y
CONFIG_F2FS_STAT_FS=y
CONFIG_F2FS_FS_XATTR=y
CONFIG_F2FS_FS_POSIX_ACL=y
CONFIG_F2FS_CHECK_FS=y

- Reproduces
mkdir test
mount -t f2fs tmp.img test
sync

- Messages
 kernel BUG at fs/f2fs/recovery.c:549!
 RIP: 0010:recover_data+0x167a/0x1780
 Call Trace:
  f2fs_recover_fsync_data+0x613/0x710
  f2fs_fill_super+0x1043/0x1aa0
  mount_bdev+0x16d/0x1a0
  mount_fs+0x4a/0x170
  vfs_kern_mount+0x5d/0x100
  do_mount+0x200/0xcf0
  ksys_mount+0x79/0xc0
  __x64_sys_mount+0x1c/0x20
  do_syscall_64+0x43/0xf0
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

During recovery, if ofs_of_node is inconsistent in between recovered
node page and original checkpointed node page, let's just fail recovery
instead of making kernel panic.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22 08:18:19 +02:00
Hou Tao
8b9241b052 fs/fat/file.c: issue flush after the writeback of FAT
[ Upstream commit bd8309de0d60838eef6fb575b0c4c7e95841cf73 ]

fsync() needs to make sure the data & meta-data of file are persistent
after the return of fsync(), even when a power-failure occurs later.  In
the case of fat-fs, the FAT belongs to the meta-data of file, so we need
to issue a flush after the writeback of FAT instead before.

Also bail out early when any stage of fsync fails.

Link: http://lkml.kernel.org/r/20190409030158.136316-1-houtao1@huawei.com
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-22 08:18:17 +02:00
Greg Kroah-Hartman
f992814dac This is the 4.4.181 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlz/gVcACgkQONu9yGCS
 aT4kBQ//adwq+iNdyEF550hc8tWZny0dSLPRKflzTb4hPXnzGdImCSY6pO1KdXzK
 IhjtgLb8aeFpDSZyyAw+sqFxY/2Nd9GZ5pgetWedm218uX/Hr9ETRUe+QqfmXKfx
 sIeBfhSSCm2T8HV23SOL+MWqLaHLQFEXWjSDxJAPxB7ptzGiYJ4jmje0MBrN1xV8
 22H5ijDR9SweZoR83AFtDAr9hKnpXz2ciQtJ/0xOjnVPGDQgD2uK3mpaO+F2r1hR
 kbLA2Hst3m4C3mtQZnns/SZWCKURkPk1hFYhKZyD0k757sRcSR4iHnqKdBBk29kR
 lFNfjVsAARCIj1ucYwwBbkiRJfBaCpT6TMphdtgT0f91zVMOCDTuVTN2couGSsJl
 6wWmboyM20SKkHJ3VawvtZ4YcTUjut2B1mZC/iFBSQJsMyVPQkhFzSdAXUKO6VZ9
 ZLrMTXNpPwlkYLL7VluIzdUr5crRmYj9sYIH1A/+pyzfM8WZO779jev/i2E4Eipt
 lU7ak2UMgSEZhv3GWmqPkFnJIpZwHyIsl5bGUWJ2b3wd69VasUjroVxRu1CyynXN
 CeDnqmJGLSoOlFD6/SF3MCqgvuavt3hgF+eKT2gbVti9zwLnxCxkQ7pgWMQpiMZs
 uIECSg9f1Zox/E+RpsyWc6Jx7r5yIkYHTlAyIpMuwgT+zwhWXaY=
 =sf4M
 -----END PGP SIGNATURE-----

Merge 4.4.181 into android-4.4-p

Changes in 4.4.181
	x86/speculation/mds: Revert CPU buffer clear on double fault exit
	x86/speculation/mds: Improve CPU buffer clear documentation
	ARM: exynos: Fix a leaked reference by adding missing of_node_put
	crypto: vmx - fix copy-paste error in CTR mode
	crypto: crct10dif-generic - fix use via crypto_shash_digest()
	crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
	ALSA: usb-audio: Fix a memory leak bug
	ALSA: hda/hdmi - Consider eld_valid when reporting jack event
	ALSA: hda/realtek - EAPD turn on later
	ASoC: max98090: Fix restore of DAPM Muxes
	ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
	mm/mincore.c: make mincore() more conservative
	ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
	mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
	tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
	ext4: actually request zeroing of inode table after grow
	ext4: fix ext4_show_options for file systems w/o journal
	Btrfs: do not start a transaction at iterate_extent_inodes()
	bcache: fix a race between cache register and cacheset unregister
	bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
	ipmi:ssif: compare block number correctly for multi-part return messages
	crypto: gcm - Fix error return code in crypto_gcm_create_common()
	crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
	crypto: chacha20poly1305 - set cra_name correctly
	crypto: salsa20 - don't access already-freed walk.iv
	crypto: arm/aes-neonbs - don't access already-freed walk.iv
	writeback: synchronize sync(2) against cgroup writeback membership switches
	fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
	ext4: zero out the unused memory region in the extent tree block
	ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
	KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
	net: avoid weird emergency message
	net/mlx4_core: Change the error print to info print
	ppp: deflate: Fix possible crash in deflate_init
	tipc: switch order of device registration to fix a crash
	tipc: fix modprobe tipc failed after switch order of device registration
	stm class: Fix channel free in stm output free path
	md: add mddev->pers to avoid potential NULL pointer dereference
	intel_th: msu: Fix single mode with IOMMU
	of: fix clang -Wunsequenced for be32_to_cpu()
	cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level()
	media: ov6650: Fix sensor possibly not detected on probe
	NFS4: Fix v4.0 client state corruption when mount
	clk: tegra: Fix PLLM programming on Tegra124+ when PMC overrides divider
	fuse: fix writepages on 32bit
	fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
	iommu/tegra-smmu: Fix invalid ASID bits on Tegra30/114
	ceph: flush dirty inodes before proceeding with remount
	tracing: Fix partial reading of trace event's id file
	memory: tegra: Fix integer overflow on tick value calculation
	perf intel-pt: Fix instructions sampling rate
	perf intel-pt: Fix improved sample timestamp
	perf intel-pt: Fix sample timestamp wrt non-taken branches
	fbdev: sm712fb: fix brightness control on reboot, don't set SR30
	fbdev: sm712fb: fix VRAM detection, don't set SR70/71/74/75
	fbdev: sm712fb: fix white screen of death on reboot, don't set CR3B-CR3F
	fbdev: sm712fb: fix boot screen glitch when sm712fb replaces VGA
	fbdev: sm712fb: fix crashes during framebuffer writes by correctly mapping VRAM
	fbdev: sm712fb: fix support for 1024x768-16 mode
	fbdev: sm712fb: use 1024x768 by default on non-MIPS, fix garbled display
	fbdev: sm712fb: fix crashes and garbled display during DPMS modesetting
	PCI: Mark Atheros AR9462 to avoid bus reset
	dm delay: fix a crash when invalid device is specified
	xfrm: policy: Fix out-of-bound array accesses in __xfrm_policy_unlink
	xfrm6_tunnel: Fix potential panic when unloading xfrm6_tunnel module
	vti4: ipip tunnel deregistration fixes.
	xfrm4: Fix uninitialized memory read in _decode_session4
	KVM: arm/arm64: Ensure vcpu target is unset on reset failure
	power: supply: sysfs: prevent endless uevent loop with CONFIG_POWER_SUPPLY_DEBUG
	ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
	perf bench numa: Add define for RUSAGE_THREAD if not present
	Revert "Don't jump to compute_result state from check_result state"
	md/raid: raid5 preserve the writeback action after the parity check
	btrfs: Honour FITRIM range constraints during free space trim
	fbdev: sm712fb: fix memory frequency by avoiding a switch/case fallthrough
	ext4: do not delete unlinked inode from orphan list on failed truncate
	KVM: x86: fix return value for reserved EFER
	bio: fix improper use of smp_mb__before_atomic()
	Revert "scsi: sd: Keep disk read-only when re-reading partition"
	crypto: vmx - CTR: always increment IV as quadword
	gfs2: Fix sign extension bug in gfs2_update_stats
	Btrfs: fix race between ranged fsync and writeback of adjacent ranges
	btrfs: sysfs: don't leak memory when failing add fsid
	fbdev: fix divide error in fb_var_to_videomode
	hugetlb: use same fault hash key for shared and private mappings
	fbdev: fix WARNING in __alloc_pages_nodemask bug
	media: cpia2: Fix use-after-free in cpia2_exit
	media: vivid: use vfree() instead of kfree() for dev->bitmap_cap
	ssb: Fix possible NULL pointer dereference in ssb_host_pcmcia_exit
	at76c50x-usb: Don't register led_trigger if usb_register_driver failed
	perf tools: No need to include bitops.h in util.h
	tools include: Adopt linux/bits.h
	gfs2: Fix lru_count going negative
	cxgb4: Fix error path in cxgb4_init_module
	mmc: core: Verify SD bus width
	powerpc/boot: Fix missing check of lseek() return value
	ASoC: imx: fix fiq dependencies
	spi: pxa2xx: fix SCR (divisor) calculation
	brcm80211: potential NULL dereference in brcmf_cfg80211_vndr_cmds_dcmd_handler()
	rtc: 88pm860x: prevent use-after-free on device remove
	w1: fix the resume command API
	dmaengine: pl330: _stop: clear interrupt status
	mac80211/cfg80211: update bss channel on channel switch
	ASoC: fsl_sai: Update is_slave_mode with correct value
	mwifiex: prevent an array overflow
	net: cw1200: fix a NULL pointer dereference
	bcache: return error immediately in bch_journal_replay()
	bcache: fix failure in journal relplay
	bcache: add failure check to run_cache_set() for journal replay
	bcache: avoid clang -Wunintialized warning
	x86/build: Move _etext to actual end of .text
	smpboot: Place the __percpu annotation correctly
	x86/mm: Remove in_nmi() warning from 64-bit implementation of vmalloc_fault()
	mm/uaccess: Use 'unsigned long' to placate UBSAN warnings on older GCC versions
	HID: logitech-hidpp: use RAP instead of FAP to get the protocol version
	pinctrl: pistachio: fix leaked of_node references
	dmaengine: at_xdmac: remove BUG_ON macro in tasklet
	media: coda: clear error return value before picture run
	media: ov6650: Move v4l2_clk_get() to ov6650_video_probe() helper
	media: au0828: stop video streaming only when last user stops
	media: ov2659: make S_FMT succeed even if requested format doesn't match
	audit: fix a memory leak bug
	media: au0828: Fix NULL pointer dereference in au0828_analog_stream_enable()
	media: pvrusb2: Prevent a buffer overflow
	powerpc/numa: improve control of topology updates
	sched/core: Check quota and period overflow at usec to nsec conversion
	sched/core: Handle overflow in cpu_shares_write_u64
	USB: core: Don't unbind interfaces following device reset failure
	x86/irq/64: Limit IST stack overflow check to #DB stack
	i40e: don't allow changes to HW VLAN stripping on active port VLANs
	RDMA/cxgb4: Fix null pointer dereference on alloc_skb failure
	hwmon: (vt1211) Use request_muxed_region for Super-IO accesses
	hwmon: (smsc47m1) Use request_muxed_region for Super-IO accesses
	hwmon: (smsc47b397) Use request_muxed_region for Super-IO accesses
	hwmon: (pc87427) Use request_muxed_region for Super-IO accesses
	hwmon: (f71805f) Use request_muxed_region for Super-IO accesses
	scsi: libsas: Do discovery on empty PHY to update PHY info
	mmc_spi: add a status check for spi_sync_locked
	mmc: sdhci-of-esdhc: add erratum eSDHC5 support
	mmc: sdhci-of-esdhc: add erratum eSDHC-A001 and A-008358 support
	PM / core: Propagate dev->power.wakeup_path when no callbacks
	extcon: arizona: Disable mic detect if running when driver is removed
	s390: cio: fix cio_irb declaration
	cpufreq: ppc_cbe: fix possible object reference leak
	cpufreq/pasemi: fix possible object reference leak
	cpufreq: pmac32: fix possible object reference leak
	x86/build: Keep local relocations with ld.lld
	iio: ad_sigma_delta: Properly handle SPI bus locking vs CS assertion
	iio: hmc5843: fix potential NULL pointer dereferences
	iio: common: ssp_sensors: Initialize calculated_time in ssp_common_process_data
	rtlwifi: fix a potential NULL pointer dereference
	brcmfmac: fix missing checks for kmemdup
	b43: shut up clang -Wuninitialized variable warning
	brcmfmac: convert dev_init_lock mutex to completion
	brcmfmac: fix race during disconnect when USB completion is in progress
	scsi: ufs: Fix regulator load and icc-level configuration
	scsi: ufs: Avoid configuring regulator with undefined voltage range
	arm64: cpu_ops: fix a leaked reference by adding missing of_node_put
	x86/ia32: Fix ia32_restore_sigcontext() AC leak
	chardev: add additional check for minor range overlap
	HID: core: move Usage Page concatenation to Main item
	ASoC: eukrea-tlv320: fix a leaked reference by adding missing of_node_put
	ASoC: fsl_utils: fix a leaked reference by adding missing of_node_put
	cxgb3/l2t: Fix undefined behaviour
	spi: tegra114: reset controller on probe
	media: wl128x: prevent two potential buffer overflows
	virtio_console: initialize vtermno value for ports
	tty: ipwireless: fix missing checks for ioremap
	rcutorture: Fix cleanup path for invalid torture_type strings
	usb: core: Add PM runtime calls to usb_hcd_platform_shutdown
	scsi: qla4xxx: avoid freeing unallocated dma memory
	media: m88ds3103: serialize reset messages in m88ds3103_set_frontend
	media: go7007: avoid clang frame overflow warning with KASAN
	media: saa7146: avoid high stack usage with clang
	scsi: lpfc: Fix SLI3 commands being issued on SLI4 devices
	spi : spi-topcliff-pch: Fix to handle empty DMA buffers
	spi: rspi: Fix sequencer reset during initialization
	spi: Fix zero length xfer bug
	ASoC: davinci-mcasp: Fix clang warning without CONFIG_PM
	ipv6: Consider sk_bound_dev_if when binding a raw socket to an address
	llc: fix skb leak in llc_build_and_send_ui_pkt()
	net-gro: fix use-after-free read in napi_gro_frags()
	net: stmmac: fix reset gpio free missing
	usbnet: fix kernel crash after disconnect
	tipc: Avoid copying bytes beyond the supplied data
	bnxt_en: Fix aggregation buffer leak under OOM condition.
	net: mvpp2: fix bad MVPP2_TXQ_SCHED_TOKEN_CNTR_REG queue value
	crypto: vmx - ghash: do nosimd fallback manually
	xen/pciback: Don't disable PCI_COMMAND on PCI device reset.
	Revert "tipc: fix modprobe tipc failed after switch order of device registration"
	tipc: fix modprobe tipc failed after switch order of device registration -v2
	sparc64: Fix regression in non-hypervisor TLB flush xcall
	include/linux/bitops.h: sanitize rotate primitives
	xhci: Convert xhci_handshake() to use readl_poll_timeout_atomic()
	usb: xhci: avoid null pointer deref when bos field is NULL
	USB: Fix slab-out-of-bounds write in usb_get_bos_descriptor
	USB: sisusbvga: fix oops in error path of sisusb_probe
	USB: Add LPM quirk for Surface Dock GigE adapter
	USB: rio500: refuse more than one device at a time
	USB: rio500: fix memory leak in close after disconnect
	media: usb: siano: Fix general protection fault in smsusb
	media: usb: siano: Fix false-positive "uninitialized variable" warning
	media: smsusb: better handle optional alignment
	scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
	scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
	Btrfs: fix race updating log root item during fsync
	ALSA: hda/realtek - Set default power save node to 0
	drm/nouveau/i2c: Disable i2c bus access after ->fini()
	tty: serial: msm_serial: Fix XON/XOFF
	tty: max310x: Fix external crystal register setup
	memcg: make it work on sparse non-0-node systems
	kernel/signal.c: trace_signal_deliver when signal_group_exit
	CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM
	binder: Replace "%p" with "%pK" for stable
	binder: replace "%p" with "%pK"
	net: create skb_gso_validate_mac_len()
	bnx2x: disable GSO where gso_size is too big for hardware
	brcmfmac: Add length checks on firmware events
	brcmfmac: screening firmware event packet
	brcmfmac: revise handling events in receive path
	brcmfmac: fix incorrect event channel deduction
	brcmfmac: add length checks in scheduled scan result handler
	brcmfmac: add subtype check for event handling in data path
	userfaultfd: don't pin the user memory in userfaultfd_file_create()
	Revert "x86/build: Move _etext to actual end of .text"
	net: cdc_ncm: GetNtbFormat endian fix
	usb: gadget: fix request length error for isoc transfer
	media: uvcvideo: Fix uvc_alloc_entity() allocation alignment
	ethtool: fix potential userspace buffer overflow
	neighbor: Call __ipv4_neigh_lookup_noref in neigh_xmit
	net/mlx4_en: ethtool, Remove unsupported SFP EEPROM high pages query
	net: rds: fix memory leak in rds_ib_flush_mr_pool
	pktgen: do not sleep with the thread lock held.
	rcu: locking and unlocking need to always be at least barriers
	parisc: Use implicit space register selection for loading the coherence index of I/O pdirs
	fuse: fallocate: fix return with locked inode
	MIPS: pistachio: Build uImage.gz by default
	genwqe: Prevent an integer overflow in the ioctl
	drm/gma500/cdv: Check vbt config bits when detecting lvds panels
	fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
	fuse: Add FOPEN_STREAM to use stream_open()
	ipv4: Define __ipv4_neigh_lookup_noref when CONFIG_INET is disabled
	ethtool: check the return value of get_regs_len
	Linux 4.4.181

Change-Id: I0c9e7effbb6bd5d1978b4ffad3db3b76af6692bc
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-06-11 14:23:58 +02:00
Kirill Smelkov
c9696a8f3e fuse: Add FOPEN_STREAM to use stream_open()
commit bbd84f33652f852ce5992d65db4d020aba21f882 upstream.

Starting from commit 9c225f2655 ("vfs: atomic f_pos accesses as per
POSIX") files opened even via nonseekable_open gate read and write via lock
and do not allow them to be run simultaneously. This can create read vs
write deadlock if a filesystem is trying to implement a socket-like file
which is intended to be simultaneously used for both read and write from
filesystem client.  See commit 10dce8af3422 ("fs: stream_open - opener for
stream-like files so that read and write can run simultaneously without
deadlock") for details and e.g. commit 581d21a2d02a ("xenbus: fix deadlock
on writes to /proc/xen/xenbus") for a similar deadlock example on
/proc/xen/xenbus.

To avoid such deadlock it was tempting to adjust fuse_finish_open to use
stream_open instead of nonseekable_open on just FOPEN_NONSEEKABLE flags,
but grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
and in particular GVFS which actually uses offset in its read and write
handlers

	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

so if we would do such a change it will break a real user.

Add another flag (FOPEN_STREAM) for filesystem servers to indicate that the
opened handler is having stream-like semantics; does not use file position
and thus the kernel is free to issue simultaneous read and write request on
opened file handle.

This patch together with stream_open() should be added to stable kernels
starting from v3.14+. This will allow to patch OSSPD and other FUSE
filesystems that provide stream-like files to return FOPEN_STREAM |
FOPEN_NONSEEKABLE in open handler and this way avoid the deadlock on all
kernel versions. This should work because fuse_finish_open ignores unknown
open flags returned from a filesystem and so passing FOPEN_STREAM to a
kernel that is not aware of this flag cannot hurt. In turn the kernel that
is not aware of FOPEN_STREAM will be < v3.14 where just FOPEN_NONSEEKABLE
is sufficient to implement streams without read vs write deadlock.

Cc: stable@vger.kernel.org # v3.14+
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:24:13 +02:00
Kirill Smelkov
3bf0c45961 fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock
commit 10dce8af34226d90fa56746a934f8da5dcdba3df upstream.

Commit 9c225f2655 ("vfs: atomic f_pos accesses as per POSIX") added
locking for file.f_pos access and in particular made concurrent read and
write not possible - now both those functions take f_pos lock for the
whole run, and so if e.g. a read is blocked waiting for data, write will
deadlock waiting for that read to complete.

This caused regression for stream-like files where previously read and
write could run simultaneously, but after that patch could not do so
anymore. See e.g. commit 581d21a2d02a ("xenbus: fix deadlock on writes
to /proc/xen/xenbus") which fixes such regression for particular case of
/proc/xen/xenbus.

The patch that added f_pos lock in 2014 did so to guarantee POSIX thread
safety for read/write/lseek and added the locking to file descriptors of
all regular files. In 2014 that thread-safety problem was not new as it
was already discussed earlier in 2006.

However even though 2006'th version of Linus's patch was adding f_pos
locking "only for files that are marked seekable with FMODE_LSEEK (thus
avoiding the stream-like objects like pipes and sockets)", the 2014
version - the one that actually made it into the tree as 9c225f2655 -
is doing so irregardless of whether a file is seekable or not.

See

    https://lore.kernel.org/lkml/53022DB1.4070805@gmail.com/
    https://lwn.net/Articles/180387
    https://lwn.net/Articles/180396

for historic context.

The reason that it did so is, probably, that there are many files that
are marked non-seekable, but e.g. their read implementation actually
depends on knowing current position to correctly handle the read. Some
examples:

	kernel/power/user.c		snapshot_read
	fs/debugfs/file.c		u32_array_read
	fs/fuse/control.c		fuse_conn_waiting_read + ...
	drivers/hwmon/asus_atk0110.c	atk_debugfs_ggrp_read
	arch/s390/hypfs/inode.c		hypfs_read_iter
	...

Despite that, many nonseekable_open users implement read and write with
pure stream semantics - they don't depend on passed ppos at all. And for
those cases where read could wait for something inside, it creates a
situation similar to xenbus - the write could be never made to go until
read is done, and read is waiting for some, potentially external, event,
for potentially unbounded time -> deadlock.

Besides xenbus, there are 14 such places in the kernel that I've found
with semantic patch (see below):

	drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write()
	drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write()
	drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write()
	drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write()
	net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write()
	drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write()
	drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write()
	drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write()
	net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write()
	drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write()
	drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write()
	drivers/input/misc/uinput.c:400:1-17: ERROR: uinput_fops: .read() can deadlock .write()
	drivers/infiniband/core/user_mad.c:985:7-23: ERROR: umad_fops: .read() can deadlock .write()
	drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write()

In addition to the cases above another regression caused by f_pos
locking is that now FUSE filesystems that implement open with
FOPEN_NONSEEKABLE flag, can no longer implement bidirectional
stream-like files - for the same reason as above e.g. read can deadlock
write locking on file.f_pos in the kernel.

FUSE's FOPEN_NONSEEKABLE was added in 2008 in a7c1b990f7 ("fuse:
implement nonseekable open") to support OSSPD. OSSPD implements /dev/dsp
in userspace with FOPEN_NONSEEKABLE flag, with corresponding read and
write routines not depending on current position at all, and with both
read and write being potentially blocking operations:

See

    https://github.com/libfuse/osspd
    https://lwn.net/Articles/308445

    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1406
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1438-L1477
    https://github.com/libfuse/osspd/blob/14a9cff0/osspd.c#L1479-L1510

Corresponding libfuse example/test also describes FOPEN_NONSEEKABLE as
"somewhat pipe-like files ..." with read handler not using offset.
However that test implements only read without write and cannot exercise
the deadlock scenario:

    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L124-L131
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L146-L163
    https://github.com/libfuse/libfuse/blob/fuse-3.4.2-3-ga1bff7d/example/poll.c#L209-L216

I've actually hit the read vs write deadlock for real while implementing
my FUSE filesystem where there is /head/watch file, for which open
creates separate bidirectional socket-like stream in between filesystem
and its user with both read and write being later performed
simultaneously. And there it is semantically not easy to split the
stream into two separate read-only and write-only channels:

    https://lab.nexedi.com/kirr/wendelin.core/blob/f13aa600/wcfs/wcfs.go#L88-169

Let's fix this regression. The plan is:

1. We can't change nonseekable_open to include &~FMODE_ATOMIC_POS -
   doing so would break many in-kernel nonseekable_open users which
   actually use ppos in read/write handlers.

2. Add stream_open() to kernel to open stream-like non-seekable file
   descriptors. Read and write on such file descriptors would never use
   nor change ppos. And with that property on stream-like files read and
   write will be running without taking f_pos lock - i.e. read and write
   could be running simultaneously.

3. With semantic patch search and convert to stream_open all in-kernel
   nonseekable_open users for which read and write actually do not
   depend on ppos and where there is no other methods in file_operations
   which assume @offset access.

4. Add FOPEN_STREAM to fs/fuse/ and open in-kernel file-descriptors via
   steam_open if that bit is present in filesystem open reply.

   It was tempting to change fs/fuse/ open handler to use stream_open
   instead of nonseekable_open on just FOPEN_NONSEEKABLE flags, but
   grepping through Debian codesearch shows users of FOPEN_NONSEEKABLE,
   and in particular GVFS which actually uses offset in its read and
   write handlers

	https://codesearch.debian.net/search?q=-%3Enonseekable+%3D
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1080
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1247-1346
	https://gitlab.gnome.org/GNOME/gvfs/blob/1.40.0-6-gcbc54396/client/gvfsfusedaemon.c#L1399-1481

   so if we would do such a change it will break a real user.

5. Add stream_open and FOPEN_STREAM handling to stable kernels starting
   from v3.14+ (the kernel where 9c225f2655 first appeared).

   This will allow to patch OSSPD and other FUSE filesystems that
   provide stream-like files to return FOPEN_STREAM | FOPEN_NONSEEKABLE
   in their open handler and this way avoid the deadlock on all kernel
   versions. This should work because fs/fuse/ ignores unknown open
   flags returned from a filesystem and so passing FOPEN_STREAM to a
   kernel that is not aware of this flag cannot hurt. In turn the kernel
   that is not aware of FOPEN_STREAM will be < v3.14 where just
   FOPEN_NONSEEKABLE is sufficient to implement streams without read vs
   write deadlock.

This patch adds stream_open, converts /proc/xen/xenbus to it and adds
semantic patch to automatically locate in-kernel places that are either
required to be converted due to read vs write deadlock, or that are just
safe to be converted because read and write do not use ppos and there
are no other funky methods in file_operations.

Regarding semantic patch I've verified each generated change manually -
that it is correct to convert - and each other nonseekable_open instance
left - that it is either not correct to convert there, or that it is not
converted due to current stream_open.cocci limitations.

The script also does not convert files that should be valid to convert,
but that currently have .llseek = noop_llseek or generic_file_llseek for
unknown reason despite file being opened with nonseekable_open (e.g.
drivers/input/mousedev.c)

Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Yongzhi Pan <panyongzhi@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Julia Lawall <Julia.Lawall@lip6.fr>
Cc: Nikolaus Rath <Nikolaus@rath.org>
Cc: Han-Wen Nienhuys <hanwen@google.com>
[ backport to 4.4: actually fixed deadlock on /proc/xen/xenbus as 581d21a2d02a was not backported to 4.4 ]
Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:24:13 +02:00
Miklos Szeredi
8061c23f53 fuse: fallocate: fix return with locked inode
commit 35d6fcbb7c3e296a52136347346a698a35af3fda upstream.

Do the proper cleanup in case the size check fails.

Tested with xfstests:generic/228

Reported-by: kbuild test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 0cbade024ba5 ("fuse: honor RLIMIT_FSIZE in fuse_file_fallocate")
Cc: Liu Bo <bo.liu@linux.alibaba.com>
Cc: <stable@vger.kernel.org> # v3.5
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:24:13 +02:00
Oleg Nesterov
6ad730b831 userfaultfd: don't pin the user memory in userfaultfd_file_create()
commit d2005e3f41d4f9299e2df6a967c8beb5086967a9 upstream.

userfaultfd_file_create() increments mm->mm_users; this means that the
memory won't be unmapped/freed if mm owner exits/execs, and UFFDIO_COPY
after that can populate the orphaned mm more.

Change userfaultfd_file_create() and userfaultfd_ctx_put() to use
mm->mm_count to pin mm_struct.  This means that
atomic_inc_not_zero(mm->mm_users) is needed when we are going to
actually play with this memory.  Except handle_userfault() path doesn't
need this, the caller must already have a reference.

The patch adds the new trivial helper, mmget_not_zero(), it can have
more users.

Link: http://lkml.kernel.org/r/20160516172254.GA8595@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:24:11 +02:00
Roberto Bergantinos Corpas
336c166217 CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM
commit 31fad7d41e73731f05b8053d17078638cf850fa6 upstream.

 In cifs_read_allocate_pages, in case of ENOMEM, we go through
whole rdata->pages array but we have failed the allocation before
nr_pages, therefore we may end up calling put_page with NULL
pointer, causing oops

Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Acked-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:24:10 +02:00
Filipe Manana
494447b90d Btrfs: fix race updating log root item during fsync
commit 06989c799f04810f6876900d4760c0edda369cf7 upstream.

When syncing the log, the final phase of a fsync operation, we need to
either create a log root's item or update the existing item in the log
tree of log roots, and that depends on the current value of the log
root's log_transid - if it's 1 we need to create the log root item,
otherwise it must exist already and we update it. Since there is no
synchronization between updating the log_transid and checking it for
deciding whether the log root's item needs to be created or updated, we
end up with a tiny race window that results in attempts to update the
item to fail because the item was not yet created:

              CPU 1                                    CPU 2

  btrfs_sync_log()

    lock root->log_mutex

    set log root's log_transid to 1

    unlock root->log_mutex

                                               btrfs_sync_log()

                                                 lock root->log_mutex

                                                 sets log root's
                                                 log_transid to 2

                                                 unlock root->log_mutex

    update_log_root()

      sees log root's log_transid
      with a value of 2

        calls btrfs_update_root(),
        which fails with -EUCLEAN
        and causes transaction abort

Until recently the race lead to a BUG_ON at btrfs_update_root(), but after
the recent commit 7ac1e464c4d47 ("btrfs: Don't panic when we can't find a
root key") we just abort the current transaction.

A sample trace of the BUG_ON() on a SLE12 kernel:

  ------------[ cut here ]------------
  kernel BUG at ../fs/btrfs/root-tree.c:157!
  Oops: Exception in kernel mode, sig: 5 [#1]
  SMP NR_CPUS=2048 NUMA pSeries
  (...)
  Supported: Yes, External
  CPU: 78 PID: 76303 Comm: rtas_errd Tainted: G                 X 4.4.156-94.57-default #1
  task: c00000ffa906d010 ti: c00000ff42b08000 task.ti: c00000ff42b08000
  NIP: d000000036ae5cdc LR: d000000036ae5cd8 CTR: 0000000000000000
  REGS: c00000ff42b0b860 TRAP: 0700   Tainted: G                 X  (4.4.156-94.57-default)
  MSR: 8000000002029033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 22444484  XER: 20000000
  CFAR: d000000036aba66c SOFTE: 1
  GPR00: d000000036ae5cd8 c00000ff42b0bae0 d000000036bda220 0000000000000054
  GPR04: 0000000000000001 0000000000000000 c00007ffff8d37c8 0000000000000000
  GPR08: c000000000e19c00 0000000000000000 0000000000000000 3736343438312079
  GPR12: 3930373337303434 c000000007a3a800 00000000007fffff 0000000000000023
  GPR16: c00000ffa9d26028 c00000ffa9d261f8 0000000000000010 c00000ffa9d2ab28
  GPR20: c00000ff42b0bc48 0000000000000001 c00000ff9f0d9888 0000000000000001
  GPR24: c00000ffa9d26000 c00000ffa9d261e8 c00000ffa9d2a800 c00000ff9f0d9888
  GPR28: c00000ffa9d26028 c00000ffa9d2aa98 0000000000000001 c00000ffa98f5b20
  NIP [d000000036ae5cdc] btrfs_update_root+0x25c/0x4e0 [btrfs]
  LR [d000000036ae5cd8] btrfs_update_root+0x258/0x4e0 [btrfs]
  Call Trace:
  [c00000ff42b0bae0] [d000000036ae5cd8] btrfs_update_root+0x258/0x4e0 [btrfs] (unreliable)
  [c00000ff42b0bba0] [d000000036b53610] btrfs_sync_log+0x2d0/0xc60 [btrfs]
  [c00000ff42b0bce0] [d000000036b1785c] btrfs_sync_file+0x44c/0x4e0 [btrfs]
  [c00000ff42b0bd80] [c00000000032e300] vfs_fsync_range+0x70/0x120
  [c00000ff42b0bdd0] [c00000000032e44c] do_fsync+0x5c/0xb0
  [c00000ff42b0be10] [c00000000032e8dc] SyS_fdatasync+0x2c/0x40
  [c00000ff42b0be30] [c000000000009488] system_call+0x3c/0x100
  Instruction dump:
  7f43d378 4bffebb9 60000000 88d90008 3d220000 e8b90000 3b390009 e87a01f0
  e8898e08 e8f90000 4bfd48e5 60000000 <0fe00000> e95b0060 39200004 394a0ea0
  ---[ end trace 8f2dc8f919cabab8 ]---

So fix this by doing the check of log_transid and updating or creating the
log root's item while holding the root's log_mutex.

Fixes: 7237f18336 ("Btrfs: fix tree logs parallel sync")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:24:09 +02:00
Chengguang Xu
cb7872f128 chardev: add additional check for minor range overlap
[ Upstream commit de36e16d1557a0b6eb328bc3516359a12ba5c25c ]

Current overlap checking cannot correctly handle
a case which is baseminor < existing baseminor &&
baseminor + minorct > existing baseminor + minorct.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-11 12:24:03 +02:00
Ross Lagerwall
6948c6bc17 gfs2: Fix lru_count going negative
[ Upstream commit 7881ef3f33bb80f459ea6020d1e021fc524a6348 ]

Under certain conditions, lru_count may drop below zero resulting in
a large amount of log spam like this:

vmscan: shrink_slab: gfs2_dump_glock+0x3b0/0x630 [gfs2] \
    negative objects to delete nr=-1

This happens as follows:
1) A glock is moved from lru_list to the dispose list and lru_count is
   decremented.
2) The dispose function calls cond_resched() and drops the lru lock.
3) Another thread takes the lru lock and tries to add the same glock to
   lru_list, checking if the glock is on an lru list.
4) It is on a list (actually the dispose list) and so it avoids
   incrementing lru_count.
5) The glock is moved to lru_list.
5) The original thread doesn't dispose it because it has been re-added
   to the lru list but the lru_count has still decreased by one.

Fix by checking if the LRU flag is set on the glock rather than checking
if the glock is on some list and rearrange the code so that the LRU flag
is added/removed precisely when the glock is added/removed from lru_list.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-11 12:23:53 +02:00
Mike Kravetz
bf8474c648 hugetlb: use same fault hash key for shared and private mappings
commit 1b426bac66e6cc83c9f2d92b96e4e72acf43419a upstream.

hugetlb uses a fault mutex hash table to prevent page faults of the
same pages concurrently.  The key for shared and private mappings is
different.  Shared keys off address_space and file index.  Private keys
off mm and virtual address.  Consider a private mappings of a populated
hugetlbfs file.  A fault will map the page from the file and if needed
do a COW to map a writable page.

Hugetlbfs hole punch uses the fault mutex to prevent mappings of file
pages.  It uses the address_space file index key.  However, private
mappings will use a different key and could race with this code to map
the file page.  This causes problems (BUG) for the page cache remove
code as it expects the page to be unmapped.  A sample stack is:

page dumped because: VM_BUG_ON_PAGE(page_mapped(page))
kernel BUG at mm/filemap.c:169!
...
RIP: 0010:unaccount_page_cache_page+0x1b8/0x200
...
Call Trace:
__delete_from_page_cache+0x39/0x220
delete_from_page_cache+0x45/0x70
remove_inode_hugepages+0x13c/0x380
? __add_to_page_cache_locked+0x162/0x380
hugetlbfs_fallocate+0x403/0x540
? _cond_resched+0x15/0x30
? __inode_security_revalidate+0x5d/0x70
? selinux_file_permission+0x100/0x130
vfs_fallocate+0x13f/0x270
ksys_fallocate+0x3c/0x80
__x64_sys_fallocate+0x1a/0x20
do_syscall_64+0x5b/0x180
entry_SYSCALL_64_after_hwframe+0x44/0xa9

There seems to be another potential COW issue/race with this approach
of different private and shared keys as noted in commit 8382d914eb
("mm, hugetlb: improve page-fault scalability").

Since every hugetlb mapping (even anon and private) is actually a file
mapping, just use the address_space index key for all mappings.  This
results in potentially more hash collisions.  However, this should not
be the common case.

Link: http://lkml.kernel.org/r/20190328234704.27083-3-mike.kravetz@oracle.com
Link: http://lkml.kernel.org/r/20190412165235.t4sscoujczfhuiyt@linux-r8p5
Fixes: b5cec28d36 ("hugetlbfs: truncate_hugepages() takes a range of pages")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:52 +02:00
Tobin C. Harding
5c9a20390c btrfs: sysfs: don't leak memory when failing add fsid
commit e32773357d5cc271b1d23550b3ed026eb5c2a468 upstream.

A failed call to kobject_init_and_add() must be followed by a call to
kobject_put().  Currently in the error path when adding fs_devices we
are missing this call.  This could be fixed by calling
btrfs_sysfs_remove_fsid() if btrfs_sysfs_add_fsid() returns an error or
by adding a call to kobject_put() directly in btrfs_sysfs_add_fsid().
Here we choose the second option because it prevents the slightly
unusual error path handling requirements of kobject from leaking out
into btrfs functions.

Add a call to kobject_put() in the error path of kobject_add_and_init().
This causes the release method to be called if kobject_init_and_add()
fails.  open_tree() is the function that calls btrfs_sysfs_add_fsid()
and the error code in this function is already written with the
assumption that the release method is called during the error path of
open_tree() (as seen by the call to btrfs_sysfs_remove_fsid() under the
fail_fsdev_sysfs label).

Cc: stable@vger.kernel.org # v4.4+
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tobin C. Harding <tobin@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:52 +02:00
Filipe Manana
0fa88718cd Btrfs: fix race between ranged fsync and writeback of adjacent ranges
commit 0c713cbab6200b0ab6473b50435e450a6e1de85d upstream.

When we do a full fsync (the bit BTRFS_INODE_NEEDS_FULL_SYNC is set in the
inode) that happens to be ranged, which happens during a msync() or writes
for files opened with O_SYNC for example, we can end up with a corrupt log,
due to different file extent items representing ranges that overlap with
each other, or hit some assertion failures.

When doing a ranged fsync we only flush delalloc and wait for ordered
exents within that range. If while we are logging items from our inode
ordered extents for adjacent ranges complete, we end up in a race that can
make us insert the file extent items that overlap with others we logged
previously and the assertion failures.

For example, if tree-log.c:copy_items() receives a leaf that has the
following file extents items, all with a length of 4K and therefore there
is an implicit hole in the range 68K to 72K - 1:

  (257 EXTENT_ITEM 64K), (257 EXTENT_ITEM 72K), (257 EXTENT_ITEM 76K), ...

It copies them to the log tree. However due to the need to detect implicit
holes, it may release the path, in order to look at the previous leaf to
detect an implicit hole, and then later it will search again in the tree
for the first file extent item key, with the goal of locking again the
leaf (which might have changed due to concurrent changes to other inodes).

However when it locks again the leaf containing the first key, the key
corresponding to the extent at offset 72K may not be there anymore since
there is an ordered extent for that range that is finishing (that is,
somewhere in the middle of btrfs_finish_ordered_io()), and it just
removed the file extent item but has not yet replaced it with a new file
extent item, so the part of copy_items() that does hole detection will
decide that there is a hole in the range starting from 68K to 76K - 1,
and therefore insert a file extent item to represent that hole, having
a key offset of 68K. After that we now have a log tree with 2 different
extent items that have overlapping ranges:

 1) The file extent item copied before copy_items() released the path,
    which has a key offset of 72K and a length of 4K, representing the
    file range 72K to 76K - 1.

 2) And a file extent item representing a hole that has a key offset of
    68K and a length of 8K, representing the range 68K to 76K - 1. This
    item was inserted after releasing the path, and overlaps with the
    extent item inserted before.

The overlapping extent items can cause all sorts of unpredictable and
incorrect behaviour, either when replayed or if a fast (non full) fsync
happens later, which can trigger a BUG_ON() when calling
btrfs_set_item_key_safe() through __btrfs_drop_extents(), producing a
trace like the following:

  [61666.783269] ------------[ cut here ]------------
  [61666.783943] kernel BUG at fs/btrfs/ctree.c:3182!
  [61666.784644] invalid opcode: 0000 [#1] PREEMPT SMP
  (...)
  [61666.786253] task: ffff880117b88c40 task.stack: ffffc90008168000
  [61666.786253] RIP: 0010:btrfs_set_item_key_safe+0x7c/0xd2 [btrfs]
  [61666.786253] RSP: 0018:ffffc9000816b958 EFLAGS: 00010246
  [61666.786253] RAX: 0000000000000000 RBX: 000000000000000f RCX: 0000000000030000
  [61666.786253] RDX: 0000000000000000 RSI: ffffc9000816ba4f RDI: ffffc9000816b937
  [61666.786253] RBP: ffffc9000816b998 R08: ffff88011dae2428 R09: 0000000000001000
  [61666.786253] R10: 0000160000000000 R11: 6db6db6db6db6db7 R12: ffff88011dae2418
  [61666.786253] R13: ffffc9000816ba4f R14: ffff8801e10c4118 R15: ffff8801e715c000
  [61666.786253] FS:  00007f6060a18700(0000) GS:ffff88023f5c0000(0000) knlGS:0000000000000000
  [61666.786253] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [61666.786253] CR2: 00007f6060a28000 CR3: 0000000213e69000 CR4: 00000000000006e0
  [61666.786253] Call Trace:
  [61666.786253]  __btrfs_drop_extents+0x5e3/0xaad [btrfs]
  [61666.786253]  ? time_hardirqs_on+0x9/0x14
  [61666.786253]  btrfs_log_changed_extents+0x294/0x4e0 [btrfs]
  [61666.786253]  ? release_extent_buffer+0x38/0xb4 [btrfs]
  [61666.786253]  btrfs_log_inode+0xb6e/0xcdc [btrfs]
  [61666.786253]  ? lock_acquire+0x131/0x1c5
  [61666.786253]  ? btrfs_log_inode_parent+0xee/0x659 [btrfs]
  [61666.786253]  ? arch_local_irq_save+0x9/0xc
  [61666.786253]  ? btrfs_log_inode_parent+0x1f5/0x659 [btrfs]
  [61666.786253]  btrfs_log_inode_parent+0x223/0x659 [btrfs]
  [61666.786253]  ? arch_local_irq_save+0x9/0xc
  [61666.786253]  ? lockref_get_not_zero+0x2c/0x34
  [61666.786253]  ? rcu_read_unlock+0x3e/0x5d
  [61666.786253]  btrfs_log_dentry_safe+0x60/0x7b [btrfs]
  [61666.786253]  btrfs_sync_file+0x317/0x42c [btrfs]
  [61666.786253]  vfs_fsync_range+0x8c/0x9e
  [61666.786253]  SyS_msync+0x13c/0x1c9
  [61666.786253]  entry_SYSCALL_64_fastpath+0x18/0xad

A sample of a corrupt log tree leaf with overlapping extents I got from
running btrfs/072:

      item 14 key (295 108 200704) itemoff 2599 itemsize 53
              extent data disk bytenr 0 nr 0
              extent data offset 0 nr 458752 ram 458752
      item 15 key (295 108 659456) itemoff 2546 itemsize 53
              extent data disk bytenr 4343541760 nr 770048
              extent data offset 606208 nr 163840 ram 770048
      item 16 key (295 108 663552) itemoff 2493 itemsize 53
              extent data disk bytenr 4343541760 nr 770048
              extent data offset 610304 nr 155648 ram 770048
      item 17 key (295 108 819200) itemoff 2440 itemsize 53
              extent data disk bytenr 4334788608 nr 4096
              extent data offset 0 nr 4096 ram 4096

The file extent item at offset 659456 (item 15) ends at offset 823296
(659456 + 163840) while the next file extent item (item 16) starts at
offset 663552.

Another different problem that the race can trigger is a failure in the
assertions at tree-log.c:copy_items(), which expect that the first file
extent item key we found before releasing the path exists after we have
released path and that the last key we found before releasing the path
also exists after releasing the path:

  $ cat -n fs/btrfs/tree-log.c
  4080          if (need_find_last_extent) {
  4081                  /* btrfs_prev_leaf could return 1 without releasing the path */
  4082                  btrfs_release_path(src_path);
  4083                  ret = btrfs_search_slot(NULL, inode->root, &first_key,
  4084                                  src_path, 0, 0);
  4085                  if (ret < 0)
  4086                          return ret;
  4087                  ASSERT(ret == 0);
  (...)
  4103                  if (i >= btrfs_header_nritems(src_path->nodes[0])) {
  4104                          ret = btrfs_next_leaf(inode->root, src_path);
  4105                          if (ret < 0)
  4106                                  return ret;
  4107                          ASSERT(ret == 0);
  4108                          src = src_path->nodes[0];
  4109                          i = 0;
  4110                          need_find_last_extent = true;
  4111                  }
  (...)

The second assertion implicitly expects that the last key before the path
release still exists, because the surrounding while loop only stops after
we have found that key. When this assertion fails it produces a stack like
this:

  [139590.037075] assertion failed: ret == 0, file: fs/btrfs/tree-log.c, line: 4107
  [139590.037406] ------------[ cut here ]------------
  [139590.037707] kernel BUG at fs/btrfs/ctree.h:3546!
  [139590.038034] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
  [139590.038340] CPU: 1 PID: 31841 Comm: fsstress Tainted: G        W         5.0.0-btrfs-next-46 #1
  (...)
  [139590.039354] RIP: 0010:assfail.constprop.24+0x18/0x1a [btrfs]
  (...)
  [139590.040397] RSP: 0018:ffffa27f48f2b9b0 EFLAGS: 00010282
  [139590.040730] RAX: 0000000000000041 RBX: ffff897c635d92c8 RCX: 0000000000000000
  [139590.041105] RDX: 0000000000000000 RSI: ffff897d36a96868 RDI: ffff897d36a96868
  [139590.041470] RBP: ffff897d1b9a0708 R08: 0000000000000000 R09: 0000000000000000
  [139590.041815] R10: 0000000000000008 R11: 0000000000000000 R12: 0000000000000013
  [139590.042159] R13: 0000000000000227 R14: ffff897cffcbba88 R15: 0000000000000001
  [139590.042501] FS:  00007f2efc8dee80(0000) GS:ffff897d36a80000(0000) knlGS:0000000000000000
  [139590.042847] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [139590.043199] CR2: 00007f8c064935e0 CR3: 0000000232252002 CR4: 00000000003606e0
  [139590.043547] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  [139590.043899] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  [139590.044250] Call Trace:
  [139590.044631]  copy_items+0xa3f/0x1000 [btrfs]
  [139590.045009]  ? generic_bin_search.constprop.32+0x61/0x200 [btrfs]
  [139590.045396]  btrfs_log_inode+0x7b3/0xd70 [btrfs]
  [139590.045773]  btrfs_log_inode_parent+0x2b3/0xce0 [btrfs]
  [139590.046143]  ? do_raw_spin_unlock+0x49/0xc0
  [139590.046510]  btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
  [139590.046872]  btrfs_sync_file+0x3b6/0x440 [btrfs]
  [139590.047243]  btrfs_file_write_iter+0x45b/0x5c0 [btrfs]
  [139590.047592]  __vfs_write+0x129/0x1c0
  [139590.047932]  vfs_write+0xc2/0x1b0
  [139590.048270]  ksys_write+0x55/0xc0
  [139590.048608]  do_syscall_64+0x60/0x1b0
  [139590.048946]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
  [139590.049287] RIP: 0033:0x7f2efc4be190
  (...)
  [139590.050342] RSP: 002b:00007ffe743243a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
  [139590.050701] RAX: ffffffffffffffda RBX: 0000000000008d58 RCX: 00007f2efc4be190
  [139590.051067] RDX: 0000000000008d58 RSI: 00005567eca0f370 RDI: 0000000000000003
  [139590.051459] RBP: 0000000000000024 R08: 0000000000000003 R09: 0000000000008d60
  [139590.051863] R10: 0000000000000078 R11: 0000000000000246 R12: 0000000000000003
  [139590.052252] R13: 00000000003d3507 R14: 00005567eca0f370 R15: 0000000000000000
  (...)
  [139590.055128] ---[ end trace 193f35d0215cdeeb ]---

So fix this race between a full ranged fsync and writeback of adjacent
ranges by flushing all delalloc and waiting for all ordered extents to
complete before logging the inode. This is the simplest way to solve the
problem because currently the full fsync path does not deal with ranges
at all (it assumes a full range from 0 to LLONG_MAX) and it always needs
to look at adjacent ranges for hole detection. For use cases of ranged
fsyncs this can make a few fsyncs slower but on the other hand it can
make some following fsyncs to other ranges do less work or no need to do
anything at all. A full fsync is rare anyway and happens only once after
loading/creating an inode and once after less common operations such as a
shrinking truncate.

This is an issue that exists for a long time, and was often triggered by
generic/127, because it does mmap'ed writes and msync (which triggers a
ranged fsync). Adding support for the tree checker to detect overlapping
extents (next patch in the series) and trigger a WARN() when such cases
are found, and then calling btrfs_check_leaf_full() at the end of
btrfs_insert_file_extent() made the issue much easier to detect. Running
btrfs/072 with that change to the tree checker and making fsstress open
files always with O_SYNC made it much easier to trigger the issue (as
triggering it with generic/127 is very rare).

CC: stable@vger.kernel.org # 3.16+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:52 +02:00
Andreas Gruenbacher
2f5ac0bd2e gfs2: Fix sign extension bug in gfs2_update_stats
commit 5a5ec83d6ac974b12085cd99b196795f14079037 upstream.

Commit 4d207133e9 changed the types of the statistic values in struct
gfs2_lkstats from s64 to u64.  Because of that, what should be a signed
value in gfs2_update_stats turned into an unsigned value.  When shifted
right, we end up with a large positive value instead of a small negative
value, which results in an incorrect variance estimate.

Fixes: 4d207133e9 ("gfs2: Make statistics unsigned, suitable for use with do_div()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: stable@vger.kernel.org # v4.4+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:51 +02:00
Jan Kara
75d63b131b ext4: do not delete unlinked inode from orphan list on failed truncate
commit ee0ed02ca93ef1ecf8963ad96638795d55af2c14 upstream.

It is possible that unlinked inode enters ext4_setattr() (e.g. if
somebody calls ftruncate(2) on unlinked but still open file). In such
case we should not delete the inode from the orphan list if truncate
fails. Note that this is mostly a theoretical concern as filesystem is
corrupted if we reach this path anyway but let's be consistent in our
orphan handling.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:50 +02:00
Nikolay Borisov
7d64186e79 btrfs: Honour FITRIM range constraints during free space trim
commit c2d1b3aae33605a61cbab445d8ae1c708ccd2698 upstream.

Up until now trimming the freespace was done irrespective of what the
arguments of the FITRIM ioctl were. For example fstrim's -o/-l arguments
will be entirely ignored. Fix it by correctly handling those paramter.
This requires breaking if the found freespace extent is after the end of
the passed range as well as completing trim after trimming
fstrim_range::len bytes.

Fixes: 499f377f49 ("btrfs: iterate over unused chunk space in FITRIM")
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:50 +02:00
Al Viro
66ee750cfd ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour
[ Upstream commit 4e9036042fedaffcd868d7f7aa948756c48c637d ]

To choose whether to pick the GID from the old (16bit) or new (32bit)
field, we should check if the old gid field is set to 0xffff.  Mainline
checks the old *UID* field instead - cut'n'paste from the corresponding
code in ufs_get_inode_uid().

Fixes: 252e211e90
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-06-11 12:23:49 +02:00
Jeff Layton
a7929c9486 ceph: flush dirty inodes before proceeding with remount
commit 00abf69dd24f4444d185982379c5cc3bb7b6d1fc upstream.

xfstest generic/452 was triggering a "Busy inodes after umount" warning.
ceph was allowing the mount to go read-only without first flushing out
dirty inodes in the cache. Ensure we sync out the filesystem before
allowing a remount to proceed.

Cc: stable@vger.kernel.org
Link: http://tracker.ceph.com/issues/39571
Signed-off-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:46 +02:00
Liu Bo
40857ab739 fuse: honor RLIMIT_FSIZE in fuse_file_fallocate
commit 0cbade024ba501313da3b7e5dd2a188a6bc491b5 upstream.

fstests generic/228 reported this failure that fuse fallocate does not
honor what 'ulimit -f' has set.

This adds the necessary inode_newsize_ok() check.

Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
Fixes: 05ba1f0823 ("fuse: add FALLOCATE operation")
Cc: <stable@vger.kernel.org> # v3.5
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:46 +02:00
Miklos Szeredi
73724958d1 fuse: fix writepages on 32bit
commit 9de5be06d0a89ca97b5ab902694d42dfd2bb77d2 upstream.

Writepage requests were cropped to i_size & 0xffffffff, which meant that
mmaped writes to any file larger than 4G might be silently discarded.

Fix by storing the file size in a properly sized variable (loff_t instead
of size_t).

Reported-by: Antonio SJ Musumeci <trapexit@spawn.link>
Fixes: 6eaf4782eb ("fuse: writepages: crop secondary requests")
Cc: <stable@vger.kernel.org> # v3.13
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:46 +02:00
ZhangXiaoxu
4676a07add NFS4: Fix v4.0 client state corruption when mount
commit f02f3755dbd14fb935d24b14650fff9ba92243b8 upstream.

stat command with soft mount never return after server is stopped.

When alloc a new client, the state of the client will be set to
NFS4CLNT_LEASE_EXPIRED.

When the server is stopped, the state manager will work, and accord
the state to recover. But the state is NFS4CLNT_LEASE_EXPIRED, it
will drain the slot table and lead other task to wait queue, until
the client recovered. Then the stat command is hung.

When discover server trunking, the client will renew the lease,
but check the client state, it lead the client state corruption.

So, we need to call state manager to recover it when detect server
ip trunking.

Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:45 +02:00
Christoph Probst
dffc9e5ffa cifs: fix strcat buffer overflow and reduce raciness in smb21_set_oplock_level()
commit 6a54b2e002c9d00b398d35724c79f9fe0d9b38fb upstream.

Change strcat to strncpy in the "None" case to fix a buffer overflow
when cinode->oplock is reset to 0 by another thread accessing the same
cinode. It is never valid to append "None" to any other message.

Consolidate multiple writes to cinode->oplock to reduce raciness.

Signed-off-by: Christoph Probst <kernel@probst.it>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:45 +02:00
Sriram Rajagopalan
98529ecd31 ext4: zero out the unused memory region in the extent tree block
commit 592acbf16821288ecdc4192c47e3774a4c48bb64 upstream.

This commit zeroes out the unused memory region in the buffer_head
corresponding to the extent metablock after writing the extent header
and the corresponding extent node entries.

This is done to prevent random uninitialized data from getting into
the filesystem when the extent block is synced.

This fixes CVE-2019-11833.

Signed-off-by: Sriram Rajagopalan <sriramr@arista.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:42 +02:00
Jiufei Xue
9ff6372e5a fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
commit ec084de929e419e51bcdafaafe567d9e7d0273b7 upstream.

synchronize_rcu() didn't wait for call_rcu() callbacks, so inode wb
switch may not go to the workqueue after synchronize_rcu().  Thus
previous scheduled switches was not finished even flushing the
workqueue, which will cause a NULL pointer dereferenced followed below.

  VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds.  Have a nice day...
  BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
    evict+0xb3/0x180
    iput+0x1b0/0x230
    inode_switch_wbs_work_fn+0x3c0/0x6a0
    worker_thread+0x4e/0x490
    ? process_one_work+0x410/0x410
    kthread+0xe6/0x100
    ret_from_fork+0x39/0x50

Replace the synchronize_rcu() call with a rcu_barrier() to wait for all
pending callbacks to finish.  And inc isw_nr_in_flight after call_rcu()
in inode_switch_wbs() to make more sense.

Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: Tejun Heo <tj@kernel.org>
Suggested-by: Tejun Heo <tj@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:41 +02:00
Tejun Heo
bfce20eaf1 writeback: synchronize sync(2) against cgroup writeback membership switches
commit 7fc5854f8c6efae9e7624970ab49a1eac2faefb1 upstream.

sync_inodes_sb() can race against cgwb (cgroup writeback) membership
switches and fail to writeback some inodes.  For example, if an inode
switches to another wb while sync_inodes_sb() is in progress, the new
wb might not be visible to bdi_split_work_to_wbs() at all or the inode
might jump from a wb which hasn't issued writebacks yet to one which
already has.

This patch adds backing_dev_info->wb_switch_rwsem to synchronize cgwb
switch path against sync_inodes_sb() so that sync_inodes_sb() is
guaranteed to see all the target wbs and inodes can't jump wbs to
escape syncing.

v2: Fixed misplaced rwsem init.  Spotted by Jiufei.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Jiufei Xue <xuejiufei@gmail.com>
Link: http://lkml.kernel.org/r/dc694ae2-f07f-61e1-7097-7c8411cee12d@gmail.com
Acked-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:41 +02:00
Filipe Manana
686e4352e3 Btrfs: do not start a transaction at iterate_extent_inodes()
commit bfc61c36260ca990937539cd648ede3cd749bc10 upstream.

When finding out which inodes have references on a particular extent, done
by backref.c:iterate_extent_inodes(), from the BTRFS_IOC_LOGICAL_INO (both
v1 and v2) ioctl and from scrub we use the transaction join API to grab a
reference on the currently running transaction, since in order to give
accurate results we need to inspect the delayed references of the currently
running transaction.

However, if there is currently no running transaction, the join operation
will create a new transaction. This is inefficient as the transaction will
eventually be committed, doing unnecessary IO and introducing a potential
point of failure that will lead to a transaction abort due to -ENOSPC, as
recently reported [1].

That's because the join, creates the transaction but does not reserve any
space, so when attempting to update the root item of the root passed to
btrfs_join_transaction(), during the transaction commit, we can end up
failling with -ENOSPC. Users of a join operation are supposed to actually
do some filesystem changes and reserve space by some means, which is not
the case of iterate_extent_inodes(), it is a read-only operation for all
contextes from which it is called.

The reported [1] -ENOSPC failure stack trace is the following:

 heisenberg kernel: ------------[ cut here ]------------
 heisenberg kernel: BTRFS: Transaction aborted (error -28)
 heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs]
(...)
 heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2
 heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
 heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs]
(...)
 heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286
 heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006
 heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0
 heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007
 heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078
 heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068
 heisenberg kernel: FS:  0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000
 heisenberg kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0
 heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 heisenberg kernel: Call Trace:
 heisenberg kernel:  commit_fs_roots+0x166/0x1d0 [btrfs]
 heisenberg kernel:  ? _cond_resched+0x15/0x30
 heisenberg kernel:  ? btrfs_run_delayed_refs+0xac/0x180 [btrfs]
 heisenberg kernel:  btrfs_commit_transaction+0x2bd/0x870 [btrfs]
 heisenberg kernel:  ? start_transaction+0x9d/0x3f0 [btrfs]
 heisenberg kernel:  transaction_kthread+0x147/0x180 [btrfs]
 heisenberg kernel:  ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
 heisenberg kernel:  kthread+0x112/0x130
 heisenberg kernel:  ? kthread_bind+0x30/0x30
 heisenberg kernel:  ret_from_fork+0x35/0x40
 heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---

So fix that by using the attach API, which does not create a transaction
when there is currently no running transaction.

[1] https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4.camel@scientia.net/

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
CC: stable@vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:38 +02:00
Debabrata Banerjee
b268b6e501 ext4: fix ext4_show_options for file systems w/o journal
commit 50b29d8f033a7c88c5bc011abc2068b1691ab755 upstream.

Instead of removing EXT4_MOUNT_JOURNAL_CHECKSUM from s_def_mount_opt as
I assume was intended, all other options were blown away leading to
_ext4_show_options() output being incorrect.

Fixes: 1e381f60da ("ext4: do not allow journal_opts for fs w/o journal")
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:38 +02:00
Kirill Tkhai
f3b9c26f19 ext4: actually request zeroing of inode table after grow
commit 310a997fd74de778b9a4848a64be9cda9f18764a upstream.

It is never possible, that number of block groups decreases,
since only online grow is supported.

But after a growing occured, we have to zero inode tables
for just created new block groups.

Fixes: 19c5246d25 ("ext4: add new online resize interface")
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:38 +02:00
Shuning Zhang
e3a74fbc42 ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
commit e091eab028f9253eac5c04f9141bbc9d170acab3 upstream.

In some cases, ocfs2_iget() reads the data of inode, which has been
deleted for some reason.  That will make the system panic.  So We should
judge whether this inode has been deleted, and tell the caller that the
inode is a bad inode.

For example, the ocfs2 is used as the backed of nfs, and the client is
nfsv3.  This issue can be reproduced by the following steps.

on the nfs server side,
..../patha/pathb

Step 1: The process A was scheduled before calling the function fh_verify.

Step 2: The process B is removing the 'pathb', and just completed the call
to function dput.  Then the dentry of 'pathb' has been deleted from the
dcache, and all ancestors have been deleted also.  The relationship of
dentry and inode was deleted through the function hlist_del_init.  The
following is the call stack.
dentry_iput->hlist_del_init(&dentry->d_u.d_alias)

At this time, the inode is still in the dcache.

Step 3: The process A call the function ocfs2_get_dentry, which get the
inode from dcache.  Then the refcount of inode is 1.  The following is the
call stack.
nfsd3_proc_getacl->fh_verify->exportfs_decode_fh->fh_to_dentry(ocfs2_get_dentry)

Step 4: Dirty pages are flushed by bdi threads.  So the inode of 'patha'
is evicted, and this directory was deleted.  But the inode of 'pathb'
can't be evicted, because the refcount of the inode was 1.

Step 5: The process A keep running, and call the function
reconnect_path(in exportfs_decode_fh), which call function
ocfs2_get_parent of ocfs2.  Get the block number of parent
directory(patha) by the name of ...  Then read the data from disk by the
block number.  But this inode has been deleted, so the system panic.

Process A                                             Process B
1. in nfsd3_proc_getacl                   |
2.                                        |        dput
3. fh_to_dentry(ocfs2_get_dentry)         |
4. bdi flush dirty cache                  |
5. ocfs2_iget                             |

[283465.542049] OCFS2: ERROR (device sdp): ocfs2_validate_inode_block:
Invalid dinode #580640: OCFS2_VALID_FL not set

[283465.545490] Kernel panic - not syncing: OCFS2: (device sdp): panic forced
after error

[283465.546889] CPU: 5 PID: 12416 Comm: nfsd Tainted: G        W
4.1.12-124.18.6.el6uek.bug28762940v3.x86_64 #2
[283465.548382] Hardware name: VMware, Inc. VMware Virtual Platform/440BX
Desktop Reference Platform, BIOS 6.00 09/21/2015
[283465.549657]  0000000000000000 ffff8800a56fb7b8 ffffffff816e839c
ffffffffa0514758
[283465.550392]  000000000008dc20 ffff8800a56fb838 ffffffff816e62d3
0000000000000008
[283465.551056]  ffff880000000010 ffff8800a56fb848 ffff8800a56fb7e8
ffff88005df9f000
[283465.551710] Call Trace:
[283465.552516]  [<ffffffff816e839c>] dump_stack+0x63/0x81
[283465.553291]  [<ffffffff816e62d3>] panic+0xcb/0x21b
[283465.554037]  [<ffffffffa04e66b0>] ocfs2_handle_error+0xf0/0xf0 [ocfs2]
[283465.554882]  [<ffffffffa04e7737>] __ocfs2_error+0x67/0x70 [ocfs2]
[283465.555768]  [<ffffffffa049c0f9>] ocfs2_validate_inode_block+0x229/0x230
[ocfs2]
[283465.556683]  [<ffffffffa047bcbc>] ocfs2_read_blocks+0x46c/0x7b0 [ocfs2]
[283465.557408]  [<ffffffffa049bed0>] ? ocfs2_inode_cache_io_unlock+0x20/0x20
[ocfs2]
[283465.557973]  [<ffffffffa049f0eb>] ocfs2_read_inode_block_full+0x3b/0x60
[ocfs2]
[283465.558525]  [<ffffffffa049f5ba>] ocfs2_iget+0x4aa/0x880 [ocfs2]
[283465.559082]  [<ffffffffa049146e>] ocfs2_get_parent+0x9e/0x220 [ocfs2]
[283465.559622]  [<ffffffff81297c05>] reconnect_path+0xb5/0x300
[283465.560156]  [<ffffffff81297f46>] exportfs_decode_fh+0xf6/0x2b0
[283465.560708]  [<ffffffffa062faf0>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[283465.561262]  [<ffffffff810a8196>] ? prepare_creds+0x26/0x110
[283465.561932]  [<ffffffffa0630860>] fh_verify+0x350/0x660 [nfsd]
[283465.562862]  [<ffffffffa0637804>] ? nfsd_cache_lookup+0x44/0x630 [nfsd]
[283465.563697]  [<ffffffffa063a8b9>] nfsd3_proc_getattr+0x69/0xf0 [nfsd]
[283465.564510]  [<ffffffffa062cf60>] nfsd_dispatch+0xe0/0x290 [nfsd]
[283465.565358]  [<ffffffffa05eb892>] ? svc_tcp_adjust_wspace+0x12/0x30
[sunrpc]
[283465.566272]  [<ffffffffa05ea652>] svc_process_common+0x412/0x6a0 [sunrpc]
[283465.567155]  [<ffffffffa05eaa03>] svc_process+0x123/0x210 [sunrpc]
[283465.568020]  [<ffffffffa062c90f>] nfsd+0xff/0x170 [nfsd]
[283465.568962]  [<ffffffffa062c810>] ? nfsd_destroy+0x80/0x80 [nfsd]
[283465.570112]  [<ffffffff810a622b>] kthread+0xcb/0xf0
[283465.571099]  [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
[283465.572114]  [<ffffffff816f11b8>] ret_from_fork+0x58/0x90
[283465.573156]  [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180

Link: http://lkml.kernel.org/r/1554185919-3010-1-git-send-email-sunny.s.zhang@oracle.com
Signed-off-by: Shuning Zhang <sunny.s.zhang@oracle.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: piaojun <piaojun@huawei.com>
Cc: "Gang He" <ghe@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-11 12:23:37 +02:00
Greg Kroah-Hartman
4521d273cf This is the 4.4.180 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlzdoa4ACgkQONu9yGCS
 aT6Rdg/+Ph+FR5Y8FI3lAJWSbdWw7ePiiN6LQ7NjM46nWaVWE3KzQoMHTN+qE69Q
 bZLnjSs73uFD2CdVZdxWG/2nkn6XNauEKOCZNeVXxvwbHokyWsTQv4cLaGqngbLw
 1DO02kgs5IXJJuvn41M0R+jU2Sxnj7yck/JDowB3OiMtM3nkdggBAr318ebPXrKb
 ZSWjRqI3d02MJxJO+/sUrrSfiVBiOtpj/2YbSw64vATg5mLrEfsmIuv2H2LZFcGK
 D8v6LKS1m2yERfs8Q361ha/NPKRW9jRsde9VvwKLTisXMDIFJNZL0XcvnfXO/oye
 VWf3TLk7dWUiyxD53LIQ/9S+9Pgv0qP8vgzw5s7BEYGDlUBcqFoZrKyMzCQgogon
 5lo0YQZkvQvXLKqbJVp1OFDASdD6YXXRgvtxGqzlhtS5513MT2p+gWBv164ZPjQn
 TCmqlJiR15QIoahxfDvSHzRssOofNCVVl1SS0Lx30N8++DopuEUc+86xmjE4iOq7
 gNHyrsKqHH/LRnVLaippnVm/fL/6zgiQ/ZaiZ2AWIGzGLGhqgbXTStsTNh6XZ0+c
 mIVUV3Tya/bP1II63R7PaQqoLBtJbARW/bDnbpHgnTDaBH30UcgI0LZx549/QELl
 0SNn8K0QkGqE5EgqGvchX+KMBIqMf1NXoAffeoZopSkdK3DMGB4=
 =37ec
 -----END PGP SIGNATURE-----

Merge 4.4.180 into android-4.4-p

Changes in 4.4.180
	kbuild: simplify ld-option implementation
	KVM: fail KVM_SET_VCPU_EVENTS with invalid exception number
	cifs: do not attempt cifs operation on smb2+ rename error
	MIPS: scall64-o32: Fix indirect syscall number load
	trace: Fix preempt_enable_no_resched() abuse
	sched/numa: Fix a possible divide-by-zero
	ceph: ensure d_name stability in ceph_dentry_hash()
	ceph: fix ci->i_head_snapc leak
	nfsd: Don't release the callback slot unless it was actually held
	sunrpc: don't mark uninitialised items as VALID.
	USB: Add new USB LPM helpers
	USB: Consolidate LPM checks to avoid enabling LPM twice
	powerpc/xmon: Add RFI flush related fields to paca dump
	powerpc/64s: Improve RFI L1-D cache flush fallback
	powerpc/pseries: Support firmware disable of RFI flush
	powerpc/powernv: Support firmware disable of RFI flush
	powerpc/rfi-flush: Move the logic to avoid a redo into the debugfs code
	powerpc/rfi-flush: Make it possible to call setup_rfi_flush() again
	powerpc/rfi-flush: Always enable fallback flush on pseries
	powerpc/rfi-flush: Differentiate enabled and patched flush types
	powerpc/pseries: Add new H_GET_CPU_CHARACTERISTICS flags
	powerpc/rfi-flush: Call setup_rfi_flush() after LPM migration
	powerpc: Add security feature flags for Spectre/Meltdown
	powerpc/pseries: Set or clear security feature flags
	powerpc/powernv: Set or clear security feature flags
	powerpc/64s: Move cpu_show_meltdown()
	powerpc/64s: Enhance the information in cpu_show_meltdown()
	powerpc/powernv: Use the security flags in pnv_setup_rfi_flush()
	powerpc/pseries: Use the security flags in pseries_setup_rfi_flush()
	powerpc/64s: Wire up cpu_show_spectre_v1()
	powerpc/64s: Wire up cpu_show_spectre_v2()
	powerpc/pseries: Fix clearing of security feature flags
	powerpc: Move default security feature flags
	powerpc/pseries: Restore default security feature flags on setup
	powerpc/64s: Fix section mismatch warnings from setup_rfi_flush()
	powerpc/64s: Add support for a store forwarding barrier at kernel entry/exit
	powerpc/64s: Add barrier_nospec
	powerpc/64s: Add support for ori barrier_nospec patching
	powerpc/64s: Patch barrier_nospec in modules
	powerpc/64s: Enable barrier_nospec based on firmware settings
	powerpc/64: Use barrier_nospec in syscall entry
	powerpc: Use barrier_nospec in copy_from_user()
	powerpc/64s: Enhance the information in cpu_show_spectre_v1()
	powerpc64s: Show ori31 availability in spectre_v1 sysfs file not v2
	powerpc/64: Disable the speculation barrier from the command line
	powerpc/64: Make stf barrier PPC_BOOK3S_64 specific.
	powerpc/64: Add CONFIG_PPC_BARRIER_NOSPEC
	powerpc/64: Call setup_barrier_nospec() from setup_arch()
	powerpc/64: Make meltdown reporting Book3S 64 specific
	powerpc/fsl: Add barrier_nospec implementation for NXP PowerPC Book3E
	powerpc/asm: Add a patch_site macro & helpers for patching instructions
	powerpc/64s: Add new security feature flags for count cache flush
	powerpc/64s: Add support for software count cache flush
	powerpc/pseries: Query hypervisor for count cache flush settings
	powerpc/powernv: Query firmware for count cache flush settings
	powerpc: Avoid code patching freed init sections
	powerpc/fsl: Add infrastructure to fixup branch predictor flush
	powerpc/fsl: Add macro to flush the branch predictor
	powerpc/fsl: Fix spectre_v2 mitigations reporting
	powerpc/fsl: Add nospectre_v2 command line argument
	powerpc/fsl: Flush the branch predictor at each kernel entry (64bit)
	powerpc/fsl: Update Spectre v2 reporting
	powerpc/security: Fix spectre_v2 reporting
	powerpc/fsl: Fix the flush of branch predictor.
	tipc: handle the err returned from cmd header function
	slip: make slhc_free() silently accept an error pointer
	intel_th: gth: Fix an off-by-one in output unassigning
	fs/proc/proc_sysctl.c: Fix a NULL pointer dereference
	NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family.
	netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON
	tipc: check bearer name with right length in tipc_nl_compat_bearer_enable
	tipc: check link name with right length in tipc_nl_compat_link_set
	bpf: reject wrong sized filters earlier
	Revert "block/loop: Use global lock for ioctl() operation."
	ipv4: add sanity checks in ipv4_link_failure()
	team: fix possible recursive locking when add slaves
	net: stmmac: move stmmac_check_ether_addr() to driver probe
	ipv4: set the tcp_min_rtt_wlen range from 0 to one day
	powerpc/fsl: Enable runtime patching if nospectre_v2 boot arg is used
	powerpc/fsl: Flush branch predictor when entering KVM
	powerpc/fsl: Emulate SPRN_BUCSR register
	powerpc/fsl: Flush the branch predictor at each kernel entry (32 bit)
	powerpc/fsl: Sanitize the syscall table for NXP PowerPC 32 bit platforms
	powerpc/fsl: Fixed warning: orphan section `__btb_flush_fixup'
	powerpc/fsl: Add FSL_PPC_BOOK3E as supported arch for nospectre_v2 boot arg
	Documentation: Add nospectre_v1 parameter
	usbnet: ipheth: prevent TX queue timeouts when device not ready
	usbnet: ipheth: fix potential null pointer dereference in ipheth_carrier_set
	qlcnic: Avoid potential NULL pointer dereference
	netfilter: bridge: set skb transport_header before entering NF_INET_PRE_ROUTING
	sc16is7xx: missing unregister/delete driver on error in sc16is7xx_init()
	usb: gadget: net2280: Fix overrun of OUT messages
	usb: gadget: net2280: Fix net2280_dequeue()
	usb: gadget: net2272: Fix net2272_dequeue()
	ARM: dts: pfla02: increase phy reset duration
	net: ks8851: Dequeue RX packets explicitly
	net: ks8851: Reassert reset pin if chip ID check fails
	net: ks8851: Delay requesting IRQ until opened
	net: ks8851: Set initial carrier state to down
	net: xilinx: fix possible object reference leak
	net: ibm: fix possible object reference leak
	net: ethernet: ti: fix possible object reference leak
	scsi: qla4xxx: fix a potential NULL pointer dereference
	usb: u132-hcd: fix resource leak
	ceph: fix use-after-free on symlink traversal
	scsi: zfcp: reduce flood of fcrscn1 trace records on multi-element RSCN
	libata: fix using DMA buffers on stack
	kconfig/[mn]conf: handle backspace (^H) key
	vfio/type1: Limit DMA mappings per container
	ALSA: line6: use dynamic buffers
	ipv4: ip_do_fragment: Preserve skb_iif during fragmentation
	ipv6/flowlabel: wait rcu grace period before put_pid()
	ipv6: invert flowlabel sharing check in process and user mode
	bnxt_en: Improve multicast address setup logic.
	packet: validate msg_namelen in send directly
	USB: yurex: Fix protection fault after device removal
	USB: w1 ds2490: Fix bug caused by improper use of altsetting array
	USB: core: Fix unterminated string returned by usb_string()
	USB: core: Fix bug caused by duplicate interface PM usage counter
	HID: debug: fix race condition with between rdesc_show() and device removal
	rtc: sh: Fix invalid alarm warning for non-enabled alarm
	igb: Fix WARN_ONCE on runtime suspend
	bonding: show full hw address in sysfs for slave entries
	jffs2: fix use-after-free on symlink traversal
	debugfs: fix use-after-free on symlink traversal
	rtc: da9063: set uie_unsupported when relevant
	vfio/pci: use correct format characters
	scsi: storvsc: Fix calculation of sub-channel count
	net: hns: Use NAPI_POLL_WEIGHT for hns driver
	net: hns: Fix WARNING when remove HNS driver with SMMU enabled
	hugetlbfs: fix memory leak for resv_map
	xsysace: Fix error handling in ace_setup
	ARM: orion: don't use using 64-bit DMA masks
	ARM: iop: don't use using 64-bit DMA masks
	usb: usbip: fix isoc packet num validation in get_pipe
	staging: iio: adt7316: allow adt751x to use internal vref for all dacs
	staging: iio: adt7316: fix the dac read calculation
	staging: iio: adt7316: fix the dac write calculation
	Input: snvs_pwrkey - initialize necessary driver data before enabling IRQ
	selinux: never allow relabeling on context mounts
	x86/mce: Improve error message when kernel cannot recover, p2
	media: v4l2: i2c: ov7670: Fix PLL bypass register values
	scsi: libsas: fix a race condition when smp task timeout
	ASoC:soc-pcm:fix a codec fixup issue in TDM case
	ASoC: cs4270: Set auto-increment bit for register writes
	ASoC: tlv320aic32x4: Fix Common Pins
	perf/x86/intel: Fix handling of wakeup_events for multi-entry PEBS
	scsi: csiostor: fix missing data copy in csio_scsi_err_handler()
	iommu/amd: Set exclusion range correctly
	genirq: Prevent use-after-free and work list corruption
	usb: dwc3: Fix default lpm_nyet_threshold value
	scsi: qla2xxx: Fix incorrect region-size setting in optrom SYSFS routines
	Bluetooth: hidp: fix buffer overflow
	Bluetooth: Align minimum encryption key size for LE and BR/EDR connections
	UAS: fix alignment of scatter/gather segments
	ipv6: fix a potential deadlock in do_ipv6_setsockopt()
	ASoC: Intel: avoid Oops if DMA setup fails
	timer/debug: Change /proc/timer_stats from 0644 to 0600
	netfilter: compat: initialize all fields in xt_init
	platform/x86: sony-laptop: Fix unintentional fall-through
	iio: adc: xilinx: fix potential use-after-free on remove
	HID: input: add mapping for Expose/Overview key
	HID: input: add mapping for keyboard Brightness Up/Down/Toggle keys
	libnvdimm/btt: Fix a kmemdup failure check
	s390/dasd: Fix capacity calculation for large volumes
	s390/3270: fix lockdep false positive on view->lock
	KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
	tools lib traceevent: Fix missing equality check for strcmp
	init: initialize jump labels before command line option parsing
	ipvs: do not schedule icmp errors from tunnels
	s390: ctcm: fix ctcm_new_device error return code
	selftests/net: correct the return value for run_netsocktests
	gpu: ipu-v3: dp: fix CSC handling
	cw1200: fix missing unlock on error in cw1200_hw_scan()
	x86/vdso: Pass --eh-frame-hdr to the linker
	Don't jump to compute_result state from check_result state
	locking/static_keys: Provide DECLARE and well as DEFINE macros
	x86/microcode/intel: Add a helper which gives the microcode revision
	x86: stop exporting msr-index.h to userland
	bitops: avoid integer overflow in GENMASK(_ULL)
	x86/microcode/intel: Check microcode revision before updating sibling threads
	x86/MCE: Save microcode revision in machine check records
	x86/cpufeatures: Hide AMD-specific speculation flags
	x86/speculation: Support Enhanced IBRS on future CPUs
	x86/speculation: Simplify the CPU bug detection logic
	x86/bugs: Add AMD's variant of SSB_NO
	x86/bugs: Add AMD's SPEC_CTRL MSR usage
	x86/bugs: Switch the selection of mitigation from CPU vendor to CPU features
	locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a new <linux/bits.h> file
	x86/bugs: Fix the AMD SSBD usage of the SPEC_CTRL MSR
	x86/speculation: Remove SPECTRE_V2_IBRS in enum spectre_v2_mitigation
	x86/microcode: Make sure boot_cpu_data.microcode is up-to-date
	x86/microcode: Update the new microcode revision unconditionally
	x86/cpu: Sanitize FAM6_ATOM naming
	KVM: x86: SVM: Call x86_spec_ctrl_set_guest/host() with interrupts disabled
	x86/mm: Use WRITE_ONCE() when setting PTEs
	x86/speculation: Apply IBPB more strictly to avoid cross-process data leak
	x86/speculation: Enable cross-hyperthread spectre v2 STIBP mitigation
	x86/speculation: Propagate information about RSB filling mitigation to sysfs
	x86/speculation: Update the TIF_SSBD comment
	x86/speculation: Clean up spectre_v2_parse_cmdline()
	x86/speculation: Remove unnecessary ret variable in cpu_show_common()
	x86/speculation: Move STIPB/IBPB string conditionals out of cpu_show_common()
	x86/speculation: Disable STIBP when enhanced IBRS is in use
	x86/speculation: Rename SSBD update functions
	x86/speculation: Reorganize speculation control MSRs update
	x86/Kconfig: Select SCHED_SMT if SMP enabled
	sched: Add sched_smt_active()
	x86/speculation: Rework SMT state change
	x86/speculation: Reorder the spec_v2 code
	x86/speculation: Mark string arrays const correctly
	x86/speculataion: Mark command line parser data __initdata
	x86/speculation: Unify conditional spectre v2 print functions
	x86/speculation: Add command line control for indirect branch speculation
	x86/speculation: Prepare for per task indirect branch speculation control
	x86/process: Consolidate and simplify switch_to_xtra() code
	x86/speculation: Avoid __switch_to_xtra() calls
	x86/speculation: Prepare for conditional IBPB in switch_mm()
	x86/speculation: Split out TIF update
	x86/speculation: Prepare arch_smt_update() for PRCTL mode
	x86/speculation: Prevent stale SPEC_CTRL msr content
	x86/speculation: Add prctl() control for indirect branch speculation
	x86/speculation: Enable prctl mode for spectre_v2_user
	x86/speculation: Add seccomp Spectre v2 user space protection mode
	x86/speculation: Provide IBPB always command line options
	kvm: x86: Report STIBP on GET_SUPPORTED_CPUID
	x86/msr-index: Cleanup bit defines
	x86/speculation: Consolidate CPU whitelists
	x86/speculation/mds: Add basic bug infrastructure for MDS
	x86/speculation/mds: Add BUG_MSBDS_ONLY
	x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
	x86/speculation/mds: Add mds_clear_cpu_buffers()
	x86/speculation/mds: Clear CPU buffers on exit to user
	x86/speculation/mds: Conditionally clear CPU buffers on idle entry
	x86/speculation/mds: Add mitigation control for MDS
	x86/speculation/l1tf: Document l1tf in sysfs
	x86/speculation/mds: Add sysfs reporting for MDS
	x86/speculation/mds: Add mitigation mode VMWERV
	Documentation: Move L1TF to separate directory
	Documentation: Add MDS vulnerability documentation
	x86/cpu/bugs: Use __initconst for 'const' init data
	x86/speculation: Move arch_smt_update() call to after mitigation decisions
	x86/speculation/mds: Add SMT warning message
	x86/speculation/mds: Fix comment
	x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
	cpu/speculation: Add 'mitigations=' cmdline option
	x86/speculation: Support 'mitigations=' cmdline option
	x86/speculation/mds: Add 'mitigations=' support for MDS
	x86/mds: Add MDSUM variant to the MDS documentation
	Documentation: Correct the possible MDS sysfs values
	x86/speculation/mds: Fix documentation typo
	x86/bugs: Change L1TF mitigation string to match upstream
	USB: serial: use variable for status
	USB: serial: fix unthrottle races
	powerpc/64s: Include cpu header
	bridge: Fix error path for kobject_init_and_add()
	net: ucc_geth - fix Oops when changing number of buffers in the ring
	packet: Fix error path in packet_init
	vlan: disable SIOCSHWTSTAMP in container
	vrf: sit mtu should not be updated when vrf netdev is the link
	ipv4: Fix raw socket lookup for local traffic
	bonding: fix arp_validate toggling in active-backup mode
	drivers/virt/fsl_hypervisor.c: dereferencing error pointers in ioctl
	drivers/virt/fsl_hypervisor.c: prevent integer overflow in ioctl
	powerpc/booke64: set RI in default MSR
	powerpc/lib: fix book3s/32 boot failure due to code patching
	Linux 4.4.180

Change-Id: I72f6c596cc992689d95abc8b5d1303d6ec22b051
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2019-05-16 22:34:35 +02:00