commit 7f43d4affb2a254d421ab20b0cf65ac2569909fb upstream.
Function check_leaf() checks if any item pointer points outside of the
leaf, but it doesn't check if the pointer overlaps with the item itself.
Normally only the last item may be the victim, but adding such check is
never a bad idea anyway.
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c3267bbaa9cae09b62960eafe33ad19196803285 upstream.
Current check_leaf() function does a good job checking key order and
item offset/size.
However it only checks from slot 0 to the last but one slot, this is
good but makes later expansion hard.
So this refactoring iterates from slot 0 to the last slot.
For key comparison, it uses a key with all 0 as initial key, so all
valid keys should be larger than that.
And for item size/offset checks, it compares current item end with
previous item offset.
For slot 0, use leaf end as a special case.
This makes later item/key offset checks and item size checks easier to
be implemented.
Also, makes check_leaf() to return -EUCLEAN other than -EIO to indicate
error.
Signed-off-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[bwh: Backported to 4.4:
- BTRFS_LEAF_DATA_SIZE() takes a root rather than an fs_info
- Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1cbb1f454e5321e47fc1e6b233066c7ccc979d15 upstream.
We have reader helpers for most of the on-disk structures that use
an extent_buffer and pointer as offset into the buffer that are
read-only. We should mark them as const and, in turn, allow consumers
of these interfaces to mark the buffers const as well.
No impact on code, but serves as documentation that a buffer is intended
not to be modified.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 851cd173f06045816528176001cf82948282029c upstream.
This is an additional patch to
"Btrfs: memset to avoid stale content in btree node block".
This uses memset to initialize the unused space in a leaf to avoid
potential stale content, which may be incurred by pushing items
between sibling leaves.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 02794222c4132ac003e7281fb71f4ec1645ffc87 upstream.
In a corrupted btrfs image, we can come across this BUG_ON and
get an unreponsive system, but if we return errors instead,
its caller can handle everything gracefully by aborting the current
transaction.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b722c1747d533ac6d4df110dc8233db46918b65 upstream.
We need to check items in a node to make sure that we're reading
a valid one, otherwise we could get various crashes while processing
delayed_refs.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3eb548ee3a8042d95ad81be254e67a5222c24e03 upstream.
During updating btree, we could push items between sibling
nodes/leaves, for leaves data sections starts reversely from
the end of the block while for nodes we only have key pairs
which are stored one by one from the start of the block.
So we could do try to push key pairs from one node to the next
node right in the tree, and after that, we update the node's
nritems to reflect the correct end while leaving the stale
content in the node. One may intentionally corrupt the fs
image and access the stale content by bumping the nritems and
causes various crashes.
This takes the in-memory @nritems as the correct one and
gets to memset the unused part of a btree node.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ef85b25e982b5bba1530b936e283ef129f02ab9d upstream.
This can only happen with CONFIG_BTRFS_FS_CHECK_INTEGRITY=y.
Commit 1ba98d0 ("Btrfs: detect corruption when non-root leaf has zero item")
assumes that a leaf is its root when leaf->bytenr == btrfs_root_bytenr(root),
however, we should not use btrfs_root_bytenr(root) since it's mainly got
updated during committing transaction. So the check can fail when doing
COW on this leaf while it is a root.
This changes to use "if (leaf == btrfs_root_node(root))" instead, just like
how we check whether leaf is a root in __btrfs_cow_block().
Fixes: 1ba98d086fe3 (Btrfs: detect corruption when non-root leaf has zero item)
Reported-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 053ab70f0604224c7893b43f9d9d5efa283580d6 upstream.
When btree node (level = 1) has nritems which equals to zero,
we can end up with panic due to insert_ptr()'s
BUG_ON(slot > nritems);
where slot is 1 and nritems is 0, as copy_for_split() calls
insert_ptr(.., path->slots[1] + 1, ...);
A invalid value results in the whole mess, this adds the check
for btree's node nritems so that we stop reading block when
when something is wrong.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1ba98d086fe3a14d6a31f2f66dbab70c45d00f63 upstream.
Right now we treat leaf which has zero item as a valid one
because we could have an empty tree, that is, a root that is
also a leaf without any item, however, in the same case but
when the leaf is not a root, we can end up with hitting the
BUG_ON(1) in btrfs_extend_item() called by
setup_inline_extent_backref().
This makes us check the situation as a corruption if leaf is
not its own root.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 187ee58c62c1d0d238d3dc4835869d33e1869906 upstream.
We need to call free_extent_map() on the em we look up.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6fb37b756acce6d6e045f79c3764206033f617b4 upstream.
With btrfs-corrupt-block, one can drop one chunk item and mounting
will end up with a panic in btrfs_full_stripe_len().
This doesn't not remove the BUG_ON, but instead checks it a bit
earlier when we find the block group item.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e06cd3dd7cea50e87663a88acdfdb7ac1c53a5ca upstream.
To prevent fuzzed filesystem images from panic the whole system,
we need various validation checks to refuse to mount such an image
if btrfs finds any invalid value during loading chunks, including
both sys_array and regular chunks.
Note that these checks may not be sufficient to cover all corner cases,
feel free to add more checks.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Reported-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f04b772bfc17f502703794f4d100d12155c1a1a9 upstream.
Enhance chunk validation:
1) Num_stripes
We already have such check but it's only in super block sys chunk
array.
Now check all on-disk chunks.
2) Chunk logical
It should be aligned to sector size.
This behavior should be *DOUBLE CHECKED* for 64K sector size like
PPC64 or AArch64.
Maybe we can found some hidden bugs.
3) Chunk length
Same as chunk logical, should be aligned to sector size.
4) Stripe length
It should be power of 2.
5) Chunk type
Any bit out of TYPE_MAS | PROFILE_MASK is invalid.
With all these much restrict rules, several fuzzed image reported in
mail list should no longer cause kernel panic.
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 95617d69326ce386c95e33db7aeb832b45ee9f8f upstream.
Overloading extent_map->bdev to struct map_lookup * might have started out
as a means to an end, but it's a pattern that's used all over the place
now. Let's get rid of the casting and just add a union instead.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlw6/14ACgkQONu9yGCS
aT7NGw//UntCx3g48kmf077FwEf+l85rCffBZDknDppUzHA6fsLbMLYOsrDytRo/
rfhPZzmvIB8B3JtCJW3QZQiPUBy/TBJP4o6CNALbyYVkr9DVV0H6dqwN3R9tK3zm
EHg1qaKheUgfBSe+wNGUdyQ2MiSeAGAW0AaAPGqsWthJVKU1MdL7xW85uAn5kfci
rgMW+kFKEkVEQjzlbSGK1IZQpp5mqyZqXYaYcBkqf2TRmwJAVzwUP2EASrKaWyN7
cdHSvJsy23fd21GhTd9982tqE8cLu95g3uV/NgdvSEoWynSLgK0hMHUL7ec8amO/
CLaY0eAq/aJYXiUnKOPy15Uzznp97YyghFxxg59uxhXqKBAGxFJrkPi+Ct3GNVck
uNtOpFNu2h46uRmFrRz7zyNt3gpQtJOXBnMSsJcI9BNYO1kuy3C5q6MwrhlzAltD
WdXGQRtXrY7gd8KU3YYIkEgyTxSu1QlSdVutdWOoQD9shaF9vcT3jwl1SyAQWO7o
YXSDeVrglAmSul0CelxJltE16yJGY6yAbzqP6gssH9zG9nszooSFG1L/UsJfOiMg
iOEh7K2kwMlkd3HrxbnzaKDJ2hmpz/ipoJA3DKvzo1Dose6604szp5qFkWVQFRRs
nbo7DZLvfQPe10UWObH/0FDyX5fi4GuxN+BEqVtNvNPHgSRmxPM=
=eSCs
-----END PGP SIGNATURE-----
Merge 4.4.170 into android-4.4-p
Changes in 4.4.170
USB: hso: Fix OOB memory access in hso_probe/hso_get_config_data
xhci: Don't prevent USB2 bus suspend in state check intended for USB3 only
USB: serial: option: add GosunCn ZTE WeLink ME3630
USB: serial: option: add HP lt4132
USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode)
USB: serial: option: add Fibocom NL668 series
USB: serial: option: add Telit LN940 series
mmc: core: Reset HPI enabled state during re-init and in case of errors
mmc: omap_hsmmc: fix DMA API warning
gpio: max7301: fix driver for use with CONFIG_VMAP_STACK
Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels
x86/mtrr: Don't copy uninitialized gentry fields back to userspace
drm/ioctl: Fix Spectre v1 vulnerabilities
ip6mr: Fix potential Spectre v1 vulnerability
ipv4: Fix potential Spectre v1 vulnerability
ax25: fix a use-after-free in ax25_fillin_cb()
ibmveth: fix DMA unmap error in ibmveth_xmit_start error path
ieee802154: lowpan_header_create check must check daddr
ipv6: explicitly initialize udp6_addr in udp_sock_create6()
isdn: fix kernel-infoleak in capi_unlocked_ioctl
netrom: fix locking in nr_find_socket()
packet: validate address length
packet: validate address length if non-zero
sctp: initialize sin6_flowinfo for ipv6 addrs in sctp_inet6addr_event
vhost: make sure used idx is seen before log in vhost_add_used_n()
VSOCK: Send reset control packet when socket is partially bound
xen/netfront: tolerate frags with no data
gro_cell: add napi_disable in gro_cells_destroy
sock: Make sock->sk_stamp thread-safe
ALSA: rme9652: Fix potential Spectre v1 vulnerability
ALSA: emu10k1: Fix potential Spectre v1 vulnerabilities
ALSA: pcm: Fix potential Spectre v1 vulnerability
ALSA: emux: Fix potential Spectre v1 vulnerabilities
ALSA: hda: add mute LED support for HP EliteBook 840 G4
ALSA: hda/tegra: clear pending irq handlers
USB: serial: pl2303: add ids for Hewlett-Packard HP POS pole displays
USB: serial: option: add Fibocom NL678 series
usb: r8a66597: Fix a possible concurrency use-after-free bug in r8a66597_endpoint_disable()
Input: elan_i2c - add ACPI ID for touchpad in ASUS Aspire F5-573G
KVM: x86: Use jmp to invoke kvm_spurious_fault() from .fixup
perf pmu: Suppress potential format-truncation warning
ext4: fix possible use after free in ext4_quota_enable
ext4: missing unlock/put_page() in ext4_try_to_write_inline_data()
ext4: fix EXT4_IOC_GROUP_ADD ioctl
ext4: force inode writes when nfsd calls commit_metadata()
spi: bcm2835: Fix race on DMA termination
spi: bcm2835: Fix book-keeping of DMA termination
spi: bcm2835: Avoid finishing transfer prematurely in IRQ mode
cdc-acm: fix abnormal DATA RX issue for Mediatek Preloader.
media: vivid: free bitmap_cap when updating std/timings/etc.
MIPS: Ensure pmd_present() returns false after pmd_mknotpresent()
MIPS: Align kernel load address to 64KB
CIFS: Fix error mapping for SMB2_LOCK command which caused OFD lock problem
x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested
spi: bcm2835: Unbreak the build of esoteric configs
powerpc: Fix COFF zImage booting on old powermacs
ARM: imx: update the cpu power up timing setting on i.mx6sx
Input: restore EV_ABS ABS_RESERVED
checkstack.pl: fix for aarch64
xfrm: Fix bucket count reported to userspace
scsi: bnx2fc: Fix NULL dereference in error handling
Input: omap-keypad - fix idle configuration to not block SoC idle states
scsi: zfcp: fix posting too many status read buffers leading to adapter shutdown
fork: record start_time late
hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined
mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
mm, devm_memremap_pages: kill mapping "System RAM" support
sunrpc: fix cache_head leak due to queued request
sunrpc: use SVC_NET() in svcauth_gss_* functions
crypto: x86/chacha20 - avoid sleeping with preemption disabled
ALSA: cs46xx: Potential NULL dereference in probe
ALSA: usb-audio: Avoid access before bLength check in build_audio_procunit()
ALSA: usb-audio: Fix an out-of-bound read in create_composite_quirks
dlm: fixed memory leaks after failed ls_remove_names allocation
dlm: possible memory leak on error path in create_lkb()
dlm: lost put_lkb on error path in receive_convert() and receive_unlock()
dlm: memory leaks on error path in dlm_user_request()
gfs2: Fix loop in gfs2_rbm_find
b43: Fix error in cordic routine
9p/net: put a lower bound on msize
iommu/vt-d: Handle domain agaw being less than iommu agaw
ceph: don't update importing cap's mseq when handing cap export
genwqe: Fix size check
intel_th: msu: Fix an off-by-one in attribute store
power: supply: olpc_battery: correct the temperature units
Linux 4.4.170
Change-Id: I33c9750483716a6c44b40fbea8e729f96af41f52
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 3c1392d4c49962a31874af14ae9ff289cb2b3851 upstream.
Updating mseq makes client think importer mds has accepted all prior
cap messages and importer mds knows what caps client wants. Actually
some cap messages may have been dropped because of mseq mismatch.
If mseq is left untouched, importing cap's mds_wanted later will get
reset by cap import message.
Cc: stable@vger.kernel.org
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2d29f6b96d8f80322ed2dd895bca590491c38d34 upstream.
Fix the resource group wrap-around logic in gfs2_rbm_find that commit
e579ed4f44 broke. The bug can lead to unnecessary repeated scanning of the
same bitmaps; there is a risk that future changes will turn this into an
endless loop.
Fixes: e579ed4f44 ("GFS2: Introduce rbm field bii")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d47b41aceeadc6b58abc9c7c6485bef7cfb75636 upstream.
According to comment in dlm_user_request() ua should be freed
in dlm_free_lkb() after successful attach to lkb.
However ua is attached to lkb not in set_lock_args() but later,
inside request_lock().
Fixes 597d0cae0f ("[DLM] dlm: user locks")
Cc: stable@kernel.org # 2.6.19
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b982896cdb6e6a6b89d86dfb39df489d9df51e14 upstream.
If allocation fails on last elements of array need to free already
allocated elements.
v2: just move existing out_rsbtbl label to right place
Fixes 789924ba635f ("dlm: fix race between remove and lookup")
Cc: stable@kernel.org # 3.6
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Signed-off-by: David Teigland <teigland@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 9a596f5b39593414c0ec80f71b94a226286f084e upstream.
While resolving a bug with locks on samba shares found a strange behavior.
When a file locked by one node and we trying to lock it from another node
it fail with errno 5 (EIO) but in that case errno must be set to
(EACCES | EAGAIN).
This isn't happening when we try to lock file second time on same node.
In this case it returns EACCES as expected.
Also this issue not reproduces when we use SMB1 protocol (vers=1.0 in
mount options).
Further investigation showed that the mapping from status_to_posix_error
is different for SMB1 and SMB2+ implementations.
For SMB1 mapping is [NT_STATUS_LOCK_NOT_GRANTED to ERRlock]
(See fs/cifs/netmisc.c line 66)
but for SMB2+ mapping is [STATUS_LOCK_NOT_GRANTED to -EIO]
(see fs/cifs/smb2maperror.c line 383)
Quick changes in SMB2+ mapping from EIO to EACCES has fixed issue.
BUG: https://bugzilla.kernel.org/show_bug.cgi?id=201971
Signed-off-by: Georgy A Bystrenin <gkot@altlinux.org>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fde872682e175743e0c3ef939c89e3c6008a1529 upstream.
Some time back, nfsd switched from calling vfs_fsync() to using a new
commit_metadata() hook in export_operations(). If the file system did
not provide a commit_metadata() hook, it fell back to using
sync_inode_metadata(). Unfortunately doesn't work on all file
systems. In particular, it doesn't work on ext4 due to how the inode
gets journalled --- the VFS writeback code will not always call
ext4_write_inode().
So we need to provide our own ext4_nfs_commit_metdata() method which
calls ext4_write_inode() directly.
Google-Bug-Id: 121195940
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e647e29196b7f802f8242c39ecb7cc937f5ef217 upstream.
Commit e2b911c535 ("ext4: clean up feature test macros with
predicate functions") broke the EXT4_IOC_GROUP_ADD ioctl. This was
not noticed since only very old versions of resize2fs (before
e2fsprogs 1.42) use this ioctl. However, using a new kernel with an
enterprise Linux userspace will cause attempts to use online resize to
fail with "No reserved GDT blocks".
Fixes: e2b911c535 ("ext4: clean up feature test macros with predicate...")
Cc: stable@kernel.org # v4.4
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: ruippan (潘睿) <ruippan@tencent.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 132d00becb31e88469334e1e62751c81345280e0 upstream.
In case of error, ext4_try_to_write_inline_data() should unlock
and release the page it holds.
Fixes: f19d5870cb ("ext4: add normal write support for inline data")
Cc: stable@kernel.org # 3.8
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 61157b24e60fb3cd1f85f2c76a7b1d628f970144 upstream.
The function frees qf_inode via iput but then pass qf_inode to
lockdep_set_quota_inode on the failure path. This may result in a
use-after-free bug. The patch frees df_inode only when it is never used.
Fixes: daf647d2dd5 ("ext4: add lockdep annotations for i_data_sem")
Cc: stable@kernel.org # 4.6
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6e785302dad32228819d8066e5376acd15d0e6ba ]
Missing a dependency. Shouldn't show cifs posix extensions
in Kconfig if CONFIG_CIFS_ALLOW_INSECURE_DIALECTS (ie SMB1
protocol) is disabled.
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit d4fdf8ba0e5808ba9ad6b44337783bd9935e0982 upstream.
Mount fs with option noflush_merge, boot failed for illegal address
fcc in function f2fs_issue_flush:
if (!test_opt(sbi, FLUSH_MERGE)) {
ret = submit_flush_wait(sbi);
atomic_inc(&fcc->issued_flush); -> Here, fcc illegal
return ret;
}
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
[bwh: Backported to 4.9: adjust context]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a538e3ff9dabcdf6c3f477a373c629213d1c3066 upstream.
Matthew pointed out that the ioctx_table is susceptible to spectre v1,
because the index can be controlled by an attacker. The below patch
should mitigate the attack for all of the aio system calls.
Cc: stable@vger.kernel.org
Reported-by: Matthew Wilcox <willy@infradead.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlwYDToACgkQONu9yGCS
aT79dhAAhjCCEjMpcWGXExuCryWYUKJGV6rI1Hk3o5+Jr6tu/dWnhCLQrrSLgyCR
qbhBPW/MLpedxnoeLD0Kzo5XDvziB7dNrVgaure923N/Urst4JTH+hMBX6HHUPWY
vGReKg0a6HNaKsitlTPQaZTNE0uJJ1oCO7mEWYkU571zWaiT8/MT/wo42Ruiab1/
zw4YVlb74fdZRuazAmTIdszC8MxCoqBJQuzl0UvbKcPtosPdZLywi4Rw0LQNgdcf
nO/FZE9GPYPw2G/yV3XMp3qs+vVtJpZQrwrF2xHHDfe7Hosk5bB9iEcl6iSbYvyw
Eir1nD8YTD438sAcLgV3EDRguhQbgBcd23YHFPuyfJrZErZnshfp63iLLYIdZ1Mn
OP47nilY1/FnvxIzJFn0aHlg+9Ix9RepmPWL31xqHb6a0HuYJRJY6ciLln9v+Mld
jG4TtuxlGdkQzbkiSnNVbMVcsWMwX4OHmwQLteZvlzdj5bro5ko+8SVaio5TWBRB
bA9Bw82mKw3BLvlhmgM0Rg0pwJpgXl88r6o5iq2zALVPCUOdFOedoHdCiPpwO1Hl
eFUY2PYx1YZk8qZXX6eh0LhoHM1Lqyd7qSjDbekKGf1oBVUlLe3umhSGivp3j2is
ei1usTw3uM3n6thSeIKPn565gyr/CwXbspo3Ym/YG+719a+XwNU=
=2xi1
-----END PGP SIGNATURE-----
Merge 4.4.168 into android-4.4-p
Changes in 4.4.168
ipv6: Check available headroom in ip6_xmit() even without options
net: 8139cp: fix a BUG triggered by changing mtu with network traffic
net: phy: don't allow __set_phy_supported to add unsupported modes
net: Prevent invalid access to skb->prev in __qdisc_drop_all
rtnetlink: ndo_dflt_fdb_dump() only work for ARPHRD_ETHER devices
tcp: fix NULL ref in tail loss probe
tun: forbid iface creation with rtnl ops
neighbour: Avoid writing before skb->head in neigh_hh_output()
ARM: OMAP2+: prm44xx: Fix section annotation on omap44xx_prm_enable_io_wakeup
ARM: OMAP1: ams-delta: Fix possible use of uninitialized field
sysv: return 'err' instead of 0 in __sysv_write_inode
s390/cpum_cf: Reject request for sampling in event initialization
hwmon: (ina2xx) Fix current value calculation
ASoC: dapm: Recalculate audio map forcely when card instantiated
hwmon: (w83795) temp4_type has writable permission
Btrfs: send, fix infinite loop due to directory rename dependencies
ASoC: omap-mcpdm: Add pm_qos handling to avoid under/overruns with CPU_IDLE
ASoC: omap-dmic: Add pm_qos handling to avoid overruns with CPU_IDLE
exportfs: do not read dentry after free
bpf: fix check of allowed specifiers in bpf_trace_printk
USB: omap_udc: use devm_request_irq()
USB: omap_udc: fix crashes on probe error and module removal
USB: omap_udc: fix omap_udc_start() on 15xx machines
USB: omap_udc: fix USB gadget functionality on Palm Tungsten E
KVM: x86: fix empty-body warnings
net: thunderx: fix NULL pointer dereference in nic_remove
ixgbe: recognize 1000BaseLX SFP modules as 1Gbps
net: hisilicon: remove unexpected free_netdev
drm/ast: fixed reading monitor EDID not stable issue
xen: xlate_mmu: add missing header to fix 'W=1' warning
fscache: fix race between enablement and dropping of object
fscache, cachefiles: remove redundant variable 'cache'
ocfs2: fix deadlock caused by ocfs2_defrag_extent()
hfs: do not free node before using
hfsplus: do not free node before using
debugobjects: avoid recursive calls with kmemleak
ocfs2: fix potential use after free
pstore: Convert console write to use ->write_buf
ALSA: pcm: remove SNDRV_PCM_IOCTL1_INFO internal command
KVM: nVMX: fix msr bitmaps to prevent L2 from accessing L0 x2APIC
KVM: nVMX: mark vmcs12 pages dirty on L2 exit
KVM: nVMX: Eliminate vmcs02 pool
KVM: VMX: introduce alloc_loaded_vmcs
KVM: VMX: make MSR bitmaps per-VCPU
KVM/x86: Add IBPB support
KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES
KVM/VMX: Allow direct access to MSR_IA32_SPEC_CTRL
KVM/SVM: Allow direct access to MSR_IA32_SPEC_CTRL
KVM/x86: Remove indirect MSR op calls from SPEC_CTRL
x86: reorganize SMAP handling in user space accesses
x86: fix SMAP in 32-bit environments
x86: Introduce __uaccess_begin_nospec() and uaccess_try_nospec
x86/usercopy: Replace open coded stac/clac with __uaccess_{begin, end}
x86/uaccess: Use __uaccess_begin_nospec() and uaccess_try_nospec
x86/bugs, KVM: Support the combination of guest and host IBRS
x86/KVM/VMX: Expose SPEC_CTRL Bit(2) to the guest
KVM: SVM: Move spec control call after restore of GS
x86/bugs, KVM: Extend speculation control for VIRT_SPEC_CTRL
x86/speculation: Use synthetic bits for IBRS/IBPB/STIBP
KVM: SVM: Implement VIRT_SPEC_CTRL support for SSBD
bpf: support 8-byte metafield access
bpf/verifier: Add spi variable to check_stack_write()
bpf/verifier: Pass instruction index to check_mem_access() and check_xadd()
bpf: Prevent memory disambiguation attack
wil6210: missing length check in wmi_set_ie
posix-timers: Sanitize overrun handling
mm/hugetlb.c: don't call region_abort if region_chg fails
hugetlbfs: fix offset overflow in hugetlbfs mmap
hugetlbfs: check for pgoff value overflow
hugetlbfs: fix bug in pgoff overflow checking
swiotlb: clean up reporting
sr: pass down correctly sized SCSI sense buffer
mm: remove write/force parameters from __get_user_pages_locked()
mm: remove write/force parameters from __get_user_pages_unlocked()
mm/nommu.c: Switch __get_user_pages_unlocked() to use __get_user_pages()
mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
mm: replace get_user_pages_locked() write/force parameters with gup_flags
mm: replace get_vaddr_frames() write/force parameters with gup_flags
mm: replace get_user_pages() write/force parameters with gup_flags
mm: replace __access_remote_vm() write parameter with gup_flags
mm: replace access_remote_vm() write parameter with gup_flags
proc: don't use FOLL_FORCE for reading cmdline and environment
proc: do not access cmdline nor environ from file-backed areas
media: dvb-frontends: fix i2c access helpers for KASAN
matroxfb: fix size of memcpy
staging: speakup: Replace strncpy with memcpy
rocker: fix rocker_tlv_put_* functions for KASAN
selftests: Move networking/timestamping from Documentation
Linux 4.4.168
Change-Id: Icd04a723739ae5e38258a2f6b0aee875f306a0bc
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 7f7ccc2ccc2e70c6054685f5e3522efa81556830 upstream.
proc_pid_cmdline_read() and environ_read() directly access the target
process' VM to retrieve the command line and environment. If this
process remaps these areas onto a file via mmap(), the requesting
process may experience various issues such as extra delays if the
underlying device is slow to respond.
Let's simply refuse to access file-backed areas in these functions.
For this we add a new FOLL_ANON gup flag that is passed to all calls
to access_remote_vm(). The code already takes care of such failures
(including unmapped areas). Accesses via /proc/pid/mem were not
changed though.
This was assigned CVE-2018-1120.
Note for stable backports: the patch may apply to kernels prior to 4.11
but silently miss one location; it must be checked that no call to
access_remote_vm() keeps zero as the last argument.
Reported-by: Qualys Security Advisory <qsa@qualys.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4:
- Update the extra call to access_remote_vm() from proc_pid_cmdline_read()
- Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 272ddc8b37354c3fe111ab26d25e792629148eee upstream.
Now that Lorenzo cleaned things up and made the FOLL_FORCE users
explicit, it becomes obvious how some of them don't really need
FOLL_FORCE at all.
So remove FOLL_FORCE from the proc code that reads the command line and
arguments from user space.
The mem_rw() function actually does want FOLL_FORCE, because gdd (and
possibly many other debuggers) use it as a much more convenient version
of PTRACE_PEEKDATA, but we should consider making the FOLL_FORCE part
conditional on actually being a ptracer. This does not actually do
that, just moves adds a comment to that effect and moves the gup_flags
settings next to each other.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6347e8d5bcce33fc36e651901efefbe2c93a43ef upstream.
This removes the 'write' argument from access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.
We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 768ae309a96103ed02eb1e111e838c87854d8b51 upstream.
This removes the 'write' and 'force' from get_user_pages() and replaces
them with 'gup_flags' to make the use of FOLL_FORCE explicit in callers
as use of this flag can result in surprising behaviour (and hence bugs)
within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4:
- Drop changes in rapidio, vchiq, goldfish
- Keep the "write" variable in amdgpu_ttm_tt_pin_userptr() as it's still
needed
- Also update calls from various other places that now use
get_user_pages_remote() upstream, which were updated there by commit
9beae1ea8930 "mm: replace get_user_pages_remote() write/force ..."
- Also update calls from hfi1 and ipath
- Adjust context]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5df63c2a149ae65a9ec239e7c2af44efa6f79beb upstream.
This is a fix for a regression in 32 bit kernels caused by an invalid
check for pgoff overflow in hugetlbfs mmap setup. The check incorrectly
specified that the size of a loff_t was the same as the size of a long.
The regression prevents mapping hugetlbfs files at offsets greater than
4GB on 32 bit kernels.
On 32 bit kernels conversion from a page based unsigned long can not
overflow a loff_t byte offset. Therefore, skip this check if
sizeof(unsigned long) != sizeof(loff_t).
Link: http://lkml.kernel.org/r/20180330145402.5053-1-mike.kravetz@oracle.com
Fixes: 63489f8e8211 ("hugetlbfs: check for pgoff value overflow")
Reported-by: Dan Rue <dan.rue@linaro.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Anders Roxell <anders.roxell@linaro.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Nic Losby <blurbdust@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 63489f8e821144000e0bdca7e65a8d1cc23a7ee7 upstream.
A vma with vm_pgoff large enough to overflow a loff_t type when
converted to a byte offset can be passed via the remap_file_pages system
call. The hugetlbfs mmap routine uses the byte offset to calculate
reservations and file size.
A sequence such as:
mmap(0x20a00000, 0x600000, 0, 0x66033, -1, 0);
remap_file_pages(0x20a00000, 0x600000, 0, 0x20000000000000, 0);
will result in the following when task exits/file closed,
kernel BUG at mm/hugetlb.c:749!
Call Trace:
hugetlbfs_evict_inode+0x2f/0x40
evict+0xcb/0x190
__dentry_kill+0xcb/0x150
__fput+0x164/0x1e0
task_work_run+0x84/0xa0
exit_to_usermode_loop+0x7d/0x80
do_syscall_64+0x18b/0x190
entry_SYSCALL_64_after_hwframe+0x3d/0xa2
The overflowed pgoff value causes hugetlbfs to try to set up a mapping
with a negative range (end < start) that leaves invalid state which
causes the BUG.
The previous overflow fix to this code was incomplete and did not take
the remap_file_pages system call into account.
[mike.kravetz@oracle.com: v3]
Link: http://lkml.kernel.org/r/20180309002726.7248-1-mike.kravetz@oracle.com
[akpm@linux-foundation.org: include mmdebug.h]
[akpm@linux-foundation.org: fix -ve left shift count on sh]
Link: http://lkml.kernel.org/r/20180308210502.15952-1-mike.kravetz@oracle.com
Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap")
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Nic Losby <blurbdust@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Yisheng Xie <xieyisheng1@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[bwh: Backported to 4.4: Use a conditional WARN() instead of VM_WARN()]
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 045c7a3f53d9403b62d396b6d051c4be5044cdb4 upstream.
If mmap() maps a file, it can be passed an offset into the file at which
the mapping is to start. Offset could be a negative value when
represented as a loff_t. The offset plus length will be used to update
the file size (i_size) which is also a loff_t.
Validate the value of offset and offset + length to make sure they do
not overflow and appear as negative.
Found by syzcaller with commit ff8c0c53c475 ("mm/hugetlb.c: don't call
region_abort if region_chg fails") applied. Prior to this commit, the
overflow would still occur but we would luckily return ENOMEM.
To reproduce:
mmap(0, 0x2000, 0, 0x40021, 0xffffffffffffffffULL, 0x8000000000000000ULL);
Resulted in,
kernel BUG at mm/hugetlb.c:742!
Call Trace:
hugetlbfs_evict_inode+0x80/0xa0
evict+0x24a/0x620
iput+0x48f/0x8c0
dentry_unlink_inode+0x31f/0x4d0
__dentry_kill+0x292/0x5e0
dput+0x730/0x830
__fput+0x438/0x720
____fput+0x1a/0x20
task_work_run+0xfe/0x180
exit_to_usermode_loop+0x133/0x150
syscall_return_slowpath+0x184/0x1c0
entry_SYSCALL_64_fastpath+0xab/0xad
Fixes: ff8c0c53c475 ("mm/hugetlb.c: don't call region_abort if region_chg fails")
Link: http://lkml.kernel.org/r/1491951118-30678-1-git-send-email-mike.kravetz@oracle.com
Reported-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 70ad35db3321a6d129245979de4ac9d06eed897c ]
Maybe I'm missing something, but I don't know why it needs to copy the
input buffer to psinfo->buf and then write. Instead we can write the
input buffer directly. The only implementation that supports console
message (i.e. ramoops) already does it for ftrace messages.
For the upcoming virtio backend driver, it needs to protect psinfo->buf
overwritten from console messages. If it could use ->write_buf method
instead of ->write, the problem will be solved easily.
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 164f7e586739d07eb56af6f6d66acebb11f315c8 ]
ocfs2_get_dentry() calls iput(inode) to drop the reference count of
inode, and if the reference count hits 0, inode is freed. However, in
this function, it then reads inode->i_generation, which may result in a
use after free bug. Move the put operation later.
Link: http://lkml.kernel.org/r/1543109237-110227-1-git-send-email-bianpan2016@163.com
Fixes: 781f200cb7a("ocfs2: Remove masklog ML_EXPORT.")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Cc: Changwei Ge <ge.changwei@h3c.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c7d7d620dcbd2a1c595092280ca943f2fced7bbd ]
hfs_bmap_free() frees node via hfs_bnode_put(node). However it then
reads node->this when dumping error message on an error path, which may
result in a use-after-free bug. This patch frees node only when it is
never used.
Link: http://lkml.kernel.org/r/1543053441-66942-1-git-send-email-bianpan2016@163.com
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ce96a407adef126870b3f4a1b73529dd8aa80f49 ]
hfs_bmap_free() frees the node via hfs_bnode_put(node). However, it
then reads node->this when dumping error message on an error path, which
may result in a use-after-free bug. This patch frees the node only when
it is never again used.
Link: http://lkml.kernel.org/r/1542963889-128825-1-git-send-email-bianpan2016@163.com
Fixes: a1185ffa2fc ("HFS rewrite")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Joe Perches <joe@perches.com>
Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com>
Cc: Viacheslav Dubeyko <slava@dubeyko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit e21e57445a64598b29a6f629688f9b9a39e7242a ]
ocfs2_defrag_extent may fall into deadlock.
ocfs2_ioctl_move_extents
ocfs2_ioctl_move_extents
ocfs2_move_extents
ocfs2_defrag_extent
ocfs2_lock_allocators_move_extents
ocfs2_reserve_clusters
inode_lock GLOBAL_BITMAP_SYSTEM_INODE
__ocfs2_flush_truncate_log
inode_lock GLOBAL_BITMAP_SYSTEM_INODE
As backtrace shows above, ocfs2_reserve_clusters() will call inode_lock
against the global bitmap if local allocator has not sufficient cluters.
Once global bitmap could meet the demand, ocfs2_reserve_cluster will
return success with global bitmap locked.
After ocfs2_reserve_cluster(), if truncate log is full,
__ocfs2_flush_truncate_log() will definitely fall into deadlock because
it needs to inode_lock global bitmap, which has already been locked.
To fix this bug, we could remove from
ocfs2_lock_allocators_move_extents() the code which intends to lock
global allocator, and put the removed code after
__ocfs2_flush_truncate_log().
ocfs2_lock_allocators_move_extents() is referred by 2 places, one is
here, the other does not need the data allocator context, which means
this patch does not affect the caller so far.
Link: http://lkml.kernel.org/r/20181101071422.14470-1-lchen@suse.com
Signed-off-by: Larry Chen <lchen@suse.com>
Reviewed-by: Changwei Ge <ge.changwei@h3c.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <jiangqi903@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 31ffa563833576bd49a8bf53120568312755e6e2 ]
Variable 'cache' is being assigned but is never used hence it is
redundant and can be removed.
Cleans up clang warning:
warning: variable 'cache' set but not used [-Wunused-but-set-variable]
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c5a94f434c82529afda290df3235e4d85873c5b4 ]
It was observed that a process blocked indefintely in
__fscache_read_or_alloc_page(), waiting for FSCACHE_COOKIE_LOOKING_UP
to be cleared via fscache_wait_for_deferred_lookup().
At this time, ->backing_objects was empty, which would normaly prevent
__fscache_read_or_alloc_page() from getting to the point of waiting.
This implies that ->backing_objects was cleared *after*
__fscache_read_or_alloc_page was was entered.
When an object is "killed" and then "dropped",
FSCACHE_COOKIE_LOOKING_UP is cleared in fscache_lookup_failure(), then
KILL_OBJECT and DROP_OBJECT are "called" and only in DROP_OBJECT is
->backing_objects cleared. This leaves a window where
something else can set FSCACHE_COOKIE_LOOKING_UP and
__fscache_read_or_alloc_page() can start waiting, before
->backing_objects is cleared
There is some uncertainty in this analysis, but it seems to be fit the
observations. Adding the wake in this patch will be handled correctly
by __fscache_read_or_alloc_page(), as it checks if ->backing_objects
is empty again, after waiting.
Customer which reported the hang, also report that the hang cannot be
reproduced with this fix.
The backtrace for the blocked process looked like:
PID: 29360 TASK: ffff881ff2ac0f80 CPU: 3 COMMAND: "zsh"
#0 [ffff881ff43efbf8] schedule at ffffffff815e56f1
#1 [ffff881ff43efc58] bit_wait at ffffffff815e64ed
#2 [ffff881ff43efc68] __wait_on_bit at ffffffff815e61b8
#3 [ffff881ff43efca0] out_of_line_wait_on_bit at ffffffff815e625e
#4 [ffff881ff43efd08] fscache_wait_for_deferred_lookup at ffffffffa04f2e8f [fscache]
#5 [ffff881ff43efd18] __fscache_read_or_alloc_page at ffffffffa04f2ffe [fscache]
#6 [ffff881ff43efd58] __nfs_readpage_from_fscache at ffffffffa0679668 [nfs]
#7 [ffff881ff43efd78] nfs_readpage at ffffffffa067092b [nfs]
#8 [ffff881ff43efda0] generic_file_read_iter at ffffffff81187a73
#9 [ffff881ff43efe50] nfs_file_read at ffffffffa066544b [nfs]
#10 [ffff881ff43efe70] __vfs_read at ffffffff811fc756
#11 [ffff881ff43efee8] vfs_read at ffffffff811fccfa
#12 [ffff881ff43eff18] sys_read at ffffffff811fda62
#13 [ffff881ff43eff50] entry_SYSCALL_64_fastpath at ffffffff815e986e
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2084ac6c505a58f7efdec13eba633c6aaa085ca5 ]
The function dentry_connected calls dput(dentry) to drop the previously
acquired reference to dentry. In this case, dentry can be released.
After that, IS_ROOT(dentry) checks the condition
(dentry == dentry->d_parent), which may result in a use-after-free bug.
This patch directly compares dentry with its parent obtained before
dropping the reference.
Fixes: a056cc8934c("exportfs: stop retrying once we race with
rename/remove")
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a4390aee72713d9e73f1132bcdeb17d72fbbf974 ]
When doing an incremental send, due to the need of delaying directory move
(rename) operations we can end up in infinite loop at
apply_children_dir_moves().
An example scenario that triggers this problem is described below, where
directory names correspond to the numbers of their respective inodes.
Parent snapshot:
.
|--- 261/
|--- 271/
|--- 266/
|--- 259/
|--- 260/
| |--- 267
|
|--- 264/
| |--- 258/
| |--- 257/
|
|--- 265/
|--- 268/
|--- 269/
| |--- 262/
|
|--- 270/
|--- 272/
| |--- 263/
| |--- 275/
|
|--- 274/
|--- 273/
Send snapshot:
.
|-- 275/
|-- 274/
|-- 273/
|-- 262/
|-- 269/
|-- 258/
|-- 271/
|-- 268/
|-- 267/
|-- 270/
|-- 259/
| |-- 265/
|
|-- 272/
|-- 257/
|-- 260/
|-- 264/
|-- 263/
|-- 261/
|-- 266/
When processing inode 257 we delay its move (rename) operation because its
new parent in the send snapshot, inode 272, was not yet processed. Then
when processing inode 272, we delay the move operation for that inode
because inode 274 is its ancestor in the send snapshot. Finally we delay
the move operation for inode 274 when processing it because inode 275 is
its new parent in the send snapshot and was not yet moved.
When finishing processing inode 275, we start to do the move operations
that were previously delayed (at apply_children_dir_moves()), resulting in
the following iterations:
1) We issue the move operation for inode 274;
2) Because inode 262 depended on the move operation of inode 274 (it was
delayed because 274 is its ancestor in the send snapshot), we issue the
move operation for inode 262;
3) We issue the move operation for inode 272, because it was delayed by
inode 274 too (ancestor of 272 in the send snapshot);
4) We issue the move operation for inode 269 (it was delayed by 262);
5) We issue the move operation for inode 257 (it was delayed by 272);
6) We issue the move operation for inode 260 (it was delayed by 272);
7) We issue the move operation for inode 258 (it was delayed by 269);
8) We issue the move operation for inode 264 (it was delayed by 257);
9) We issue the move operation for inode 271 (it was delayed by 258);
10) We issue the move operation for inode 263 (it was delayed by 264);
11) We issue the move operation for inode 268 (it was delayed by 271);
12) We verify if we can issue the move operation for inode 270 (it was
delayed by 271). We detect a path loop in the current state, because
inode 267 needs to be moved first before we can issue the move
operation for inode 270. So we delay again the move operation for
inode 270, this time we will attempt to do it after inode 267 is
moved;
13) We issue the move operation for inode 261 (it was delayed by 263);
14) We verify if we can issue the move operation for inode 266 (it was
delayed by 263). We detect a path loop in the current state, because
inode 270 needs to be moved first before we can issue the move
operation for inode 266. So we delay again the move operation for
inode 266, this time we will attempt to do it after inode 270 is
moved (its move operation was delayed in step 12);
15) We issue the move operation for inode 267 (it was delayed by 268);
16) We verify if we can issue the move operation for inode 266 (it was
delayed by 270). We detect a path loop in the current state, because
inode 270 needs to be moved first before we can issue the move
operation for inode 266. So we delay again the move operation for
inode 266, this time we will attempt to do it after inode 270 is
moved (its move operation was delayed in step 12). So here we added
again the same delayed move operation that we added in step 14;
17) We attempt again to see if we can issue the move operation for inode
266, and as in step 16, we realize we can not due to a path loop in
the current state due to a dependency on inode 270. Again we delay
inode's 266 rename to happen after inode's 270 move operation, adding
the same dependency to the empty stack that we did in steps 14 and 16.
The next iteration will pick the same move dependency on the stack
(the only entry) and realize again there is still a path loop and then
again the same dependency to the stack, over and over, resulting in
an infinite loop.
So fix this by preventing adding the same move dependency entries to the
stack by removing each pending move record from the red black tree of
pending moves. This way the next call to get_pending_dir_moves() will
not return anything for the current parent inode.
A test case for fstests, with this reproducer, follows soon.
Signed-off-by: Robbie Ko <robbieko@synology.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
[Wrote changelog with example and more clear explanation]
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c4b7d1ba7d263b74bb72e9325262a67139605cde ]
Fixes gcc '-Wunused-but-set-variable' warning:
fs/sysv/inode.c: In function '__sysv_write_inode':
fs/sysv/inode.c:239:6: warning:
variable 'err' set but not used [-Wunused-but-set-variable]
__sysv_write_inode should return 'err' instead of 0
Fixes: 05459ca81a ("repair sysv_write_inode(), switch sysv to simple_fsync()")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sasha Levin <sashal@kernel.org>