android_kernel_oneplus_msm8998/fs
Zygo Blaxell f9b4a2e04c btrfs: add missing memset while reading compressed inline extents
[ Upstream commit e1699d2d7bf6e6cce3e1baff19f9dd4595a58664 ]

This is a story about 4 distinct (and very old) btrfs bugs.

Commit c8b978188c ("Btrfs: Add zlib compression support") added
three data corruption bugs for inline extents (bugs #1-3).

Commit 93c82d5750 ("Btrfs: zero page past end of inline file items")
fixed bug #1:  uncompressed inline extents followed by a hole and more
extents could get non-zero data in the hole as they were read.  The fix
was to add a memset in btrfs_get_extent to zero out the hole.

Commit 166ae5a418 ("btrfs: fix inline compressed read err corruption")
fixed bug #2:  compressed inline extents which contained non-zero bytes
might be replaced with zero bytes in some cases.  This patch removed an
unhelpful memset from uncompress_inline, but the case where memset is
required was missed.

There is also a memset in the decompression code, but this only covers
decompressed data that is shorter than the ram_bytes from the extent
ref record.  This memset doesn't cover the region between the end of the
decompressed data and the end of the page.  It has also moved around a
few times over the years, so there's no single patch to refer to.

This patch fixes bug #3:  compressed inline extents followed by a hole
and more extents could get non-zero data in the hole as they were read
(i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
The fix is the same:  zero out the hole in the compressed case too,
by putting a memset back in uncompress_inline, but this time with
correct parameters.

The last and oldest bug, bug #0, is the cause of the offending inline
extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
of behavior somewhere in the btrfs write code.  In a few special cases,
an inline extent and hole are allowed to persist where they normally
would be combined with later extents in the file.

A fast reproducer for bug #0 is presented below.  A few offending extents
are also created in the wild during large rsync transfers with the -S
flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
will produce a handful of offending files as well.  Once an offending
file is created, it can present different content to userspace each
time it is read.

Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
kernel back to v3.5 has this behavior.  There are fossil records of this
bug's effects in commits all the way back to v2.6.32.  I have no reason
to believe bug #0 wasn't present at the beginning of btrfs compression
support in v2.6.29, but I can't easily test kernels that old to be sure.

It is not clear whether bug #0 is worth fixing.  A fix would likely
require injecting extra reads into currently write-only paths, and most
of the exceptional cases caused by bug #0 are already handled now.

Whether we like them or not, bug #0's inline extents followed by holes
are part of the btrfs de-facto disk format now, and we need to be able
to read them without data corruption or an infoleak.  So enough about
bug #0, let's get back to bug #3 (this patch).

An example of on-disk structure leading to data corruption found in
the wild:

        item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
                inode generation 50 transid 50 size 47424 nbytes 49141
                block group 0 mode 100644 links 1 uid 0 gid 0
                rdev 0 flags 0x0(none)
        item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
                inode ref index 3 namelen 10 name: DB_File.so
        item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
                inline extent data size 1341 ram 4085 compress(zlib)
        item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
                extent data disk byte 5367308288 nr 20480
                extent data offset 0 nr 45056 ram 45056
                extent compression(zlib)

Different data appears in userspace during each read of the 11 bytes
between 4085 and 4096.  The extent in item 63 is not long enough to
fill the first page of the file, so a memset is required to fill the
space between item 63 (ending at 4085) and item 64 (beginning at 4096)
with zero.

Here is a reproducer from Liu Bo, which demonstrates another method
of creating the same inline extent and hole pattern:

Using 'page_poison=on' kernel command line (or enable
CONFIG_PAGE_POISONING) run the following:

	# touch foo
	# chattr +c foo
	# xfs_io -f -c "pwrite -W 0 1000" foo
	# xfs_io -f -c "falloc 4 8188" foo
	# od -x foo
	# echo 3 >/proc/sys/vm/drop_caches
	# od -x foo

This produce the following on my box:

Correct output:  file contains 1000 data bytes followed
by zeros:

	0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
	*
	0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000
	0001760 0000 0000 0000 0000 0000 0000 0000 0000
	*
	0020000

Actual output:  the data after the first 1000 bytes
will be different each run:

	0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
	*
	0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d
	0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400
	0002000 435f 0056 5f74 6164 7400 645f 0062 5f74
	(...)

Signed-off-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Reviewed-by: Chris Mason <clm@fb.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:04:57 +01:00
..
9p fs/9p: Compare qid.path in v9fs_test_inode 2017-11-30 08:37:22 +00:00
adfs
affs affs: fix remount failure when there are no options changed 2016-06-07 18:14:32 -07:00
afs afs: Fix afs_kill_pages() 2017-12-20 10:04:56 +01:00
autofs4 autofs: fix careless error in recent commit 2017-12-20 10:04:51 +01:00
befs
bfs
btrfs btrfs: add missing memset while reading compressed inline extents 2017-12-20 10:04:57 +01:00
cachefiles FS-Cache: Add missing initialization of ret in cachefiles_write_page() 2015-11-16 20:38:43 -05:00
ceph ceph: drop negative child dentries before try pruning inode's alias 2017-12-20 10:04:52 +01:00
cifs cifs: check MaxPathNameComponentLength != 0 before using it 2017-11-08 10:06:27 +01:00
coda coda: fix 'kernel memory exposure attempt' in fsync 2017-11-24 08:32:25 +01:00
configfs configfs: Fix race between create_link and configfs_rmdir 2017-06-26 07:13:08 +02:00
cramfs
debugfs dentry name snapshots 2017-08-06 19:19:42 -07:00
devpts devpts: clean up interface to pty drivers 2016-08-16 09:30:49 +02:00
dlm dlm: avoid double-free on error path in dlm_device_{register,unregister} 2017-09-13 14:09:45 -07:00
ecryptfs eCryptfs: use after free in ecryptfs_release_messaging() 2017-11-30 08:37:20 +00:00
efivarfs efi: Make efivarfs entries immutable by default 2016-03-03 15:07:09 -08:00
efs
exofs osd fs: __r4w_get_page rely on PageUptodate for uptodate 2015-12-12 10:15:34 -08:00
exportfs
ext2 posix_acl: Clear SGID bit when setting file permissions 2016-10-31 04:13:58 -06:00
ext4 ext4: fix crash when a directory's i_size is too small 2017-12-20 10:04:52 +01:00
f2fs fscrypto: require write access to mount to set encryption policy 2017-10-27 10:23:18 +02:00
fat fat: fix using uninitialized fields of fat_inode/fsinfo_inode 2017-03-15 09:57:15 +08:00
freevxfs
fscache FS-Cache: fix dereference of NULL user_key_payload 2017-10-27 10:23:18 +02:00
fuse fuse: fix READDIRPLUS skipping an entry 2017-11-02 09:40:49 +01:00
gfs2 GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next 2017-10-08 10:14:16 +02:00
hfs hfs: fix B-tree corruption after insertion at position 0 2015-09-10 13:29:01 -07:00
hfsplus posix_acl: Clear SGID bit when setting file permissions 2016-10-31 04:13:58 -06:00
hostfs hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() 2016-09-30 10:18:39 +02:00
hpfs hpfs: implement the show_options method 2016-06-01 12:15:54 -07:00
hugetlbfs mm: larger stack guard gap, between vmas 2017-06-26 07:13:11 +02:00
isofs isofs: fix timestamps beyond 2027 2017-11-30 08:37:20 +00:00
jbd2 jbd2: don't leak modified metadata buffers on an aborted journal 2017-03-12 06:37:26 +01:00
jffs2 posix_acl: Clear SGID bit when setting file permissions 2016-10-31 04:13:58 -06:00
jfs fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
kernfs kernfs: don't depend on d_find_any_alias() when generating notifications 2016-09-24 10:07:36 +02:00
lockd Mainly smaller bugfixes and cleanup. We're still finding some bugs from 2015-11-11 20:11:28 -08:00
logfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
minix
ncpfs ncpfs: fix a braino in OOM handling in ncp_fill_cache() 2016-03-16 08:42:59 -07:00
nfs NFSv4.1 respect server's max size in CREATE_SESSION 2017-12-20 10:04:57 +01:00
nfs_common
nfsd NFSD: fix nfsd_reset_versions for NFSv4. 2017-12-20 10:04:53 +01:00
nilfs2 nilfs2: fix race condition that causes file system corruption 2017-11-30 08:37:20 +00:00
nls
notify dentry name snapshots 2017-08-06 19:19:42 -07:00
ntfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
ocfs2 Revert "ocfs2: should wait dio before inode lock in ocfs2_setattr()" 2017-12-09 18:42:43 +01:00
omfs
openpromfs
overlayfs ovl: fsync after copy-up 2016-11-10 16:36:34 +01:00
proc mm: larger stack guard gap, between vmas 2017-06-26 07:13:11 +02:00
pstore pstore: Use dynamic spinlock initializer 2017-08-06 19:19:43 -07:00
qnx4
qnx6
quota quota: Fix possible GPF due to uninitialised pointers 2016-04-12 09:08:56 -07:00
ramfs mm, fs: obey gfp_mapping for add_to_page_cache() 2015-10-16 11:42:28 -07:00
reiserfs fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
romfs romfs: use different way to generate fsid for BLOCK or MTD 2017-06-17 06:39:38 +02:00
squashfs squashfs: xattr simplifications 2015-11-13 20:34:33 -05:00
sysfs sysfs: be careful of error returns from ops->show() 2017-04-12 12:38:33 +02:00
sysv fix sysvfs symlinks 2015-11-23 21:11:08 -05:00
tracefs tracefs: Fix refcount imbalance in start_creating() 2015-11-04 22:13:45 -05:00
ubifs ubifs: Fix journal replay wrt. xattr nodes 2017-01-26 08:23:48 +01:00
udf udf: Fix deadlock between writeback and udf_setsize() 2017-07-27 15:06:09 -07:00
ufs ufs_getfrag_block(): we only grab ->truncate_mutex on block creation path 2017-06-14 13:16:24 +02:00
xfs xfs: remove kmem_zalloc_greedy 2017-10-08 10:14:20 +02:00
aio.c aio: mark AIO pseudo-fs noexec 2016-10-07 15:23:47 +02:00
anon_inodes.c
attr.c vfs: move permission checking into notify_change() for utimes(NULL) 2016-10-22 12:26:56 +02:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c binfmt_elf: use ELF_ET_DYN_BASE only for PIE 2017-07-21 07:44:57 +02:00
binfmt_elf_fdpic.c libnvdimm for 4.4: 2015-11-10 12:07:22 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c fs/block_dev: always invalidate cleancache in invalidate_bdev() 2017-05-20 14:27:01 +02:00
buffer.c fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
char_dev.c
compat.c
compat_binfmt_elf.c
compat_ioctl.c i2c-dev: Fix typo in ioctl name reference 2015-10-23 23:26:43 +02:00
coredump.c coredump: Ensure proper size of sparse core files 2017-07-05 14:37:20 +02:00
dax.c dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
dcache.c dentry name snapshots 2017-08-06 19:19:42 -07:00
dcookies.c
direct-io.c direct-io: Prevent NULL pointer access in submit_page_section 2017-10-18 09:20:42 +02:00
drop_caches.c
eventfd.c
eventpoll.c epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove() 2017-09-07 08:34:10 +02:00
exec.c exec: Limit arg stack to at most 75% of _STK_LIM 2017-07-21 07:44:57 +02:00
fcntl.c fs: add a VALID_OPEN_FLAGS 2017-07-15 11:57:44 +02:00
fhandle.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
file.c vfs: clear remainder of 'full_fds_bits' in dup_fd() 2015-11-05 23:05:32 -08:00
file_table.c
filesystems.c
fs-writeback.c writeback: fix memory leak in wb_queue_work() 2017-12-20 10:04:54 +01:00
fs_pin.c
fs_struct.c
inode.c vfs: fix deadlock in file_remove_privs() on overlayfs 2016-08-10 11:49:30 +02:00
internal.h
ioctl.c
Kconfig dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
Kconfig.binfmt
libfs.c
locks.c locks: use file_inode() 2016-08-10 11:49:27 +02:00
Makefile ext4: promote ext4 over ext2 in the default probe order 2015-10-15 10:33:21 -04:00
mbcache.c
mount.h mnt: In propgate_umount handle visiting mounts in any order 2017-07-21 07:44:57 +02:00
mpage.c fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
namei.c dentry name snapshots 2017-08-06 19:19:42 -07:00
namespace.c mnt: In propgate_umount handle visiting mounts in any order 2017-07-21 07:44:57 +02:00
no-block.c
nsfs.c fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void 2015-09-11 15:21:34 -07:00
open.c fs: completely ignore unknown open flags 2017-07-15 11:57:44 +02:00
pipe.c pipe: limit the per-user amount of pages allocated in pipes 2016-06-07 18:14:35 -07:00
pnode.c mnt: Make propagate_umount less slow for overlapping mount propagation trees 2017-07-21 07:44:58 +02:00
pnode.h mnt: Add a per mount namespace limit on the number of mounts 2017-04-30 05:49:28 +02:00
posix_acl.c tmpfs: clear S_ISGID when setting posix ACLs 2017-01-26 08:23:47 +01:00
proc_namespace.c vfs: show_vfsstat: do not ignore errors from show_devname method 2016-04-12 09:08:55 -07:00
read_write.c vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets 2017-10-05 09:41:45 +02:00
readdir.c
select.c
seq_file.c Make file credentials available to the seqfile interfaces 2017-08-06 19:19:42 -07:00
signalfd.c
splice.c vfs: fix uninitialized flags in splice_to_pipe() 2017-02-23 17:43:09 +01:00
stack.c
stat.c ufs: restore maintaining ->i_blocks 2017-06-14 13:16:24 +02:00
statfs.c
super.c fs/super.c: fix race between freeze_super() and thaw_super() 2016-10-28 03:01:32 -04:00
sync.c fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback 2015-11-06 17:50:42 -08:00
timerfd.c timerfd: Protect the might cancel mechanism proper 2017-05-08 07:46:01 +02:00
userfaultfd.c userfaultfd: shmem: __do_fault requires VM_FAULT_NOPAGE 2017-12-20 10:04:53 +01:00
utimes.c vfs: move permission checking into notify_change() for utimes(NULL) 2016-10-22 12:26:56 +02:00
xattr.c lsm: fix smack_inode_removexattr and xattr_getsecurity memleak 2017-10-12 11:27:32 +02:00