android_kernel_oneplus_msm8998/fs
Filipe Manana c8421505b8 Btrfs: fix loading of orphan roots leading to BUG_ON
commit 909c3a22da3b8d2cfd3505ca5658f0176859d400 upstream.

When looking for orphan roots during mount we can end up hitting a
BUG_ON() (at root-item.c:btrfs_find_orphan_roots()) if a log tree is
replayed and qgroups are enabled. This is because after a log tree is
replayed, a transaction commit is made, which triggers qgroup extent
accounting which in turn does backref walking which ends up reading and
inserting all roots in the radix tree fs_info->fs_root_radix, including
orphan roots (deleted snapshots). So after the log tree is replayed, when
finding orphan roots we hit the BUG_ON with the following trace:

[118209.182438] ------------[ cut here ]------------
[118209.183279] kernel BUG at fs/btrfs/root-tree.c:314!
[118209.184074] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[118209.185123] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic ppdev xor raid6_pq evdev sg parport_pc parport acpi_cpufreq tpm_tis tpm psmouse
processor i2c_piix4 serio_raw pcspkr i2c_core button loop autofs4 ext4 crc16 mbcache jbd2 sd_mod sr_mod cdrom ata_generic virtio_scsi ata_piix libata
virtio_pci virtio_ring virtio scsi_mod e1000 floppy [last unloaded: btrfs]
[118209.186318] CPU: 14 PID: 28428 Comm: mount Tainted: G        W       4.5.0-rc5-btrfs-next-24+ #1
[118209.186318] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
[118209.186318] task: ffff8801ec131040 ti: ffff8800af34c000 task.ti: ffff8800af34c000
[118209.186318] RIP: 0010:[<ffffffffa04237d7>]  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
[118209.186318] RSP: 0018:ffff8800af34faa8  EFLAGS: 00010246
[118209.186318] RAX: 00000000ffffffef RBX: 00000000ffffffef RCX: 0000000000000001
[118209.186318] RDX: 0000000080000000 RSI: 0000000000000001 RDI: 00000000ffffffff
[118209.186318] RBP: ffff8800af34fb08 R08: 0000000000000001 R09: 0000000000000000
[118209.186318] R10: ffff8800af34f9f0 R11: 6db6db6db6db6db7 R12: ffff880171b97000
[118209.186318] R13: ffff8801ca9d65e0 R14: ffff8800afa2e000 R15: 0000160000000000
[118209.186318] FS:  00007f5bcb914840(0000) GS:ffff88023edc0000(0000) knlGS:0000000000000000
[118209.186318] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[118209.186318] CR2: 00007f5bcaceb5d9 CR3: 00000000b49b5000 CR4: 00000000000006e0
[118209.186318] Stack:
[118209.186318]  fffffbffffffffff 010230ffffffffff 0101000000000000 ff84000000000000
[118209.186318]  fbffffffffffffff 30ffffffffffffff 0000000000000101 ffff880082348000
[118209.186318]  0000000000000000 ffff8800afa2e000 ffff8800afa2e000 0000000000000000
[118209.186318] Call Trace:
[118209.186318]  [<ffffffffa042e2db>] open_ctree+0x1e37/0x21b9 [btrfs]
[118209.186318]  [<ffffffffa040a753>] btrfs_mount+0x97e/0xaed [btrfs]
[118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
[118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
[118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
[118209.186318]  [<ffffffffa0409f81>] btrfs_mount+0x1ac/0xaed [btrfs]
[118209.186318]  [<ffffffff8108e1c0>] ? trace_hardirqs_on+0xd/0xf
[118209.186318]  [<ffffffff8108c26b>] ? lockdep_init_map+0xb9/0x1b3
[118209.186318]  [<ffffffff8117b87e>] mount_fs+0x67/0x131
[118209.186318]  [<ffffffff81192d2b>] vfs_kern_mount+0x6c/0xde
[118209.186318]  [<ffffffff81195637>] do_mount+0x8a6/0x9e8
[118209.186318]  [<ffffffff8119598d>] SyS_mount+0x77/0x9f
[118209.186318]  [<ffffffff81493017>] entry_SYSCALL_64_fastpath+0x12/0x6b
[118209.186318] Code: 64 00 00 85 c0 89 c3 75 24 f0 41 80 4c 24 20 20 49 8b bc 24 f0 01 00 00 4c 89 e6 e8 e8 65 00 00 85 c0 89 c3 74 11 83 f8 ef 75 02 <0f> 0b
4c 89 e7 e8 da 72 00 00 eb 1c 41 83 bc 24 00 01 00 00 00
[118209.186318] RIP  [<ffffffffa04237d7>] btrfs_find_orphan_roots+0x1fc/0x244 [btrfs]
[118209.186318]  RSP <ffff8800af34faa8>
[118209.230735] ---[ end trace 83938f987d85d477 ]---

So fix this by not treating the error -EEXIST, returned when attempting
to insert a root already inserted by the backref walking code, as an error.

The following test case for xfstests reproduces the bug:

  seq=`basename $0`
  seqres=$RESULT_DIR/$seq
  echo "QA output created by $seq"
  tmp=/tmp/$$
  status=1	# failure is the default!
  trap "_cleanup; exit \$status" 0 1 2 3 15

  _cleanup()
  {
      _cleanup_flakey
      cd /
      rm -f $tmp.*
  }

  # get standard environment, filters and checks
  . ./common/rc
  . ./common/filter
  . ./common/dmflakey

  # real QA test starts here
  _supported_fs btrfs
  _supported_os Linux
  _require_scratch
  _require_dm_target flakey
  _require_metadata_journaling $SCRATCH_DEV

  rm -f $seqres.full

  _scratch_mkfs >>$seqres.full 2>&1
  _init_flakey
  _mount_flakey

  _run_btrfs_util_prog quota enable $SCRATCH_MNT

  # Create 2 directories with one file in one of them.
  # We use these just to trigger a transaction commit later, moving the file from
  # directory a to directory b and doing an fsync against directory a.
  mkdir $SCRATCH_MNT/a
  mkdir $SCRATCH_MNT/b
  touch $SCRATCH_MNT/a/f
  sync

  # Create our test file with 2 4K extents.
  $XFS_IO_PROG -f -s -c "pwrite -S 0xaa 0 8K" $SCRATCH_MNT/foobar | _filter_xfs_io

  # Create a snapshot and delete it. This doesn't really delete the snapshot
  # immediately, just makes it inaccessible and invisible to user space, the
  # snapshot is deleted later by a dedicated kernel thread (cleaner kthread)
  # which is woke up at the next transaction commit.
  # A root orphan item is inserted into the tree of tree roots, so that if a
  # power failure happens before the dedicated kernel thread does the snapshot
  # deletion, the next time the filesystem is mounted it resumes the snapshot
  # deletion.
  _run_btrfs_util_prog subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap
  _run_btrfs_util_prog subvolume delete $SCRATCH_MNT/snap

  # Now overwrite half of the extents we wrote before. Because we made a snapshpot
  # before, which isn't really deleted yet (since no transaction commit happened
  # after we did the snapshot delete request), the non overwritten extents get
  # referenced twice, once by the default subvolume and once by the snapshot.
  $XFS_IO_PROG -c "pwrite -S 0xbb 4K 8K" $SCRATCH_MNT/foobar | _filter_xfs_io

  # Now move file f from directory a to directory b and fsync directory a.
  # The fsync on the directory a triggers a transaction commit (because a file
  # was moved from it to another directory) and the file fsync leaves a log tree
  # with file extent items to replay.
  mv $SCRATCH_MNT/a/f $SCRATCH_MNT/a/b
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/a
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foobar

  echo "File digest before power failure:"
  md5sum $SCRATCH_MNT/foobar | _filter_scratch

  # Now simulate a power failure and mount the filesystem to replay the log tree.
  # After the log tree was replayed, we used to hit a BUG_ON() when processing
  # the root orphan item for the deleted snapshot. This is because when processing
  # an orphan root the code expected to be the first code inserting the root into
  # the fs_info->fs_root_radix radix tree, while in reallity it was the second
  # caller attempting to do it - the first caller was the transaction commit that
  # took place after replaying the log tree, when updating the qgroup counters.
  _flakey_drop_and_remount

  echo "File digest before after failure:"
  # Must match what he got before the power failure.
  md5sum $SCRATCH_MNT/foobar | _filter_scratch

  _unmount_flakey
  status=0
  exit

Fixes: 2d9e977610 ("Btrfs: use btrfs_get_fs_root in resolve_indirect_ref")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-09 15:34:53 -08:00
..
9p 9p: ->evict_inode() should kick out ->i_data, not ->i_mapping 2015-12-08 14:51:16 -05:00
adfs
affs fs/affs: make root lookup from blkdev logical size 2015-09-10 13:29:01 -07:00
afs
autofs4
befs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
bfs
btrfs Btrfs: fix loading of orphan roots leading to BUG_ON 2016-03-09 15:34:53 -08:00
cachefiles FS-Cache: Add missing initialization of ret in cachefiles_write_page() 2015-11-16 20:38:43 -05:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2015-11-13 09:24:40 -08:00
cifs Fix cifs_uniqueid_to_ino_t() function for s390x 2016-03-09 15:34:50 -08:00
coda fs/coda: fix readlink buffer overflow 2015-09-10 13:29:01 -07:00
configfs configfs: allow dynamic group creation 2015-11-20 16:17:32 -08:00
cramfs
debugfs debugfs: fix refcount imbalance in start_creating 2015-11-11 02:04:44 -05:00
devpts pty: make sure super_block is still valid in final /dev/tty close 2016-02-25 12:01:14 -08:00
dlm net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA 2015-12-01 15:45:05 -05:00
ecryptfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2015-11-07 13:05:44 -08:00
efivarfs efi: Make efivarfs entries immutable by default 2016-03-03 15:07:09 -08:00
efs
exofs osd fs: __r4w_get_page rely on PageUptodate for uptodate 2015-12-12 10:15:34 -08:00
exportfs
ext2 ext2, ext4: warn when mounting with dax enabled 2015-11-16 09:43:54 -08:00
ext4 ext4: fix bh->b_state corruption 2016-03-03 15:07:08 -08:00
f2fs Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-13 18:02:30 -08:00
fat fat: fix fake_offset handling on error path 2015-11-20 16:17:32 -08:00
freevxfs freevxfs: Grammar s/an negative/a negative/ 2015-08-07 13:59:24 +02:00
fscache FS-Cache: Handle a write to the page immediately beyond the EOF marker 2015-11-11 02:11:02 -05:00
fuse Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse 2015-12-11 10:56:41 -08:00
gfs2 Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-13 18:02:30 -08:00
hfs hfs: fix B-tree corruption after insertion at position 0 2015-09-10 13:29:01 -07:00
hfsplus xattr handlers: Pass handler to operations instead of flags 2015-11-13 20:34:32 -05:00
hostfs uml: fix hostfs mknod() 2016-03-03 15:07:12 -08:00
hpfs hpfs: don't truncate the file when delete fails 2016-03-03 15:07:30 -08:00
hugetlbfs fs/hugetlbfs/inode.c: fix bugs in hugetlb_vmtruncate_list() 2016-02-25 12:01:22 -08:00
isofs
jbd2 Ext4 bug fixes for v4.4, including fixes for post-2038 time encodings, 2015-12-07 10:25:00 -08:00
jffs2 xattr handlers: Pass handler to operations instead of flags 2015-11-13 20:34:32 -05:00
jfs fs/jfs: remove unnecessary new_valid_dev() checks 2015-11-09 15:11:24 -08:00
kernfs kernfs: implement kernfs_path_len() 2015-08-18 15:49:15 -07:00
lockd Mainly smaller bugfixes and cleanup. We're still finding some bugs from 2015-11-11 20:11:28 -08:00
logfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
minix Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-07-04 19:36:06 -07:00
ncpfs ncpfs: don't allow negative timeouts 2015-11-20 16:17:32 -08:00
nfs NFSv4: Fix a dentry leak on alias use 2016-03-03 15:07:28 -08:00
nfs_common lockd: NLM grace period shouldn't block NFSv4 opens 2015-08-13 10:22:06 -04:00
nfsd nfsd: don't hold ls_mutex across a layout recall 2015-12-16 11:49:58 -05:00
nilfs2 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-11 09:45:24 -08:00
nls
notify inotify: actually check for invalid bits in sys_inotify_add_watch() 2015-11-05 19:34:48 -08:00
ntfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
ocfs2 ocfs2: unlock inode if deleting inode from orphan fails 2016-03-03 15:07:10 -08:00
omfs
openpromfs
overlayfs ovl: setattr: check permissions before copy-up 2016-02-25 12:01:24 -08:00
proc numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390 2016-02-25 12:01:22 -08:00
pstore pstore: fix code comment to match code 2015-11-02 13:41:52 -08:00
qnx4
qnx6
quota Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-09-05 20:34:28 -07:00
ramfs mm, fs: obey gfp_mapping for add_to_page_cache() 2015-10-16 11:42:28 -07:00
reiserfs Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-13 18:02:30 -08:00
romfs
squashfs squashfs: xattr simplifications 2015-11-13 20:34:33 -05:00
sysfs platform/chrome: Branch for v4.4 2015-11-13 21:53:18 -08:00
sysv fix sysvfs symlinks 2015-11-23 21:11:08 -05:00
tracefs tracefs: Fix refcount imbalance in start_creating() 2015-11-04 22:13:45 -05:00
ubifs Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-13 18:02:30 -08:00
udf udf: Check output buffer length when converting name to CS0 2016-02-25 12:01:18 -08:00
ufs fix ufs write vs readpage race when writing into a hole 2015-09-09 10:43:12 -07:00
xfs xfs: log mount failures don't wait for buffers to be released 2016-02-25 12:01:24 -08:00
aio.c mm: move ->mremap() from file_operations to vm_operations_struct 2015-09-04 16:54:41 -07:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-11 09:45:24 -08:00
binfmt_elf_fdpic.c libnvdimm for 4.4: 2015-11-10 12:07:22 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c block: detach bdev inode from its wb in __blkdev_put() 2015-12-04 11:02:17 -07:00
buffer.c vfs: remove unused wrapper block_page_mkwrite() 2015-11-11 02:19:33 -05:00
char_dev.c fs/char_dev.c: fix incorrect documentation for unregister_chrdev_region 2015-08-05 13:49:35 -07:00
compat.c
compat_binfmt_elf.c
compat_ioctl.c i2c-dev: Fix typo in ioctl name reference 2015-10-23 23:26:43 +02:00
coredump.c coredump: change zap_threads() and zap_process() to use for_each_thread() 2015-11-06 17:50:42 -08:00
dax.c dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
dcache.c use ->d_seq to get coherency between ->d_inode and ->d_flags 2016-03-09 15:34:49 -08:00
dcookies.c
direct-io.c block: fix use-after-free in dio_bio_complete 2016-03-03 15:07:28 -08:00
drop_caches.c inode: convert inode_sb_list_lock to per-sb 2015-08-17 18:39:46 -04:00
eventfd.c
eventpoll.c
exec.c vfs: Commit to never having exectuables on proc and sysfs. 2015-07-10 10:39:25 -05:00
fcntl.c
fhandle.c
file.c vfs: clear remainder of 'full_fds_bits' in dup_fd() 2015-11-05 23:05:32 -08:00
file_table.c fs, file table: reinit files_stat.max_files after deferred memory initialisation 2015-08-07 04:39:40 +03:00
filesystems.c
fs-writeback.c writeback: flush inode cgroup wb switches instead of pinning super_block 2016-03-09 15:34:52 -08:00
fs_pin.c
fs_struct.c
inode.c fs: fix inode.c kernel-doc warning 2015-11-11 02:18:27 -05:00
internal.h inode: rename i_wb_list to i_io_list 2015-08-17 23:38:10 -04:00
ioctl.c
Kconfig dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
Kconfig.binfmt
libfs.c fs: Set the size of empty dirs to 0. 2015-08-12 15:28:45 -05:00
locks.c locks: fix unlock when fcntl_setlk races with a close 2016-03-03 15:07:12 -08:00
Makefile ext4: promote ext4 over ext2 in the default probe order 2015-10-15 10:33:21 -04:00
mbcache.c
mount.h
mpage.c mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
namei.c do_last(): ELOOP failure exit should be done after leaving RCU mode 2016-03-03 15:07:30 -08:00
namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2015-09-01 16:13:25 -07:00
no-block.c
nsfs.c fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void 2015-09-11 15:21:34 -07:00
open.c vfs: Commit to never having exectuables on proc and sysfs. 2015-07-10 10:39:25 -05:00
pipe.c fs/pipe.c: return error code rather than 0 in pipe_write() 2015-11-11 02:18:26 -05:00
pnode.c
pnode.h mnt: Clarify and correct the disconnect logic in umount_tree 2015-07-22 20:33:27 -05:00
posix_acl.c xattr handlers: Pass handler to operations instead of flags 2015-11-13 20:34:32 -05:00
proc_namespace.c
read_write.c
readdir.c
select.c
seq_file.c fs, seqfile: always allow oom killer 2015-11-06 17:50:42 -08:00
signalfd.c signalfd: fix information leak in signalfd_copyinfo 2015-08-07 04:39:40 +03:00
splice.c vfs: Avoid softlockups with sendfile(2) 2015-11-23 21:15:30 -05:00
stack.c
stat.c fs/stat.c: remove unnecessary new_valid_dev() check 2015-11-09 15:11:24 -08:00
statfs.c
super.c writeback: flush inode cgroup wb switches instead of pinning super_block 2016-03-09 15:34:52 -08:00
sync.c fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback 2015-11-06 17:50:42 -08:00
timerfd.c timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper 2016-02-25 12:01:25 -08:00
userfaultfd.c userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" 2015-09-22 15:09:53 -07:00
utimes.c
xattr.c 9p: xattr simplifications 2015-11-13 20:34:33 -05:00