android_kernel_oneplus_msm8998/fs
Filipe Manana 1653a3b0e9 Btrfs: fix file/data loss caused by fsync after rename and new inode
commit 56f23fdbb600e6087db7b009775b95ce07cc3195 upstream.

If we rename an inode A (be it a file or a directory), create a new
inode B with the old name of inode A and under the same parent directory,
fsync inode B and then power fail, at log tree replay time we end up
removing inode A completely. If inode A is a directory then all its files
are gone too.

Example scenarios where this happens:
This is reproducible with the following steps, taken from a couple of
test cases written for fstests which are going to be submitted upstream
soon:

   # Scenario 1

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir -p /mnt/a/x
   echo "hello" > /mnt/a/x/foo
   echo "world" > /mnt/a/x/bar
   sync
   mv /mnt/a/x /mnt/a/y
   mkdir /mnt/a/x
   xfs_io -c fsync /mnt/a/x
   <power failure happens>

   The next time the fs is mounted, log tree replay happens and
   the directory "y" does not exist nor do the files "foo" and
   "bar" exist anywhere (neither in "y" nor in "x", nor the root
   nor anywhere).

   # Scenario 2

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir /mnt/a
   echo "hello" > /mnt/a/foo
   sync
   mv /mnt/a/foo /mnt/a/bar
   echo "world" > /mnt/a/foo
   xfs_io -c fsync /mnt/a/foo
   <power failure happens>

   The next time the fs is mounted, log tree replay happens and the
   file "bar" does not exists anymore. A file with the name "foo"
   exists and it matches the second file we created.

Another related problem that does not involve file/data loss is when a
new inode is created with the name of a deleted snapshot and we fsync it:

   mkfs.btrfs -f /dev/sdc
   mount /dev/sdc /mnt
   mkdir /mnt/testdir
   btrfs subvolume snapshot /mnt /mnt/testdir/snap
   btrfs subvolume delete /mnt/testdir/snap
   rmdir /mnt/testdir
   mkdir /mnt/testdir
   xfs_io -c fsync /mnt/testdir # or fsync some file inside /mnt/testdir
   <power failure>

   The next time the fs is mounted the log replay procedure fails because
   it attempts to delete the snapshot entry (which has dir item key type
   of BTRFS_ROOT_ITEM_KEY) as if it were a regular (non-root) entry,
   resulting in the following error that causes mount to fail:

   [52174.510532] BTRFS info (device dm-0): failed to delete reference to snap, inode 257 parent 257
   [52174.512570] ------------[ cut here ]------------
   [52174.513278] WARNING: CPU: 12 PID: 28024 at fs/btrfs/inode.c:3986 __btrfs_unlink_inode+0x178/0x351 [btrfs]()
   [52174.514681] BTRFS: Transaction aborted (error -2)
   [52174.515630] Modules linked in: btrfs dm_flakey dm_mod overlay crc32c_generic ppdev xor raid6_pq acpi_cpufreq parport_pc tpm_tis sg parport tpm evdev i2c_piix4 proc
   [52174.521568] CPU: 12 PID: 28024 Comm: mount Tainted: G        W       4.5.0-rc6-btrfs-next-27+ #1
   [52174.522805] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS by qemu-project.org 04/01/2014
   [52174.524053]  0000000000000000 ffff8801df2a7710 ffffffff81264e93 ffff8801df2a7758
   [52174.524053]  0000000000000009 ffff8801df2a7748 ffffffff81051618 ffffffffa03591cd
   [52174.524053]  00000000fffffffe ffff88015e6e5000 ffff88016dbc3c88 ffff88016dbc3c88
   [52174.524053] Call Trace:
   [52174.524053]  [<ffffffff81264e93>] dump_stack+0x67/0x90
   [52174.524053]  [<ffffffff81051618>] warn_slowpath_common+0x99/0xb2
   [52174.524053]  [<ffffffffa03591cd>] ? __btrfs_unlink_inode+0x178/0x351 [btrfs]
   [52174.524053]  [<ffffffff81051679>] warn_slowpath_fmt+0x48/0x50
   [52174.524053]  [<ffffffffa03591cd>] __btrfs_unlink_inode+0x178/0x351 [btrfs]
   [52174.524053]  [<ffffffff8118f5e9>] ? iput+0xb0/0x284
   [52174.524053]  [<ffffffffa0359fe8>] btrfs_unlink_inode+0x1c/0x3d [btrfs]
   [52174.524053]  [<ffffffffa038631e>] check_item_in_log+0x1fe/0x29b [btrfs]
   [52174.524053]  [<ffffffffa0386522>] replay_dir_deletes+0x167/0x1cf [btrfs]
   [52174.524053]  [<ffffffffa038739e>] fixup_inode_link_count+0x289/0x2aa [btrfs]
   [52174.524053]  [<ffffffffa038748a>] fixup_inode_link_counts+0xcb/0x105 [btrfs]
   [52174.524053]  [<ffffffffa038a5ec>] btrfs_recover_log_trees+0x258/0x32c [btrfs]
   [52174.524053]  [<ffffffffa03885b2>] ? replay_one_extent+0x511/0x511 [btrfs]
   [52174.524053]  [<ffffffffa034f288>] open_ctree+0x1dd4/0x21b9 [btrfs]
   [52174.524053]  [<ffffffffa032b753>] btrfs_mount+0x97e/0xaed [btrfs]
   [52174.524053]  [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf
   [52174.524053]  [<ffffffff8117bafa>] mount_fs+0x67/0x131
   [52174.524053]  [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde
   [52174.524053]  [<ffffffffa032af81>] btrfs_mount+0x1ac/0xaed [btrfs]
   [52174.524053]  [<ffffffff8108e1b7>] ? trace_hardirqs_on+0xd/0xf
   [52174.524053]  [<ffffffff8108c262>] ? lockdep_init_map+0xb9/0x1b3
   [52174.524053]  [<ffffffff8117bafa>] mount_fs+0x67/0x131
   [52174.524053]  [<ffffffff81193003>] vfs_kern_mount+0x6c/0xde
   [52174.524053]  [<ffffffff8119590f>] do_mount+0x8a6/0x9e8
   [52174.524053]  [<ffffffff811358dd>] ? strndup_user+0x3f/0x59
   [52174.524053]  [<ffffffff81195c65>] SyS_mount+0x77/0x9f
   [52174.524053]  [<ffffffff814935d7>] entry_SYSCALL_64_fastpath+0x12/0x6b
   [52174.561288] ---[ end trace 6b53049efb1a3ea6 ]---

Fix this by forcing a transaction commit when such cases happen.
This means we check in the commit root of the subvolume tree if there
was any other inode with the same reference when the inode we are
fsync'ing is a new inode (created in the current transaction).

Test cases for fstests, covering all the scenarios given above, were
submitted upstream for fstests:

  * fstests: generic test for fsync after renaming directory
    https://patchwork.kernel.org/patch/8694281/

  * fstests: generic test for fsync after renaming file
    https://patchwork.kernel.org/patch/8694301/

  * fstests: add btrfs test for fsync after snapshot deletion
    https://patchwork.kernel.org/patch/8670671/

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:42:14 +09:00
..
9p 9p: ->evict_inode() should kick out ->i_data, not ->i_mapping 2015-12-08 14:51:16 -05:00
adfs
affs fs/affs: make root lookup from blkdev logical size 2015-09-10 13:29:01 -07:00
afs
autofs4
befs
bfs
btrfs Btrfs: fix file/data loss caused by fsync after rename and new inode 2016-04-20 15:42:14 +09:00
cachefiles FS-Cache: Add missing initialization of ret in cachefiles_write_page() 2015-11-16 20:38:43 -05:00
ceph Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client 2015-11-13 09:24:40 -08:00
cifs Fix cifs_uniqueid_to_ino_t() function for s390x 2016-03-09 15:34:50 -08:00
coda fs/coda: fix readlink buffer overflow 2015-09-10 13:29:01 -07:00
configfs configfs: allow dynamic group creation 2015-11-20 16:17:32 -08:00
cramfs
debugfs debugfs: fix refcount imbalance in start_creating 2015-11-11 02:04:44 -05:00
devpts pty: make sure super_block is still valid in final /dev/tty close 2016-02-25 12:01:14 -08:00
dlm net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA 2015-12-01 15:45:05 -05:00
ecryptfs Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2015-11-07 13:05:44 -08:00
efivarfs efi: Make efivarfs entries immutable by default 2016-03-03 15:07:09 -08:00
efs
exofs osd fs: __r4w_get_page rely on PageUptodate for uptodate 2015-12-12 10:15:34 -08:00
exportfs
ext2 ext2, ext4: warn when mounting with dax enabled 2015-11-16 09:43:54 -08:00
ext4 ext4: ignore quota mount options if the quota feature is enabled 2016-04-20 15:42:13 +09:00
f2fs Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-13 18:02:30 -08:00
fat fat: fix fake_offset handling on error path 2015-11-20 16:17:32 -08:00
freevxfs freevxfs: Grammar s/an negative/a negative/ 2015-08-07 13:59:24 +02:00
fscache FS-Cache: Handle a write to the page immediately beyond the EOF marker 2015-11-11 02:11:02 -05:00
fuse fuse: Add reference counting for fuse_io_priv 2016-04-12 09:08:58 -07:00
gfs2 Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-13 18:02:30 -08:00
hfs hfs: fix B-tree corruption after insertion at position 0 2015-09-10 13:29:01 -07:00
hfsplus xattr handlers: Pass handler to operations instead of flags 2015-11-13 20:34:32 -05:00
hostfs uml: fix hostfs mknod() 2016-03-03 15:07:12 -08:00
hpfs hpfs: don't truncate the file when delete fails 2016-03-03 15:07:30 -08:00
hugetlbfs fs/hugetlbfs/inode.c: fix bugs in hugetlb_vmtruncate_list() 2016-02-25 12:01:22 -08:00
isofs
jbd2 jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path 2016-04-12 09:08:53 -07:00
jffs2 jffs2: reduce the breakage on recovery from halfway failed rename() 2016-03-16 08:42:58 -07:00
jfs fs/jfs: remove unnecessary new_valid_dev() checks 2015-11-09 15:11:24 -08:00
kernfs kernfs: implement kernfs_path_len() 2015-08-18 15:49:15 -07:00
lockd Mainly smaller bugfixes and cleanup. We're still finding some bugs from 2015-11-11 20:11:28 -08:00
logfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
minix
ncpfs ncpfs: fix a braino in OOM handling in ncp_fill_cache() 2016-03-16 08:42:59 -07:00
nfs nfs: use file_dentry() 2016-04-20 15:42:13 +09:00
nfs_common lockd: NLM grace period shouldn't block NFSv4 opens 2015-08-13 10:22:06 -04:00
nfsd nfsd: fix deadlock secinfo+readdir compound 2016-04-12 09:09:03 -07:00
nilfs2 Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-11 09:45:24 -08:00
nls
notify inotify: actually check for invalid bits in sys_inotify_add_watch() 2015-11-05 19:34:48 -08:00
ntfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
ocfs2 ocfs2/dlm: fix BUG in dlm_move_lockres_to_recovery_list 2016-04-12 09:09:04 -07:00
omfs
openpromfs
overlayfs fs: add file_dentry() 2016-04-20 15:42:12 +09:00
proc numa: fix /proc/<pid>/numa_maps for hugetlbfs on s390 2016-02-25 12:01:22 -08:00
pstore pstore: fix code comment to match code 2015-11-02 13:41:52 -08:00
qnx4
qnx6
quota quota: Fix possible GPF due to uninitialised pointers 2016-04-12 09:08:56 -07:00
ramfs mm, fs: obey gfp_mapping for add_to_page_cache() 2015-10-16 11:42:28 -07:00
reiserfs Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-13 18:02:30 -08:00
romfs
squashfs squashfs: xattr simplifications 2015-11-13 20:34:33 -05:00
sysfs platform/chrome: Branch for v4.4 2015-11-13 21:53:18 -08:00
sysv fix sysvfs symlinks 2015-11-23 21:11:08 -05:00
tracefs tracefs: Fix refcount imbalance in start_creating() 2015-11-04 22:13:45 -05:00
ubifs Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-13 18:02:30 -08:00
udf udf: Check output buffer length when converting name to CS0 2016-02-25 12:01:18 -08:00
ufs fix ufs write vs readpage race when writing into a hole 2015-09-09 10:43:12 -07:00
xfs xfs: fix two memory leaks in xfs_attr_list.c error paths 2016-04-12 09:08:56 -07:00
aio.c mm: move ->mremap() from file_operations to vm_operations_struct 2015-09-04 16:54:41 -07:00
anon_inodes.c
attr.c
bad_inode.c
binfmt_aout.c
binfmt_elf.c Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-11-11 09:45:24 -08:00
binfmt_elf_fdpic.c libnvdimm for 4.4: 2015-11-10 12:07:22 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c
binfmt_script.c
block_dev.c block: detach bdev inode from its wb in __blkdev_put() 2015-12-04 11:02:17 -07:00
buffer.c vfs: remove unused wrapper block_page_mkwrite() 2015-11-11 02:19:33 -05:00
char_dev.c fs/char_dev.c: fix incorrect documentation for unregister_chrdev_region 2015-08-05 13:49:35 -07:00
compat.c
compat_binfmt_elf.c
compat_ioctl.c i2c-dev: Fix typo in ioctl name reference 2015-10-23 23:26:43 +02:00
coredump.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
dax.c dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
dcache.c fs: add file_dentry() 2016-04-20 15:42:12 +09:00
dcookies.c
direct-io.c block: fix use-after-free in dio_bio_complete 2016-03-03 15:07:28 -08:00
drop_caches.c inode: convert inode_sb_list_lock to per-sb 2015-08-17 18:39:46 -04:00
eventfd.c
eventpoll.c
exec.c
fcntl.c
fhandle.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
file.c vfs: clear remainder of 'full_fds_bits' in dup_fd() 2015-11-05 23:05:32 -08:00
file_table.c fs, file table: reinit files_stat.max_files after deferred memory initialisation 2015-08-07 04:39:40 +03:00
filesystems.c
fs-writeback.c writeback, cgroup: fix use of the wrong bdi_writeback which mismatches the inode 2016-04-12 09:09:04 -07:00
fs_pin.c
fs_struct.c
inode.c fs: fix inode.c kernel-doc warning 2015-11-11 02:18:27 -05:00
internal.h inode: rename i_wb_list to i_io_list 2015-08-17 23:38:10 -04:00
ioctl.c
Kconfig dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
Kconfig.binfmt
libfs.c fs: Set the size of empty dirs to 0. 2015-08-12 15:28:45 -05:00
locks.c locks: fix unlock when fcntl_setlk races with a close 2016-03-03 15:07:12 -08:00
Makefile ext4: promote ext4 over ext2 in the default probe order 2015-10-15 10:33:21 -04:00
mbcache.c
mount.h
mpage.c mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
namei.c do_last(): ELOOP failure exit should be done after leaving RCU mode 2016-03-03 15:07:30 -08:00
namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2015-09-01 16:13:25 -07:00
no-block.c
nsfs.c fs/seq_file: convert int seq_vprint/seq_printf/etc... returns to void 2015-09-11 15:21:34 -07:00
open.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
pipe.c fs/pipe.c: return error code rather than 0 in pipe_write() 2015-11-11 02:18:26 -05:00
pnode.c
pnode.h mnt: Clarify and correct the disconnect logic in umount_tree 2015-07-22 20:33:27 -05:00
posix_acl.c xattr handlers: Pass handler to operations instead of flags 2015-11-13 20:34:32 -05:00
proc_namespace.c vfs: show_vfsstat: do not ignore errors from show_devname method 2016-04-12 09:08:55 -07:00
read_write.c
readdir.c
select.c
seq_file.c fs, seqfile: always allow oom killer 2015-11-06 17:50:42 -08:00
signalfd.c signalfd: fix information leak in signalfd_copyinfo 2015-08-07 04:39:40 +03:00
splice.c splice: handle zero nr_pages in splice_to_pipe() 2016-04-12 09:08:55 -07:00
stack.c
stat.c fs/stat.c: remove unnecessary new_valid_dev() check 2015-11-09 15:11:24 -08:00
statfs.c
super.c writeback: flush inode cgroup wb switches instead of pinning super_block 2016-03-09 15:34:52 -08:00
sync.c fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback 2015-11-06 17:50:42 -08:00
timerfd.c timerfd: Handle relative timers with CONFIG_TIME_LOW_RES proper 2016-02-25 12:01:25 -08:00
userfaultfd.c userfaultfd: don't block on the last VM updates at exit time 2016-03-16 08:43:01 -07:00
utimes.c
xattr.c 9p: xattr simplifications 2015-11-13 20:34:33 -05:00