android_kernel_oneplus_msm8998/fs/btrfs
Filipe Manana d63e1c7753 Btrfs: fix assertion failure during fsync and use of stale transaction
commit 410f954cb1d1c79ae485dd83a175f21954fd87cd upstream.

Sometimes when fsync'ing a file we need to log that other inodes exist and
when we need to do that we acquire a reference on the inodes and then drop
that reference using iput() after logging them.

That generally is not a problem except if we end up doing the final iput()
(dropping the last reference) on the inode and that inode has a link count
of 0, which can happen in a very short time window if the logging path
gets a reference on the inode while it's being unlinked.

In that case we end up getting the eviction callback, btrfs_evict_inode(),
invoked through the iput() call chain which needs to drop all of the
inode's items from its subvolume btree, and in order to do that, it needs
to join a transaction at the helper function evict_refill_and_join().
However because the task previously started a transaction at the fsync
handler, btrfs_sync_file(), it has current->journal_info already pointing
to a transaction handle and therefore evict_refill_and_join() will get
that transaction handle from btrfs_join_transaction(). From this point on,
two different problems can happen:

1) evict_refill_and_join() will often change the transaction handle's
   block reserve (->block_rsv) and set its ->bytes_reserved field to a
   value greater than 0. If evict_refill_and_join() never commits the
   transaction, the eviction handler ends up decreasing the reference
   count (->use_count) of the transaction handle through the call to
   btrfs_end_transaction(), and after that point we have a transaction
   handle with a NULL ->block_rsv (which is the value prior to the
   transaction join from evict_refill_and_join()) and a ->bytes_reserved
   value greater than 0. If after the eviction/iput completes the inode
   logging path hits an error or it decides that it must fallback to a
   transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
   non-zero value from btrfs_log_dentry_safe(), and because of that
   non-zero value it tries to commit the transaction using a handle with
   a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
   the transaction commit hit an assertion failure at
   btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
   the ->block_rsv is NULL. The produced stack trace for that is like the
   following:

   [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
   [192922.917553] ------------[ cut here ]------------
   [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
   [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
   [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G        W         5.1.4-btrfs-next-47 #1
   [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
   [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
   (...)
   [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
   [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
   [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
   [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
   [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
   [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
   [192922.923200] FS:  00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
   [192922.923579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
   [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
   [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
   [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
   [192922.925105] Call Trace:
   [192922.925505]  btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
   [192922.925911]  btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
   [192922.926324]  btrfs_sync_file+0x44c/0x490 [btrfs]
   [192922.926731]  do_fsync+0x38/0x60
   [192922.927138]  __x64_sys_fdatasync+0x13/0x20
   [192922.927543]  do_syscall_64+0x60/0x1c0
   [192922.927939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
   (...)
   [192922.934077] ---[ end trace f00808b12068168f ]---

2) If evict_refill_and_join() decides to commit the transaction, it will
   be able to do it, since the nested transaction join only increments the
   transaction handle's ->use_count reference counter and it does not
   prevent the transaction from getting committed. This means that after
   eviction completes, the fsync logging path will be using a transaction
   handle that refers to an already committed transaction. What happens
   when using such a stale transaction can be unpredictable, we are at
   least having a use-after-free on the transaction handle itself, since
   the transaction commit will call kmem_cache_free() against the handle
   regardless of its ->use_count value, or we can end up silently losing
   all the updates to the log tree after that iput() in the logging path,
   or using a transaction handle that in the meanwhile was allocated to
   another task for a new transaction, etc, pretty much unpredictable
   what can happen.

In order to fix both of them, instead of using iput() during logging, use
btrfs_add_delayed_iput(), so that the logging path of fsync never drops
the last reference on an inode, that step is offloaded to a safe context
(usually the cleaner kthread).

The assertion failure issue was sporadically triggered by the test case
generic/475 from fstests, which loads the dm error target while fsstress
is running, which lead to fsync failing while logging inodes with -EIO
errors and then trying later to commit the transaction, triggering the
assertion failure.

CC: stable@vger.kernel.org # 4.4+
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-09-21 07:12:42 +02:00
..
tests btrfs: tests/qgroup: Fix wrong tree backref level 2018-05-30 07:49:09 +02:00
acl.c btrfs: preserve i_mode if __btrfs_set_acl() fails 2018-03-11 16:19:47 +01:00
async-thread.c btrfs: limit async_work allocation and worker func duration 2017-01-06 11:16:06 +01:00
async-thread.h btrfs: limit async_work allocation and worker func duration 2017-01-06 11:16:06 +01:00
backref.c Btrfs: do not start a transaction at iterate_extent_inodes() 2019-06-11 12:23:38 +02:00
backref.h
btrfs_inode.h Btrfs: Direct I/O: Fix space accounting 2015-09-21 13:47:55 -07:00
check-integrity.c Merge branch 'cleanups/for-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.4 2015-10-21 18:21:40 -07:00
check-integrity.h
compression.c btrfs: assign error values to the correct bio structs 2016-10-22 12:26:54 +02:00
compression.h
ctree.c Btrfs: memset to avoid stale content in btree leaf 2019-01-16 22:16:07 +01:00
ctree.h btrfs: tree-checker: Verify block_group_item 2019-01-16 22:16:09 +01:00
delayed-inode.c btrfs: limit async_work allocation and worker func duration 2017-01-06 11:16:06 +01:00
delayed-inode.h btrfs: properly set the termination value of ctx->pos in readdir 2016-02-25 12:01:15 -08:00
delayed-ref.c btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans 2015-10-26 19:44:39 -07:00
delayed-ref.h btrfs: qgroup: Fix a race in delayed_ref which leads to abort trans 2015-10-26 19:44:39 -07:00
dev-replace.c btrfs: Ensure replaced device doesn't have pending chunk allocation 2019-07-10 09:56:44 +02:00
dev-replace.h
dir-item.c
disk-io.c btrfs: wait on ordered extents on abort cleanup 2019-01-26 09:42:50 +01:00
disk-io.h btrfs: don't create or leak aliased root while cleaning up orphans 2018-11-10 07:41:36 -08:00
export.c BTRFS: support NFSv2 export 2015-10-06 06:55:23 -07:00
export.h
extent-tree.c btrfs: Honour FITRIM range constraints during free space trim 2019-06-11 12:23:50 +02:00
extent-tree.h
extent_io.c Btrfs: fix corruption reading shared and compressed extents after hole punching 2019-03-23 08:44:36 +01:00
extent_io.h btrfs: struct-funcs, constify readers 2019-01-16 22:16:07 +01:00
extent_map.c btrfs: cleanup, stop casting for extent_map->lookup everywhere 2019-01-16 22:16:06 +01:00
extent_map.h btrfs: cleanup, stop casting for extent_map->lookup everywhere 2019-01-16 22:16:06 +01:00
file-item.c
file.c Btrfs: fix race between ranged fsync and writeback of adjacent ranges 2019-06-11 12:23:52 +02:00
free-space-cache.c Btrfs: fix use-after-free when dumping free space 2018-12-13 09:21:32 +01:00
free-space-cache.h Btrfs: keep track of largest extent in bitmaps 2015-10-21 18:55:40 -07:00
hash.c
hash.h
inode-item.c Btrfs: consolidate btrfs_error() to btrfs_std_error() 2015-09-29 16:30:00 +02:00
inode-map.c Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots 2016-03-03 15:07:12 -08:00
inode-map.h Btrfs: Initialize btrfs_root->highest_objectid when loading tree root and subvolume roots 2016-03-03 15:07:12 -08:00
inode.c Btrfs: fix null pointer dereference on compressed write path error 2018-11-21 09:27:38 +01:00
ioctl.c btrfs: Ensure btrfs_trim_fs can trim the whole filesystem 2018-12-01 09:46:41 +01:00
Kconfig
locking.c btrfs: comment the rest of implicit barriers before waitqueue_active 2015-10-10 18:42:00 +02:00
locking.h
lzo.c
Makefile btrfs: Move leaf and node validation checker to tree-checker.c 2019-01-16 22:16:08 +01:00
math.h
ordered-data.c Btrfs: change how we wait for pending ordered extents 2015-10-21 18:51:40 -07:00
ordered-data.h Btrfs: change how we wait for pending ordered extents 2015-10-21 18:51:40 -07:00
orphan.c
print-tree.c
print-tree.h
props.c btrfs: cleanup iterating over prop_handlers array 2015-10-21 18:28:48 +02:00
props.h
qgroup.c btrfs: qgroup: Dirty all qgroups before rescan 2018-11-21 09:27:38 +01:00
qgroup.h btrfs: waiting on qgroup rescan should not always be interruptible 2016-09-07 08:32:43 +02:00
raid56.c btrfs: raid56: properly unmap parity page in finish_parity_scrub() 2019-04-03 06:23:26 +02:00
raid56.h Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation 2015-08-09 07:34:26 -07:00
rcu-string.h
reada.c btrfs: start readahead also in seed devices 2019-07-10 09:56:33 +02:00
relocation.c btrfs: Handle owner mismatch gracefully when walking up tree 2018-11-21 09:27:37 +01:00
root-tree.c btrfs: don't create or leak aliased root while cleaning up orphans 2018-11-10 07:41:36 -08:00
scrub.c btrfs: cleanup, stop casting for extent_map->lookup everywhere 2019-01-16 22:16:06 +01:00
send.c Btrfs: send, fix infinite loop due to directory rename dependencies 2018-12-17 21:55:10 +01:00
send.h
struct-funcs.c btrfs: struct-funcs, constify readers 2019-01-16 22:16:07 +01:00
super.c Btrfs: ensure path name is null terminated at btrfs_control_ioctl 2018-12-13 09:21:26 +01:00
sysfs.c btrfs: sysfs: don't leak memory when failing add fsid 2019-06-11 12:23:52 +02:00
sysfs.h Btrfs: rename btrfs_kobj_rm_device to btrfs_sysfs_rm_device_link 2015-09-29 16:29:59 +02:00
transaction.c btrfs: release metadata before running delayed refs 2018-12-13 09:21:27 +01:00
transaction.h btrfs: account for non-CoW'd blocks in btrfs_abort_transaction 2016-07-27 09:47:33 -07:00
tree-checker.c btrfs: tree-checker: Fix misleading group system information 2019-01-16 22:16:10 +01:00
tree-checker.h btrfs: tree-checker: Fix false panic for sanity test 2019-01-16 22:16:08 +01:00
tree-defrag.c Btrfs: cleanup: remove unnecessary check before btrfs_free_path is called 2015-08-31 11:46:41 -07:00
tree-log.c Btrfs: fix assertion failure during fsync and use of stale transaction 2019-09-21 07:12:42 +02:00
tree-log.h
ulist.c
ulist.h
uuid-tree.c btrfs: return the actual error value from from btrfs_uuid_tree_iterate 2017-11-30 08:37:28 +00:00
volumes.c btrfs: fix minimum number of chunk errors for DUP 2019-08-06 18:28:26 +02:00
volumes.h btrfs: Ensure replaced device doesn't have pending chunk allocation 2019-07-10 09:56:44 +02:00
xattr.c Btrfs: fix race when listing an inode's xattrs 2015-11-09 18:34:40 +00:00
xattr.h
zlib.c