Commit graph

566119 commits

Author SHA1 Message Date
Jaegeuk Kim
79c3cf3d3b f2fs: fix wrong #endif
We have to cover whole headerfile with last #endif.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-14 13:57:33 -07:00
Jaegeuk Kim
b88b75de83 f2fs: don't clear CP_QUOTA_NEED_FSCK_FLAG
If we met this once, let fsck.f2fs clear this only.
Note that, this addresses all the subtle fault injection test.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-14 13:57:32 -07:00
Chao Yu
6a19f80150 f2fs: don't allow negative ->write_io_size_bits
As Dan reported:

"We put an upper bound on ->write_io_size_bits but we don't have a lower
bound."

So let's add lower bound check for ->write_io_size_bits in parse_options().

[We don't allow configuring ->write_io_size_bits to zero, since at least
we need to fill one dummy page for aligned IO.]

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-14 13:57:30 -07:00
Chao Yu
52bff3f6ce f2fs: fix to check inline_xattr_size boundary correctly
We use below condition to check inline_xattr_size boundary:

	if (!F2FS_OPTION(sbi).inline_xattr_size ||
		F2FS_OPTION(sbi).inline_xattr_size >=
				DEF_ADDRS_PER_INODE -
				F2FS_TOTAL_EXTRA_ATTR_SIZE -
				DEF_INLINE_RESERVED_SIZE -
				DEF_MIN_INLINE_SIZE)

There is there problems in that check:
- we should allow inline_xattr_size equaling to min size of inline
{data,dentry} area.
- F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
different size unit, previous one is 4 bytes, latter one is 1 bytes.
- DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
however, we need to consider min size of inline dentry area as well,
minimal inline dentry should at least contain two entries: '.' and
'..', so that min inline_dentry size is 40 bytes.

.bitmap		1 * 1 = 1
.reserved	1 * 1 = 1
.dentry		11 * 2 = 22
.filename	8 * 2 = 16
total		40

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-03-14 13:57:28 -07:00
Jaegeuk Kim
2599cc3d72 Revert "f2fs: fix to avoid deadlock of atomic file operations"
This reverts commit f3ac182210162c7e76997a8566a7f9869349f3d8.
2019-02-26 10:05:35 -08:00
Jaegeuk Kim
d5aa6ed6b6 Revert "f2fs: fix to check inline_xattr_size boundary correctly"
This reverts commit 802a643228462e0e3f6e84977d912f1871f61467.
2019-02-26 10:05:34 -08:00
Sahitya Tummala
011f2e498d f2fs: do not use mutex lock in atomic context
Fix below warning coming because of using mutex lock in atomic context.

BUG: sleeping function called from invalid context at kernel/locking/mutex.c:98
in_atomic(): 1, irqs_disabled(): 0, pid: 585, name: sh
Preemption disabled at: __radix_tree_preload+0x28/0x130
Call trace:
 dump_backtrace+0x0/0x2b4
 show_stack+0x20/0x28
 dump_stack+0xa8/0xe0
 ___might_sleep+0x144/0x194
 __might_sleep+0x58/0x8c
 mutex_lock+0x2c/0x48
 f2fs_trace_pid+0x88/0x14c
 f2fs_set_node_page_dirty+0xd0/0x184

Do not use f2fs_radix_tree_insert() to avoid doing cond_resched() with
spin_lock() acquired.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:51:27 -08:00
Chao Yu
ecf753b370 f2fs: fix potential data inconsistence of checkpoint
Previously, we changed lock from cp_rwsem to node_change, it solved
the deadlock issue which was caused by below race condition:

Thread A			Thread B
- f2fs_setattr
 - f2fs_lock_op  -- read_lock
 - dquot_transfer
  - __dquot_transfer
   - dquot_acquire
    - commit_dqblk
     - f2fs_quota_write
      - f2fs_write_begin
       - f2fs_write_failed
				- write_checkpoint
				 - block_operations
				  - f2fs_lock_all  -- write_lock
        - f2fs_truncate_blocks
         - f2fs_lock_op  -- read_lock

But it breaks the sematics of cp_rwsem, in other callers like:
- f2fs_file_write_iter -> f2fs_write_begin -> f2fs_write_failed
- f2fs_direct_IO -> f2fs_write_failed

We allow to truncate dnode w/o cp_rwsem held, result in incorrect sit
bitmap update, which can cause further data corruption.

So this patch reverts previous fix implementation, and try to fix
deadlock by skipping calling f2fs_truncate_blocks() in f2fs_write_failed()
only for quota file, and keep the preallocated data/node in the tail of
quota file, we can expecte that the preallocated space can be used to
store quota info latter soon.

Fixes: af033b2aa8a8 ("f2fs: guarantee journalled quota data by checkpoint")
Signed-off-by: Gao Xiang <gaoxiang25@huawei.com>
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:51:25 -08:00
Chao Yu
8ef83050d5 f2fs: fix to avoid deadlock of atomic file operations
Thread A				Thread B
- __fput
 - f2fs_release_file
  - drop_inmem_pages
   - mutex_lock(&fi->inmem_lock)
   - __revoke_inmem_pages
    - lock_page(page)
					- open
					- f2fs_setattr
					- truncate_setsize
					 - truncate_inode_pages_range
					  - lock_page(page)
					  - truncate_cleanup_page
					   - f2fs_invalidate_page
					    - drop_inmem_page
					    - mutex_lock(&fi->inmem_lock);

We may encounter above ABBA deadlock as reported by Kyungtae Kim:

I'm reporting a bug in linux-4.17.19: "INFO: task hung in
drop_inmem_page" (no reproducer)

I think this might be somehow related to the following:
https://groups.google.com/forum/#!searchin/syzkaller-bugs/INFO$3A$20task$20hung$20in$20%7Csort:date/syzkaller-bugs/c6soBTrdaIo/AjAzPeIzCgAJ

=========================================
INFO: task syz-executor7:10822 blocked for more than 120 seconds.
      Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor7   D27024 10822   6346 0x00000004
Call Trace:
 context_switch kernel/sched/core.c:2867 [inline]
 __schedule+0x721/0x1e60 kernel/sched/core.c:3515
 schedule+0x88/0x1c0 kernel/sched/core.c:3559
 schedule_preempt_disabled+0x18/0x30 kernel/sched/core.c:3617
 __mutex_lock_common kernel/locking/mutex.c:833 [inline]
 __mutex_lock+0x5bd/0x1410 kernel/locking/mutex.c:893
 mutex_lock_nested+0x1b/0x20 kernel/locking/mutex.c:908
 drop_inmem_page+0xcb/0x810 fs/f2fs/segment.c:327
 f2fs_invalidate_page+0x337/0x5e0 fs/f2fs/data.c:2401
 do_invalidatepage mm/truncate.c:165 [inline]
 truncate_cleanup_page+0x261/0x330 mm/truncate.c:187
 truncate_inode_pages_range+0x552/0x1610 mm/truncate.c:367
 truncate_inode_pages mm/truncate.c:478 [inline]
 truncate_pagecache+0x6d/0x90 mm/truncate.c:801
 truncate_setsize+0x81/0xa0 mm/truncate.c:826
 f2fs_setattr+0x44f/0x1270 fs/f2fs/file.c:781
 notify_change+0xa62/0xe80 fs/attr.c:313
 do_truncate+0x12e/0x1e0 fs/open.c:63
 do_last fs/namei.c:2955 [inline]
 path_openat+0x2042/0x29f0 fs/namei.c:3505
 do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
 do_sys_open+0x35e/0x4e0 fs/open.c:1101
 __do_sys_open fs/open.c:1119 [inline]
 __se_sys_open fs/open.c:1114 [inline]
 __x64_sys_open+0x89/0xc0 fs/open.c:1114
 do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:00007f734e459c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 00007f734e45a6cc RCX: 00000000004497b9
RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
RBP: 000000000071bea0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e45a700
INFO: task syz-executor7:10858 blocked for more than 120 seconds.
      Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor7   D28880 10858   6346 0x00000004
Call Trace:
 context_switch kernel/sched/core.c:2867 [inline]
 __schedule+0x721/0x1e60 kernel/sched/core.c:3515
 schedule+0x88/0x1c0 kernel/sched/core.c:3559
 __rwsem_down_write_failed_common kernel/locking/rwsem-xadd.c:565 [inline]
 rwsem_down_write_failed+0x5e6/0xc90 kernel/locking/rwsem-xadd.c:594
 call_rwsem_down_write_failed+0x17/0x30 arch/x86/lib/rwsem.S:117
 __down_write arch/x86/include/asm/rwsem.h:142 [inline]
 down_write+0x58/0xa0 kernel/locking/rwsem.c:72
 inode_lock include/linux/fs.h:713 [inline]
 do_truncate+0x120/0x1e0 fs/open.c:61
 do_last fs/namei.c:2955 [inline]
 path_openat+0x2042/0x29f0 fs/namei.c:3505
 do_filp_open+0x1bd/0x2c0 fs/namei.c:3540
 do_sys_open+0x35e/0x4e0 fs/open.c:1101
 __do_sys_open fs/open.c:1119 [inline]
 __se_sys_open fs/open.c:1114 [inline]
 __x64_sys_open+0x89/0xc0 fs/open.c:1114
 do_syscall_64+0xc4/0x4e0 arch/x86/entry/common.c:287
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:00007f734e3b4c68 EFLAGS: 00000246 ORIG_RAX: 0000000000000002
RAX: ffffffffffffffda RBX: 00007f734e3b56cc RCX: 00000000004497b9
RDX: 0000000000000104 RSI: 00000000000a8280 RDI: 0000000020000080
RBP: 000000000071c238 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 0000000000007230 R14: 00000000006f02d0 R15: 00007f734e3b5700
INFO: task syz-executor5:10829 blocked for more than 120 seconds.
      Not tainted 4.17.19 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
syz-executor5   D28760 10829   6308 0x80000002
Call Trace:
 context_switch kernel/sched/core.c:2867 [inline]
 __schedule+0x721/0x1e60 kernel/sched/core.c:3515
 schedule+0x88/0x1c0 kernel/sched/core.c:3559
 io_schedule+0x21/0x80 kernel/sched/core.c:5179
 wait_on_page_bit_common mm/filemap.c:1100 [inline]
 __lock_page+0x2b5/0x390 mm/filemap.c:1273
 lock_page include/linux/pagemap.h:483 [inline]
 __revoke_inmem_pages+0xb35/0x11c0 fs/f2fs/segment.c:231
 drop_inmem_pages+0xa3/0x3e0 fs/f2fs/segment.c:306
 f2fs_release_file+0x2c7/0x330 fs/f2fs/file.c:1556
 __fput+0x2c7/0x780 fs/file_table.c:209
 ____fput+0x1a/0x20 fs/file_table.c:243
 task_work_run+0x151/0x1d0 kernel/task_work.c:113
 exit_task_work include/linux/task_work.h:22 [inline]
 do_exit+0x8ba/0x30a0 kernel/exit.c:865
 do_group_exit+0x13b/0x3a0 kernel/exit.c:968
 get_signal+0x6bb/0x1650 kernel/signal.c:2482
 do_signal+0x84/0x1b70 arch/x86/kernel/signal.c:810
 exit_to_usermode_loop+0x155/0x190 arch/x86/entry/common.c:162
 prepare_exit_to_usermode arch/x86/entry/common.c:196 [inline]
 syscall_return_slowpath arch/x86/entry/common.c:265 [inline]
 do_syscall_64+0x445/0x4e0 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x4497b9
RSP: 002b:00007f1c68e74ce8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 000000000071bf80 RCX: 00000000004497b9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000071bf80
RBP: 000000000071bf80 R08: 0000000000000000 R09: 000000000071bf58
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f1c68e759c0 R15: 00007f1c68e75700

This patch tries to use trylock_page to mitigate such deadlock condition
for fix.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:51:23 -08:00
Chao Yu
be2a96a3ba f2fs: fix to check inline_xattr_size boundary correctly
We use below condition to check inline_xattr_size boundary:

	if (!F2FS_OPTION(sbi).inline_xattr_size ||
		F2FS_OPTION(sbi).inline_xattr_size >=
				DEF_ADDRS_PER_INODE -
				F2FS_TOTAL_EXTRA_ATTR_SIZE -
				DEF_INLINE_RESERVED_SIZE -
				DEF_MIN_INLINE_SIZE)

There is there problems in that check:
- we should allow inline_xattr_size equaling to min size of inline
{data,dentry} area.
- F2FS_TOTAL_EXTRA_ATTR_SIZE and inline_xattr_size are based on
different size unit, previous one is 4 bytes, latter one is 1 bytes.
- DEF_MIN_INLINE_SIZE only indicate min size of inline data area,
however, we need to consider min size of inline dentry area as well,
minimal inline dentry should at least contain two entries: '.' and
'..', so that min inline_dentry size is 40 bytes.

.bitmap		1 * 1 = 1
.reserved	1 * 1 = 1
.dentry		11 * 2 = 22
.filename	8 * 2 = 16
total		40

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:51:22 -08:00
Chengguang Xu
b8fe720a61 f2fs: jump to label 'free_node_inode' when failing from d_make_root()
When sb->s_root is NULL dput() will do nothing,
so jump to label 'free_node_inode' instead of lable
'free_root_inode' when failing from d_make_root().

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:51:20 -08:00
Chao Yu
843809bad2 f2fs: fix to document inline_xattr_size option
We missed to add document for inline_xattr_size mount option in f2fs.txt,
add it.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:50:48 -08:00
zhengliang
b339a10ec6 f2fs: fix to data block override node segment by mistake
The following race could lead to data block override node segment by mistake.

Task A            |    Task B         |  Task C            |    Task D
=======           |   ========        |==========          |  =========
open file         |                   |                    |
white file        |                   |                    |
submit bio        |                   |                    |
wait io complete  |                   |                    |
		  |   remove file     |                    |
........          |   iput_final      |                    |
		  |                   |   sync             |
		  |                   |  do checkpoint     |
		  |		      |  data segment free |
		  |                   |                    | create file1
		  |		      |		           | allocate node segment(if it is the same segment freed by Task C)
f2fs_write_end_io |		      |                    |

So we need to guarantee io complete before truncate inode
in f2fs_drop_inode.

Signed-off-by: Zheng Liang <zhengliang6@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:50:31 -08:00
Geliang Tang
04c71ef5d9 f2fs: fix typos in code comments
lengh -> length

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:50:29 -08:00
Jaegeuk Kim
9eb897a050 f2fs: sync filesystem after roll-forward recovery
Some works after roll-forward recovery can get an error which will release
all the data structures. Let's flush them in order to make it clean.

One possible corruption came from:

[   90.400500] list_del corruption. prev->next should be ffffffed1f566208, but was (null)
[   90.675349] Call trace:
[   90.677869]  __list_del_entry_valid+0x94/0xb4
[   90.682351]  remove_dirty_inode+0xac/0x114
[   90.686563]  __f2fs_write_data_pages+0x6a8/0x6c8
[   90.691302]  f2fs_write_data_pages+0x40/0x4c
[   90.695695]  do_writepages+0x80/0xf0
[   90.699372]  __writeback_single_inode+0xdc/0x4ac
[   90.704113]  writeback_sb_inodes+0x280/0x440
[   90.708501]  wb_writeback+0x1b8/0x3d0
[   90.712267]  wb_workfn+0x1a8/0x4d4
[   90.715765]  process_one_work+0x1c0/0x3d4
[   90.719883]  worker_thread+0x224/0x344
[   90.723739]  kthread+0x120/0x130
[   90.727055]  ret_from_fork+0x10/0x18

Reported-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:50:02 -08:00
Jaegeuk Kim
6df501312c fs: export evict_inodes
This comes by 799ea9e9c599 ("xfs: evict all inodes involved with log redo item")

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:50:00 -08:00
Jaegeuk Kim
22cb0dec0e f2fs: flush quota blocks after turnning it off
After quota_off, we'll get some dirty blocks. If put_super don't have a chance
to flush them by checkpoint, it causes NULL pointer exception in end_io after
iput(node_inode). (e.g., by checkpoint=disable)

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:34:08 -08:00
Jaegeuk Kim
909e1f05af f2fs: avoid null pointer exception in dcc_info
If dcc_info is not set yet, we can get null pointer panic.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:34:06 -08:00
Jaegeuk Kim
0335a980d2 f2fs: don't wake up too frequently, if there is lots of IOs
Otherwise, it wakes up discard thread which will sleep again by busy IOs
in a loop.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:34:05 -08:00
Jaegeuk Kim
c21ea75cec f2fs: try to keep CP_TRIMMED_FLAG after successful umount
If every discard were issued successfully, we can avoid further discard.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:34:03 -08:00
Jaegeuk Kim
3907a1e399 f2fs: add quick mode of checkpoint=disable for QA
This mode returns mount() quickly with EAGAIN. We can trigger this by
shutdown(F2FS_GOING_DOWN_NEED_FSCK).

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:34:01 -08:00
Jaegeuk Kim
35f7e2f6b1 f2fs: run discard jobs when put_super
When we umount f2fs, we need to avoid long delay due to discard commands, which
is actually taking tens of seconds, if storage is very slow on UNMAP. So, this
patch introduces timeout-based work on it.

By default, let me give 5 seconds for discard.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:33:59 -08:00
Chao Yu
c90a357aee f2fs: fix to set sbi dirty correctly
In order to record direct IO count, we add two additional type in
enum count_type: F2FS_DIO_{WRITE,READ}, but those IO won't dirty
filesystem metadata, so we don't need to set filesystem dirty in
inc_page_count(), fix it.

Fixes: 02b16d0a34a1 ("f2fs: add to account direct IO")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:33:57 -08:00
Sheng Yong
1b64f91923 f2fs: UBSAN: set boolean value iostat_enable correctly
When setting /sys/fs/f2fs/<DEV>/iostat_enable with non-bool value, UBSAN
reports the following warning.

[ 7562.295484] ================================================================================
[ 7562.296531] UBSAN: Undefined behaviour in fs/f2fs/f2fs.h:2776:10
[ 7562.297651] load of value 64 is not a valid value for type '_Bool'
[ 7562.298642] CPU: 1 PID: 7487 Comm: dd Not tainted 4.20.0-rc4+ #79
[ 7562.298653] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[ 7562.298662] Call Trace:
[ 7562.298760]  dump_stack+0x46/0x5b
[ 7562.298811]  ubsan_epilogue+0x9/0x40
[ 7562.298830]  __ubsan_handle_load_invalid_value+0x72/0x90
[ 7562.298863]  f2fs_file_write_iter+0x29f/0x3f0
[ 7562.298905]  __vfs_write+0x115/0x160
[ 7562.298922]  vfs_write+0xa7/0x190
[ 7562.298934]  ksys_write+0x50/0xc0
[ 7562.298973]  do_syscall_64+0x4a/0xe0
[ 7562.298992]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 7562.299001] RIP: 0033:0x7fa45ec19c00
[ 7562.299004] Code: 73 01 c3 48 8b 0d 88 92 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d dd eb 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ce 8f 01 00 48 89 04 24
[ 7562.299044] RSP: 002b:00007ffca52b49e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 7562.299052] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fa45ec19c00
[ 7562.299059] RDX: 0000000000000400 RSI: 000000000093f000 RDI: 0000000000000001
[ 7562.299065] RBP: 000000000093f000 R08: 0000000000000004 R09: 0000000000000000
[ 7562.299071] R10: 00007ffca52b47b0 R11: 0000000000000246 R12: 0000000000000400
[ 7562.299077] R13: 000000000093f000 R14: 000000000093f400 R15: 0000000000000000
[ 7562.299091] ================================================================================

So, if iostat_enable is enabled, set its value as true.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:33:56 -08:00
Sheng Yong
b8243df07b f2fs: add brackets for macros
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:33:54 -08:00
Sheng Yong
9ca1182a3c f2fs: check if file namelen exceeds max value
Dentry bitmap is not enough to detect incorrect dentries. So this patch
also checks the namelen value of a dentry.

Signed-off-by: Gong Chen <gongchen4@huawei.com>
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:33:52 -08:00
Chao Yu
82e2ea2753 f2fs: fix to trigger fsck if dirent.name_len is zero
While traversing dirents in f2fs_fill_dentries(), if bitmap is valid,
filename length should not be zero, otherwise, directory structure
consistency could be corrupted, in this case, let's print related
info and set SBI_NEED_FSCK to trigger fsck for repairing.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:33:51 -08:00
Greg Kroah-Hartman
1ba66eab9c f2fs: no need to check return value of debugfs_create functions
When calling debugfs functions, there is no need to ever check the
return value.  The function can work or not, but the code logic should
never do something different based on this.

Cc: Jaegeuk Kim <jaegeuk@kernel.org>
Cc: Chao Yu <yuchao0@huawei.com>
Cc: linux-f2fs-devel@lists.sourceforge.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:33:49 -08:00
Jaegeuk Kim
b9aad0e3d0 f2fs: export FS_NOCOW_FL flag to user
This exports pin_file status to user.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:33:47 -08:00
Chao Yu
b5897a1fad f2fs: check inject_rate validity during configuring
Type of inject_rate is unsigned int, let's check new value's
validity during configuring.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:33:45 -08:00
YueHaibing
f3a0a4ad1e f2fs: remove set but not used variable 'err'
Fixes gcc '-Wunused-but-set-variable' warning:

fs/f2fs/data.c: In function 'f2fs_dio_submit_bio':
fs/f2fs/data.c:2585:6: warning:
 variable 'err' set but not used [-Wunused-but-set-variable]

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:33:27 -08:00
Zhikang Zhang
746a1117c5 f2fs: fix compile warnings: 'struct *' declared inside parameter list
We meet these compile warnings below, which caused by missing declare structs:
struct f2fs_io_info, struct extent, struct f2fs_sb_info.

warning: 'struct f2fs_io_info' declared inside parameter list
warning: 'struct extent_info' declared inside parameter list
warning: 'struct f2fs_sb_info' declared inside parameter list

Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:32:56 -08:00
Chengguang Xu
b224ba20b4 f2fs: change error code to -ENOMEM from -EINVAL
The error case of failing allocating memory should
return -ENOMEM.

Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-02-04 10:32:54 -08:00
Jaegeuk Kim
4cb399ef78 f2fs: don't access node/meta inode mapping after iput
This fixes wrong access of address spaces of node and meta inodes after iput.

Fixes: 60aa4d5536ab ("f2fs: fix use-after-free issue when accessing sbi->stat_info")
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-04 13:07:56 -08:00
Jaegeuk Kim
45afbd3805 f2fs: wait on atomic writes to count F2FS_CP_WB_DATA
Otherwise, we can get wrong counts incurring checkpoint hang.

IO_W (CP:  -24, Data:   24, Flush: (   0    0    1), Discard: (   0    0))

Thread A                        Thread B
- f2fs_write_data_pages
 -  __write_data_page
  - f2fs_submit_page_write
   - inc_page_count(F2FS_WB_DATA)
     type is F2FS_WB_DATA due to file is non-atomic one
- f2fs_ioc_start_atomic_write
 - set_inode_flag(FI_ATOMIC_FILE)
                                - f2fs_write_end_io
                                 - dec_page_count(F2FS_WB_CP_DATA)
                                   type is F2FS_WB_DATA due to file becomes
                                   atomic one

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-04 13:07:54 -08:00
Jaegeuk Kim
91aae9fd92 f2fs: sanity check of xattr entry size
There is a security report where f2fs_getxattr() has a hole to expose wrong
memory region when the image is malformed like this.

f2fs_getxattr: entry->e_name_len: 4, size: 12288, buffer_size: 16384, len: 4

Cc: <stable@vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:57 -08:00
Sahitya Tummala
652f47425f f2fs: fix use-after-free issue when accessing sbi->stat_info
iput() on sbi->node_inode can update sbi->stat_info
in the below context, if the f2fs_write_checkpoint()
has failed with error.

f2fs_balance_fs_bg+0x1ac/0x1ec
f2fs_write_node_pages+0x4c/0x260
do_writepages+0x80/0xbc
__writeback_single_inode+0xdc/0x4ac
writeback_single_inode+0x9c/0x144
write_inode_now+0xc4/0xec
iput+0x194/0x22c
f2fs_put_super+0x11c/0x1e8
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78

Fix this by moving f2fs_destroy_stats() further below iput() in
both f2fs_put_super() and f2fs_fill_super() paths.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:55 -08:00
Chao Yu
c0d46fe1b7 f2fs: check PageWriteback flag for ordered case
For all ordered cases in f2fs_wait_on_page_writeback(), we need to
check PageWriteback status, so let's clean up to relocate the check
into f2fs_wait_on_page_writeback().

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:53 -08:00
Martin Blumenstingl
7d78945c83 f2fs: fix validation of the block count in sanity_check_raw_super
Treat "block_count" from struct f2fs_super_block as 64-bit little endian
value in sanity_check_raw_super() because struct f2fs_super_block
declares "block_count" as "__le64".

This fixes a bug where the superblock validation fails on big endian
devices with the following error:
  F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
  F2FS-fs (sda1): Can't find valid F2FS filesystem in 1th superblock
  F2FS-fs (sda1): Wrong segment_count / block_count (61439 > 0)
  F2FS-fs (sda1): Can't find valid F2FS filesystem in 2th superblock
As result of this the partition cannot be mounted.

With this patch applied the superblock validation works fine and the
partition can be mounted again:
  F2FS-fs (sda1): Mounted with checkpoint version = 7c84

My little endian x86-64 hardware was able to mount the partition without
this fix.
To confirm that mounting f2fs filesystems works on big endian machines
again I tested this on a 32-bit MIPS big endian (lantiq) device.

Fixes: 0cfe75c5b01199 ("f2fs: enhance sanity_check_raw_super() to avoid potential overflows")
Cc: stable@vger.kernel.org
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:51 -08:00
Jaegeuk Kim
ed5b0cbccc f2fs: fix missing unlock(sbi->gc_mutex)
This fixes missing unlock call.

Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:49 -08:00
Chao Yu
0f9dc70b5f f2fs: clean up structure extent_node
The union in struct extent_node wass only to indicate below fields

	struct rb_node rb_node;
	union {
		struct {
			unsigned int fofs;
			unsigned int len;
		...
	...

can be parsed as fields in struct rb_entry, but they were never be
used explicitly before, so let's remove them for cleanup.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:47 -08:00
Qiuyang Sun
491624d7ea f2fs: fix block address for __check_sit_bitmap
Should use lstart (logical start address) instead of start (in dev) here.
This fixes a bug in multi-device scenarios.

Signed-off-by: Qiuyang Sun <sunqiuyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:46 -08:00
Sahitya Tummala
ac9f979f2e f2fs: fix sbi->extent_list corruption issue
When there is a failure in f2fs_fill_super() after/during
the recovery of fsync'd nodes, it frees the current sbi and
retries again. This time the mount is successful, but the files
that got recovered before retry, still holds the extent tree,
whose extent nodes list is corrupted since sbi and sbi->extent_list
is freed up. The list_del corruption issue is observed when the
file system is getting unmounted and when those recoverd files extent
node is being freed up in the below context.

list_del corruption. prev->next should be fffffff1e1ef5480, but was (null)
<...>
kernel BUG at kernel/msm-4.14/lib/list_debug.c:53!
lr : __list_del_entry_valid+0x94/0xb4
pc : __list_del_entry_valid+0x94/0xb4
<...>
Call trace:
__list_del_entry_valid+0x94/0xb4
__release_extent_node+0xb0/0x114
__free_extent_tree+0x58/0x7c
f2fs_shrink_extent_tree+0xdc/0x3b0
f2fs_leave_shrinker+0x28/0x7c
f2fs_put_super+0xfc/0x1e0
generic_shutdown_super+0x70/0xf4
kill_block_super+0x2c/0x5c
kill_f2fs_super+0x44/0x50
deactivate_locked_super+0x60/0x8c
deactivate_super+0x68/0x74
cleanup_mnt+0x40/0x78
__cleanup_mnt+0x1c/0x28
task_work_run+0x48/0xd0
do_notify_resume+0x678/0xe98
work_pending+0x8/0x14

Fix this by not creating extents for those recovered files if shrinker is
not registered yet. Once mount is successful and shrinker is registered,
those files can have extents again.

Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:44 -08:00
Chao Yu
51aeb0d993 f2fs: clean up checkpoint flow
This patch cleans up checkpoint flow a bit:
- remove unneeded circulation of flushing meta pages.
- don't flush nat_bits pages in prior to other checkpoint pages.
- add bug_on to check remained meta pages after flushing.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:42 -08:00
Jaegeuk Kim
7beec23358 f2fs: flush stale issued discard candidates
Sometimes, I could observe # of issuing_discard to be 1 which blocks background
jobs due to is_idle()=false.
The only way to get out of it was to trigger gc_urgent. This patch avoids that
by checking any candidates as done in the list.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:41 -08:00
Jaegeuk Kim
6f3ef9d3c4 f2fs: correct wrong spelling, issing_*
Let's use "queued" instead of "issuing".

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:39 -08:00
Jaegeuk Kim
c7495fa806 f2fs: use kvmalloc, if kmalloc is failed
One report says memalloc failure during mount.

 (unwind_backtrace) from [<c010cd4c>] (show_stack+0x10/0x14)
 (show_stack) from [<c049c6b8>] (dump_stack+0x8c/0xa0)
 (dump_stack) from [<c024fcf0>] (warn_alloc+0xc4/0x160)
 (warn_alloc) from [<c0250218>] (__alloc_pages_nodemask+0x3f4/0x10d0)
 (__alloc_pages_nodemask) from [<c0270450>] (kmalloc_order_trace+0x2c/0x120)
 (kmalloc_order_trace) from [<c03fa748>] (build_node_manager+0x35c/0x688)
 (build_node_manager) from [<c03de494>] (f2fs_fill_super+0xf0c/0x16cc)
 (f2fs_fill_super) from [<c02a5864>] (mount_bdev+0x15c/0x188)
 (mount_bdev) from [<c03da624>] (f2fs_mount+0x18/0x20)
 (f2fs_mount) from [<c02a68b8>] (mount_fs+0x158/0x19c)
 (mount_fs) from [<c02c3c9c>] (vfs_kern_mount+0x78/0x134)
 (vfs_kern_mount) from [<c02c76ac>] (do_mount+0x474/0xca4)
 (do_mount) from [<c02c8264>] (SyS_mount+0x94/0xbc)
 (SyS_mount) from [<c0108180>] (ret_fast_syscall+0x0/0x48)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:37 -08:00
Yunlong Song
26c5b3d768 f2fs: remove redundant comment of unused wio_mutex
Commit 089842de ("f2fs: remove codes of unused wio_mutex") removes codes
of unused wio_mutex, but missing the comment, so delete it.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:35 -08:00
Chao Yu
b7b4be2913 f2fs: fix to reorder set_page_dirty and wait_on_page_writeback
This patch reorders flow from

- update page
- set_page_dirty
- wait_on_page_writeback

to

- wait_on_page_writeback
- update page
- set_page_dirty

The reason is:
- set_page_dirty will increase reference of dirty page, the reference
should be cleared before wait_on_page_writeback to keep its consistency.
- some devices need stable page during page writebacking, so we
should not change page's data.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:33 -08:00
Sheng Yong
7c3f76cff3 f2fs: clear PG_writeback if IPU failed
If IPU failed, nothing is commited, we should end page writeback.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2019-01-02 16:17:31 -08:00