There is one case where a bio can be left cached in f2fs, resulting in a
hung page writeback waiter.
Thread A                                Thread B
- f2fs_write_cache_pages
 - f2fs_submit_page_write
   page #0 cached in bio #0 of cold log
 - f2fs_submit_page_write
   page #1 cached in bio #1 of warm log
                                        - f2fs_write_cache_pages
                                         - f2fs_submit_page_write
                                           bio is full, submit bio #1 containing page #1
 - f2fs_submit_merged_write_cond(, page #1)
   fails to submit bio #0 because page #1 is not in any cached bio.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now we support crc32 checksum for superblock.
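As a rough illustration of what a superblock checksum buys, the sketch below seals and verifies a small structure with a standard CRC-32. The structure layout, the function names, and the choice of CRC parameters (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF) are assumptions for this example; the seed and the byte range that f2fs actually covers are not stated in this message.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk layout; the real f2fs superblock is much larger. */
struct demo_super {
	uint32_t magic;
	uint32_t block_count;
	uint32_t crc;	/* checksum of all bytes that precede this field */
};

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, init/final XOR 0xFFFFFFFF. */
uint32_t demo_crc32(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Store the checksum of everything in front of the crc field. */
void demo_super_seal(struct demo_super *sb)
{
	sb->crc = demo_crc32(sb, offsetof(struct demo_super, crc));
}

/* Return 1 if the stored checksum matches the current contents. */
int demo_super_valid(const struct demo_super *sb)
{
	return sb->crc == demo_crc32(sb, offsetof(struct demo_super, crc));
}
```

A writer would call demo_super_seal() before writing the block out, and a reader would reject the block when demo_super_valid() returns 0.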
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Junling Zheng <zhengjunling@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch accounts the skip count of background GC and shows the stat
info via the 'status' debugfs entry.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds accounting of meta IO, which allows write IO from f2fs
to be shown more comprehensively via the 'status' debugfs entry.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fixes a missing up_read call.
Fixes: c9b60788fc76 ("f2fs: fix to do sanity check with block address in main area")
Cc: <stable@vger.kernel.org> # 4.19+
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch avoids BUG_ON when f2fs_get_meta_page_nofail gets EIO during
xfstests/generic/475.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Mark the inode dirty explicitly at the end of recover_inode() to make sure
that all recoverable fields can be persisted later.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
inode.i_gc_failures indicates the skip count of migrating blocks of the
inode; we should guarantee that it can be recovered in the sudden
power-off case.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Testcase to reproduce this bug:
1. mkfs.f2fs /dev/sdd
2. mount -t f2fs /dev/sdd /mnt/f2fs
3. touch /mnt/f2fs/file
4. sync
5. chattr +A /mnt/f2fs/file
6. xfs_io -f /mnt/f2fs/file -c "fsync"
7. godown /mnt/f2fs
8. umount /mnt/f2fs
9. mount -t f2fs /dev/sdd /mnt/f2fs
10. lsattr /mnt/f2fs/file
-----------------N- /mnt/f2fs/file
But actually, we expect the correct result to be:
-------A---------N- /mnt/f2fs/file
The reason is that we didn't recover the inode.i_flags field during mount;
fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This is related to
ee70daaba82d ("xfs: update i_size after unwritten conversion in dio completion")
If we update i_size during dio_write, dio_read can read out stale data, which
breaks xfstests/465.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Currently we show the mount option "io_bits=%u" as "io_size=%uKB",
which causes an option parsing problem (unrecognized mount option)
on remount.
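The underlying rule is that whatever the show path prints must be accepted by the parse path. A minimal user-space sketch of that round trip follows; the helper names are hypothetical and only the two option strings come from this message.

```c
#include <stdio.h>

/* Print the option in the same spelling that the parser accepts. */
int demo_show_io_bits(char *buf, size_t len, unsigned int bits)
{
	return snprintf(buf, len, "io_bits=%u", bits);
}

/* Return 0 and fill *bits on success, -1 for an unrecognized option. */
int demo_parse_io_bits(const char *opt, unsigned int *bits)
{
	return sscanf(opt, "io_bits=%u", bits) == 1 ? 0 : -1;
}
```

Feeding the output of demo_show_io_bits() back into demo_parse_io_bits() succeeds, while the old "io_size=16KB" spelling is rejected, which mirrors the remount failure described above.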
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Steps to reproduce this bug:
1. log on as root
2. mount -t f2fs /dev/sdd /mnt;
3. touch /mnt/file;
4. chown system /mnt/file; chgrp system /mnt/file;
5. xfs_io -f /mnt/file -c "fsync";
6. godown /mnt;
7. umount /mnt;
8. mount -t f2fs /dev/sdd /mnt;
After step 8) we expect the file's uid/gid to both be system, but these
two fields were not recovered during recovery; fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If f2fs_build_free_nids returns an error, we can fall into a loop
while looking for free nids.
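The pattern can be sketched in user space as below. The pool structure and helper names are hypothetical stand-ins, not the f2fs code; the point is only that the allocation loop returns the build error instead of retrying.

```c
#include <errno.h>

/* Hypothetical stand-in for the free nid bookkeeping. */
struct demo_nid_pool {
	int free_count;		/* nids currently available */
	int build_err;		/* error that the build step reports, 0 if none */
	int build_calls;	/* how many times the build step ran */
};

/* Refill the pool, or report the configured error. */
static int demo_build_free_nids(struct demo_nid_pool *p)
{
	p->build_calls++;
	if (p->build_err)
		return p->build_err;
	p->free_count += 4;
	return 0;
}

/* Take one nid; return 0 on success or a negative errno. */
int demo_alloc_nid(struct demo_nid_pool *p)
{
	for (;;) {
		int err;

		if (p->free_count > 0) {
			p->free_count--;
			return 0;
		}
		err = demo_build_free_nids(p);
		if (err)
			return err;	/* without this check the loop never ends */
	}
}
```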
Suggested-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This helps to control the frequency of submission of discard and
GC requests independently, based on the need.
Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds support for injecting errors into write IO; this can
simulate IO errors like fail_make_request or dm_flakey does.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Remove the verbose license text from f2fs files and replace them with
SPDX tags. This does not change the license of any of the code.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
It is a little strange that fault_injection related options, when
already disabled in the config, fail with -EINVAL, so surround all
fault_injection related option parsing code with
CONFIG_F2FS_FAULT_INJECTION. Meanwhile, slightly change the warning
message to keep it consistent with the POSIX_ACL and FS_XATTR
options.
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In the call trace below, we might sleep in function dput().
So in order to avoid sleeping under spin_lock, we remove f2fs_mark_inode_dirty_sync
from __try_update_largest_extent && __drop_largest_extent.
BUG: sleeping function called from invalid context at fs/dcache.c:796
Call trace:
dump_backtrace+0x0/0x3f4
show_stack+0x24/0x30
dump_stack+0xe0/0x138
___might_sleep+0x2a8/0x2c8
__might_sleep+0x78/0x10c
dput+0x7c/0x750
block_dump___mark_inode_dirty+0x120/0x17c
__mark_inode_dirty+0x344/0x11f0
f2fs_mark_inode_dirty_sync+0x40/0x50
__insert_extent_tree+0x2e0/0x2f4
f2fs_update_extent_tree_range+0xcf4/0xde8
f2fs_update_extent_cache+0x114/0x12c
f2fs_update_data_blkaddr+0x40/0x50
write_data_page+0x150/0x314
do_write_data_page+0x648/0x2318
__write_data_page+0xdb4/0x1640
f2fs_write_cache_pages+0x768/0xafc
__f2fs_write_data_pages+0x590/0x1218
f2fs_write_data_pages+0x64/0x74
do_writepages+0x74/0xe4
__writeback_single_inode+0xdc/0x15f0
writeback_sb_inodes+0x574/0xc98
__writeback_inodes_wb+0x190/0x204
wb_writeback+0x730/0xf14
wb_check_old_data_flush+0x1bc/0x1c8
wb_workfn+0x554/0xf74
process_one_work+0x440/0x118c
worker_thread+0xac/0x974
kthread+0x1a0/0x1c8
ret_from_fork+0x10/0x1c
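One common way to keep a may-sleep call out of a spinlocked region is to record the decision under the lock and act on it after unlocking. The sketch below models that in user space; the lock is a plain flag, all names are hypothetical, and whether the callers re-issue the dirtying this way is an assumption, since this message only states that the call is removed from the two helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical extent tree state; "locked" models the spinlock. */
struct demo_tree {
	bool locked;
	bool dirty;
	unsigned int largest_len;
};

/* Stands in for a function that may sleep: must not run under the lock. */
static void demo_mark_dirty_may_sleep(struct demo_tree *t)
{
	assert(!t->locked);
	t->dirty = true;
}

/* Runs under the lock and only reports whether anything changed. */
static bool demo_try_update_largest(struct demo_tree *t, unsigned int len)
{
	if (len <= t->largest_len)
		return false;
	t->largest_len = len;
	return true;
}

void demo_update_extent(struct demo_tree *t, unsigned int len)
{
	bool updated;

	t->locked = true;			/* spin_lock() in real code */
	updated = demo_try_update_largest(t, len);
	t->locked = false;			/* spin_unlock() */

	if (updated)
		demo_mark_dirty_may_sleep(t);	/* safe: lock is dropped */
}
```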
Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add a plug to merge readahead IO in readdir(), expecting it can
reduce bio count before submitting to block layer.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In fill_super -> init_percpu_info, we should destroy the percpu counter
in the error path; otherwise the memory allocated for the percpu counter
will leak.
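A minimal user-space sketch of the unwind pattern is shown here. The counter type, the forced-failure parameter, and the live-counter bookkeeping are assumptions made so the leak can be observed; they are not the percpu_counter API.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counter that owns a heap allocation. */
struct demo_counter {
	long *slots;
};

static int demo_live;	/* counters that are currently initialized */

static int demo_counter_init(struct demo_counter *c, int force_fail)
{
	if (force_fail)
		return -ENOMEM;
	c->slots = calloc(4, sizeof(*c->slots));
	if (!c->slots)
		return -ENOMEM;
	demo_live++;
	return 0;
}

static void demo_counter_destroy(struct demo_counter *c)
{
	free(c->slots);
	c->slots = NULL;
	demo_live--;
}

/* Initialize two counters; undo the first if the second one fails. */
int demo_init_info(struct demo_counter *a, struct demo_counter *b,
		   int fail_second)
{
	int err = demo_counter_init(a, 0);

	if (err)
		return err;
	err = demo_counter_init(b, fail_second);
	if (err)
		demo_counter_destroy(a);	/* the step the error path lacked */
	return err;
}

int demo_live_counters(void)
{
	return demo_live;
}
```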
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We need to release the memory allocated for sbi->write_io in the error
path; otherwise it will leak.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
default_acl and acl of a newly created inode are initialized
as ACL_NOT_CACHED in the vfs function inode_init_always() and are later
updated by calling xxx_init_acl() in specific filesystems.
However, when default_acl and acl are NULL they keep the value
of ACL_NOT_CACHED; this patch caches NULL for acl/default_acl
in this case.
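The benefit of caching NULL can be sketched as follows: once "no ACL" is recorded, later lookups no longer fall through to the slow path. The sentinel, the structure, and the lookup counter are hypothetical stand-ins for illustration only.

```c
#include <stddef.h>

struct demo_acl {
	int entries;
};

/* Sentinel meaning "nothing cached yet"; distinct from NULL ("no ACL"). */
#define DEMO_ACL_NOT_CACHED ((struct demo_acl *)-1)

struct demo_inode {
	struct demo_acl *acl;
	int disk_lookups;	/* counts slow-path reads */
};

/* Slow path: pretend to read the ACL from disk and find none. */
static struct demo_acl *demo_read_acl(struct demo_inode *inode)
{
	inode->disk_lookups++;
	return NULL;
}

/* Return the cached ACL, reading it only when nothing is cached. */
struct demo_acl *demo_get_acl(struct demo_inode *inode)
{
	if (inode->acl == DEMO_ACL_NOT_CACHED)
		inode->acl = demo_read_acl(inode);
	return inode->acl;
}

/* At creation time, cache the result even when it is NULL. */
void demo_init_acl(struct demo_inode *inode, struct demo_acl *acl)
{
	inode->acl = acl;
}
```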
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
generic/417 reported as below:
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/devf2fs/inode.c:695!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 21697 Comm: umount Tainted: G W O 4.18.0-rc2+ #39
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
Call Trace:
? _raw_spin_unlock+0x2c/0x50
evict+0xa8/0x170
dispose_list+0x34/0x40
evict_inodes+0x118/0x120
generic_shutdown_super+0x41/0x100
? rcu_read_lock_sched_held+0x97/0xa0
kill_block_super+0x22/0x50
kill_f2fs_super+0x6f/0x80 [f2fs]
deactivate_locked_super+0x3d/0x70
deactivate_super+0x40/0x60
cleanup_mnt+0x39/0x70
__cleanup_mnt+0x10/0x20
task_work_run+0x81/0xa0
exit_to_usermode_loop+0x59/0xa7
do_fast_syscall_32+0x1f5/0x22c
entry_SYSENTER_32+0x53/0x86
EIP: f2fs_evict_inode+0x556/0x580 [f2fs]
It can be reproduced simply with these scripts:
Enable quota feature during mkfs.
Testcase1:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4k" -c "fsync"
4. godown /mnt/f2fs
5. umount /mnt/f2fs
6. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
7. umount /mnt/f2fs
Testcase2:
1. mkfs.f2fs /dev/zram0
2. mount -t f2fs /dev/zram0 /mnt/f2fs
3. touch /mnt/f2fs/file
4. create process[pid = x] do:
a) open /mnt/f2fs/file;
b) unlink /mnt/f2fs/file
5. godown -f /mnt/f2fs
6. kill process[pid = x]
7. umount /mnt/f2fs
8. mount -t f2fs -o ro /dev/zram0 /mnt/f2fs
9. umount /mnt/f2fs
The reason is: during recovery, i_{c,m}time of inode will be updated, then
the inode can be set dirty w/o being tracked in sbi->inode_list[DIRTY_META]
global list, so later write_checkpoint will not flush such dirty inode into
node page.
Once umount is called, sync_filesystem() in generic_shutdown_super() will
skip syncing dirty inodes due to the sb_rdonly check, leaving dirty inodes
behind.
To solve this issue, during umount, remove the SB_RDONLY flag from
sb->s_flags to make sure sync_filesystem() will not be skipped.
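The interaction can be modelled in a few lines. This is a simplified sketch with hypothetical names: the sync helper returns early on a read-only superblock, so the teardown path clears the flag first.

```c
/* Hypothetical flag and superblock; only the control flow matters here. */
#define DEMO_SB_RDONLY 0x1u

struct demo_sb {
	unsigned int flags;
	int dirty_inodes;
};

/* Mirrors the read-only shortcut: nothing is written for an RO mount. */
static void demo_sync_filesystem(struct demo_sb *sb)
{
	if (sb->flags & DEMO_SB_RDONLY)
		return;
	sb->dirty_inodes = 0;	/* pretend every dirty inode was flushed */
}

/* Teardown: drop the RO flag so the sync is not skipped. */
void demo_kill_super(struct demo_sb *sb)
{
	sb->flags &= ~DEMO_SB_RDONLY;
	demo_sync_filesystem(sb);
}
```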
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sometimes, some merged IOs could not get a chance to be submitted, resulting
in a system hang in the shutdown test. This patch issues IOs all the time
after shutdown.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
1. Create a file in an encrypted directory
2. Do GC & drop caches
3. Read stale data, since the bio for its metapage was not issued yet
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Don't limit printing log, so that we will not miss any key messages.
This reverts commit a36c106dffb616250117efb1cab271c19a8f94ff.
In addition, we use printk_ratelimited to avoid too many log prints for:
- error injection
- discard submission failure
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When dev is busy, discard thread wake up timeout can be aligned with the
exact time that it needs to wait for dev to come out of busy. This helps
to avoid unnecessary periodic wakeups and thus save some power.
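The timeout choice can be sketched as a small pure function. The parameter names, the millisecond unit, and the definition of "busy" as "the idle interval has not yet elapsed since the last IO" are assumptions for this example.

```c
/* All names and units here are illustrative assumptions. */
unsigned int demo_discard_wait_ms(unsigned int idle_interval_ms,
				  unsigned int ms_since_last_io,
				  unsigned int default_wait_ms)
{
	/* Busy: sleep exactly until the device would count as idle. */
	if (ms_since_last_io < idle_interval_ms)
		return idle_interval_ms - ms_since_last_io;

	/* Idle: keep the regular periodic wakeup. */
	return default_wait_ms;
}
```

With an idle interval of 1000 ms and the last IO 400 ms ago, the thread would sleep 600 ms once instead of waking up repeatedly at the default period.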
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
https://bugzilla.kernel.org/show_bug.cgi?id=200951
There is a NULL pointer dereference issue reported in bugzilla:
Hi,
in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
There are four partitions:
sda1: FAT /boot
sda2: F2FS /
sda3: F2FS /home
sda4: F2FS
The bridge is ASMT1153e which uses the "uas" driver.
There is no TRIM pass-through, so, when mounting it reports:
mounting with "discard" option, but the device does not support discard
The USB host is USB3.0 and UASP capable. It is the one on RK3399.
Given this everything works fine, except there is no TRIM support.
In order to enable TRIM a new UDEV rule is added [1]:
/etc/udev/rules.d/10-sata-bridge-trim.rules:
ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
After reboot any F2FS write hangs forever and dmesg reports:
Unable to handle kernel NULL pointer dereference
Also tested on a x86_64 system: works fine even with TRIM enabled.
same disc
same bridge
different usb host controller
different cpu architecture
not root filesystem
Regards,
Vicenç.
[1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
[000000000000003e] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
Hardware name: Sapphire-RK3399 Board (DT)
pstate: 00000005 (nzcv daif -PAN -UAO)
pc : update_sit_entry+0x304/0x4b0
lr : update_sit_entry+0x108/0x4b0
sp : ffff00000ca13bd0
x29: ffff00000ca13bd0 x28: 000000000000003e
x27: 0000000000000020 x26: 0000000000080000
x25: 0000000000000048 x24: ffff8000ebb85cf8
x23: 0000000000000253 x22: 00000000ffffffff
x21: 00000000000535f2 x20: 00000000ffffffdf
x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
x17: 0000000007ce6926 x16: 000000001c83ffa8
x15: 0000000000000000 x14: ffff8000f602df90
x13: 0000000000000006 x12: 0000000000000040
x11: 0000000000000228 x10: 0000000000000000
x9 : 0000000000000000 x8 : 0000000000000000
x7 : 00000000000535f2 x6 : ffff8000ebff3440
x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
x3 : 00000000ffffffff x2 : 0000000000000020
x1 : 0000000000000000 x0 : ffff8000eb9e5800
Process nvim (pid: 957, stack limit = 0x0000000063a78320)
Call trace:
update_sit_entry+0x304/0x4b0
f2fs_invalidate_blocks+0x98/0x140
truncate_node+0x90/0x400
f2fs_remove_inode_page+0xe8/0x340
f2fs_evict_inode+0x2b0/0x408
evict+0xe0/0x1e0
iput+0x160/0x260
do_unlinkat+0x214/0x298
__arm64_sys_unlinkat+0x3c/0x68
el0_svc_handler+0x94/0x118
el0_svc+0x8/0xc
Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
---[ end trace a0f21a307118c477 ]---
The reason is that the discard flag on the block queue can be enabled via
UDEV, but f2fs initializes se->discard_map during mount only if this flag
is set. Once the flag is set after mount, f2fs may dereference a NULL
se->discard_map.
So this patch makes the changes below to fix this issue:
- initialize and update se->discard_map all the time.
- don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
during mount.
- don't issue small discard on zoned block device.
- introduce some functions to enhance the readability.
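The first item can be illustrated with a user-space sketch: the bitmap is allocated regardless of what the queue reported at setup time, so a later update never sees a NULL pointer. The structure and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-segment entry with its discard bitmap. */
struct demo_seg_entry {
	unsigned char *discard_map;
};

/* Always allocate the map; the queue flag may change after mount. */
int demo_build_seg_entry(struct demo_seg_entry *se, size_t map_bytes,
			 bool queue_supports_discard)
{
	(void)queue_supports_discard;	/* deliberately not consulted */
	se->discard_map = calloc(1, map_bytes);
	return se->discard_map ? 0 : -1;
}

/* Update path: safe because the map exists unconditionally. */
void demo_mark_block(struct demo_seg_entry *se, unsigned int offset)
{
	se->discard_map[offset / 8] |= (unsigned char)(1u << (offset % 8));
}
```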
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Tested-by: Vicente Bergas <vicencb@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Add an additional sanity check for irregular cases (e.g. corruption).
If the size of the extended attribute is smaller than the size of the
acl header, then return -EINVAL.
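The check amounts to a length guard before the header is read. A minimal sketch, assuming a hypothetical one-field header rather than the real f2fs acl header layout:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header; the real on-disk acl header may differ. */
struct demo_acl_header {
	uint32_t version;
};

/* Read the header version, rejecting values too short to hold a header. */
int demo_acl_read_version(const void *value, size_t size, uint32_t *version)
{
	struct demo_acl_header hdr;

	if (size < sizeof(hdr))
		return -EINVAL;		/* truncated or corrupted xattr */

	memcpy(&hdr, value, sizeof(hdr));
	*version = hdr.version;
	return 0;
}
```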
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
If the fw build timestamp passed by QMI is a non-NUL-terminated string,
it might result in an out-of-bounds read in icnss_get_soc_info. Hence,
manually NUL-terminate the string.
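A sketch of the defensive copy is given below. The buffer length and function name are assumptions; the idea is only that the destination has one extra byte and is terminated explicitly, so later string functions cannot run past the end.

```c
#include <string.h>

/* Hypothetical maximum timestamp length reported by the firmware. */
#define DEMO_TS_LEN 8

/* Copy a possibly unterminated field and terminate it explicitly. */
void demo_copy_timestamp(char dst[DEMO_TS_LEN + 1],
			 const char src[DEMO_TS_LEN])
{
	memcpy(dst, src, DEMO_TS_LEN);
	dst[DEMO_TS_LEN] = '\0';
}
```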
Change-Id: I252196cd12784d841b29303c42591efc59da64f1
CRs-Fixed: 2322317
Signed-off-by: Yuanyuan Liu <yuanliu@codeaurora.org>
This config will be useful to debug data corruptions without
losing context.
Change-Id: Ic4e3f65f6edffd15c2252393035ee6702e44e13b
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
Dload type imem offset is corrected for MSM8998, so that
correct imem address is updated.
Change-Id: I519603641753ec39d86fbf923bd80afcd6b1345d
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
HDMI PHY registers for MSM8998 chipset are not programmed as
per the latest hardware programming guide for the chipset.
This can affect HDMI electrical compliance test cases due
to low signal strength.
Fix up the HDMI PHY driver to use the latest values for the
PHY registers.
Signed-off-by: Abhinav Kumar <abhinavk@codeaurora.org>
Change-Id: I96dd6ffaf127940d34af898c5a2982209a6fc83c
The function ion_handle_get_by_id() is called from the function
msm_ion_custom_ioctl(), so we retained it even though it was
deleted in commit 2c155709e4
("staging: android: ion: fix ION_IOC_{MAP,SHARE} use-after-free").
This can lead to the same use-after-free scenario with
ION_IOC_{CLEAN_CACHES,INV_CACHES,CLEAN_INV_CACHES}, so remove
this unsafe function and hold client->lock for the entire operation.
Change-Id: I536cfa69465ad692794500e3b31ac137d04940ff
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
With the pcie-ep device, pcie0 can work in endpoint mode.
But it is disabled by default because it conflicts with
pcie0 in host mode.
Change-Id: Ia39edf8dd8fc2a1b3ed79db13aa6f85065ebf7b0
Signed-off-by: Yimin Peng <yiminp@codeaurora.org>