Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
commit f819874f58 ("f2fs: check cap_resource only for data blocks")
This patch changes the rule to check cap_resource for data blocks, not inode
or node blocks in order to avoid selinux denial.
Change-Id: I875d7ccf7cce7b833a1c11cb0eef0b504b823c4a
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
commit 3e7a141175 ("Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"")
This patch reverts the copied f2fs_set_page_dirty_nobuffer and goes back to
the generic function for stability.
This reverts commit fe76b796fc5194cc3d57265002e3a748566d073f.
Change-Id: I3d4728d894d1af41a2f1e30ebc375907abd5ffc8
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
c18b4f60c8 ("f2fs: refactor read path to allow multiple postprocessing steps")
Currently f2fs's ->readpage() and ->readpages() assume that either the
data undergoes no postprocessing, or decryption only. But with
fs-verity, there will be an additional authenticity verification step,
and it may be needed either by itself, or combined with decryption.
To support this, store a 'struct bio_post_read_ctx' in ->bi_private
which contains a work struct, a bitmask of postprocessing steps that are
enabled, and an indicator of the current step. The bio completion
routine, if there was no I/O error, enqueues the first postprocessing
step. When that completes, it continues to the next step. Pages that
fail any postprocessing step have PageError set. Once all steps have
completed, pages without PageError set are set Uptodate, and all pages
are unlocked.
Also replace f2fs_encrypted_file() with a new function
f2fs_post_read_required() in places like direct I/O and garbage
collection that really should be testing whether the file needs special
I/O processing, not whether it is encrypted specifically.
This may also be useful for other future f2fs features such as
compression.
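As a rough illustration of the step sequencing described above, here is a
minimal user-space sketch. It is not the kernel implementation: the enum
values, struct layout, and function names are hypothetical, the steps run
synchronously instead of being queued as work items, and a single flag
stands in for PageError on every page of the bio.

```c
#include <stdbool.h>

/* Hypothetical step identifiers; the names are illustrative only. */
enum post_read_step {
	STEP_INITIAL = 0,
	STEP_DECRYPT,
	STEP_VERITY,
	STEP_DONE,
};

/* Simplified stand-in for the context stored in ->bi_private. */
struct post_read_ctx {
	unsigned int enabled_steps;	/* bitmask of (1 << step) */
	enum post_read_step cur_step;	/* indicator of the current step */
	bool page_error;		/* models PageError on a page */
};

/* Placeholder step body: returns false to simulate a failed step. */
static bool run_step(enum post_read_step step, bool fail_verity)
{
	if (step == STEP_VERITY && fail_verity)
		return false;
	return true;
}

/*
 * Walk the enabled steps in order. Returns true if the page may be
 * marked Uptodate, i.e. no enabled step failed.
 */
static bool post_read_processing(struct post_read_ctx *ctx, bool fail_verity)
{
	while (ctx->cur_step != STEP_DONE) {
		ctx->cur_step++;
		if (ctx->cur_step == STEP_DONE)
			break;
		/* Skip steps that are not enabled for this bio. */
		if (!(ctx->enabled_steps & (1u << ctx->cur_step)))
			continue;
		if (!run_step(ctx->cur_step, fail_verity))
			ctx->page_error = true;
	}
	return !ctx->page_error;
}
```

Keeping the enabled steps in a bitmask means a new step (for example
compression) only needs a new enum value and a new case in the dispatcher.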
Change-Id: I742be348b9dfc2113200bcc5366a84e978371a54
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull f2fs update from Jaegeuk Kim:
"In this round, we've mainly focused on performance tuning and critical
bug fixes for low-end devices. Sheng Yong introduced the
lost_found feature to keep missing files during recovery instead of
trashing them. We're preparing for the coming fsverity implementation. And,
we've got more features to communicate with users for better
performance. In low-end devices, some memory-related issues were
fixed, and subtle race conditions and corner cases were addressed as
well.
Enhancements:
- large nat bitmaps for more free node ids
- add three block allocation policies to pass down write hints given by user
- expose extension list to user and introduce hot file extension
- tune small devices seamlessly for low-end devices
- set readdir_ra by default
- give more resources under gc_urgent mode with regard to discard and cleaning
- introduce fsync_mode to enforce posix or not
- nowait aio support
- add lost_found feature to keep dangling inodes
- reserve bits for future fsverity feature
- add test_dummy_encryption for FBE
Bug fixes:
- don't use highmem for dentry pages
- align memory boundary for bitops
- truncate preallocated blocks in write errors
- guarantee i_times on fsync call
- clear CP_TRIMMED_FLAG correctly
- prevent node chain loop during recovery
- avoid data race between atomic write and background cleaning
- avoid unnecessary selinux violation warnings on resgid option
- GFP_NOFS to avoid deadlock in quota and read paths
- fix f2fs_skip_inode_update to allow i_size recovery
In addition to the above, there are several minor bug fixes and clean-ups"
Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
42bf67fc54 f2fs: remain written times to update inode during fsync
6cb5aa02bf f2fs: make assignment of t->dentry_bitmap more readable
a8d07f1f9c f2fs: truncate preallocated blocks in error case
86444d6006 f2fs: fix a wrong condition in f2fs_skip_inode_update
db2188a687 f2fs: reserve bits for fs-verity
ee2e74b3f0 f2fs: Add a segment type check in inplace write
0192e0a450 f2fs: no need to initialize zero value for GFP_F2FS_ZERO
49338842e9 f2fs: don't track new nat entry in nat set
d6a69d5e65 f2fs: clean up with F2FS_BLK_ALIGN
2c8834a7a2 f2fs: check blkaddr more accuratly before issue a bio
6ab573a9d9 f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read
7419dcb8be f2fs: introduce a new mount option test_dummy_encryption
9321e22c03 f2fs: introduce F2FS_FEATURE_LOST_FOUND feature
8a57196158 f2fs: release locks before return in f2fs_ioc_gc_range()
739ace131c f2fs: align memory boundary for bitops
4c55abe4f8 f2fs: remove unneeded set_cold_node()
30654507e0 f2fs: add nowait aio support
d909e94106 f2fs: wrap all options with f2fs_sb_info.mount_opt
5738be52b3 f2fs: Don't overwrite all types of node to keep node chain
0bdeb167c8 f2fs: introduce mount option for fsync mode
6bc490f0ee f2fs: fix to restore old mount option in ->remount_fs
0c9c3e0344 f2fs: wrap sb_rdonly with f2fs_readonly
6c6611223a f2fs: avoid selinux denial on CAP_SYS_RESOURCE
076a6f32fe f2fs: support hot file extension
58edcdbca6 f2fs: fix to avoid race in between atomic write and background GC
1e0aeb0af9 f2fs: do gc in greedy mode for whole range if gc_urgent mode is set
10b2d001d6 f2fs: issue discard aggressively in the gc_urgent mode
a5052f32b9 f2fs: set readdir_ra by default
1aa536a624 f2fs: add auto tuning for small devices
0ffdffc8f1 f2fs: add mount option for segment allocation policy
b798298912 f2fs: don't stop GC if GC is contended
766d232169 f2fs: expose extension_list sysfs entry
98b329de50 f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range
4d409fa334 f2fs: introduce sb_lock to make encrypt pwsalt update exclusive
1f6bac14c1 f2fs: remove redundant initialization of pointer 'p'
946aefc754 f2fs: flush cp pack except cp pack 2 page at first
e5081a52ac f2fs: clean up f2fs_sb_has_xxx functions
a292477154 f2fs: remove redundant check of page type when submit bio
190e64a819 f2fs: fix to handle looped node chain during recovery
889d980876 f2fs: handle quota for orphan inodes
92b12bb1a2 f2fs: support passing down write hints to block layer with F2FS policy
22fa74c2b0 f2fs: support passing down write hints given by users to block layer
180900373e f2fs: fix to clear CP_TRIMMED_FLAG
0671fae134 f2fs: support large nat bitmap
eceb943d5d f2fs: fix to check extent cache in f2fs_drop_extent_tree
2e2a339c98 f2fs: restrict inline_xattr_size configuration
41dda11641 f2fs: fix heap mode to reset it back
39575737bb f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET
7e0e7995ee fscrypt: fix build with pre-4.6 gcc versions
31d3279a4f fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
82bec88856 fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
168a907828 fscrypt: calculate NUL-padding length in one place only
042ae9f4cf fscrypt: move fscrypt_symlink_data to fscrypt_private.h
f9550c24c2 fscrypt: remove fscrypt_fname_usr_to_disk()
7ac4756a24 f2fs: switch to fscrypt_get_symlink()
6b76f58e24 f2fs: switch to fscrypt ->symlink() helper functions
fd457d2c4e fscrypt: new helper function - fscrypt_get_symlink()
a1cdacb7ae fscrypt: new helper functions for ->symlink()
7f43602f4d fscrypt: trim down fscrypt.h includes
d9cadc11bd fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
e6fe930580 fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
efefa434f4 fscrypt: move fscrypt_operations declaration to fscrypt_supp.h
7ed178bc8a fscrypt: split fscrypt_dummy_context_enabled() into supp/notsupp versions
3f16e09dad fscrypt: move fscrypt_ctx declaration to fscrypt_supp.h
8216a0b51a fscrypt: move fscrypt_info_cachep declaration to fscrypt_private.h
dfe0b3b1b6 fscrypt: move fscrypt_control_page() to supp/notsupp headers
3a2c791778 fscrypt: move fscrypt_has_encryption_key() to supp/notsupp headers
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
39ed8376d6 ("f2fs: don't put dentry page in pagecache into highmem")
Previously, dentry pages could be placed in highmem, which causes a panic
on platforms using highmem (such as arm), since the address of a dentry
page from highmem goes directly into the decryption path via the function
fscrypt_fname_disk_to_usr. sg_init_one assumes the address is not
from highmem, so the kernel panics: kmap_high is never called, but
kunmap_high is triggered at the end. To fix this problem in a simple
way, this patch avoids putting dentry pages in the pagecache into highmem.
Change-Id: I0c87dafb92fce72bf70403a15d28c73992c03203
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix coding style]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Pull f2fs updates from Jaegeuk Kim:
"In this round, we've followed up to support some generic features such
as cgroup, block reservation, linking fscrypt_ops, delivering
write_hints, and some ioctls. And, we could fix some corner cases in
terms of power-cut recovery and subtle deadlocks.
Enhancements:
- bitmap operations to handle NAT blocks
- readahead to improve readdir speed
- switch to use fscrypt_*
- apply write hints for direct IO
- add reserve_root=%u,resuid=%u,resgid=%u to reserve blocks for root/uid/gid
- modify b_avail and b_free to consider root reserved blocks
- support cgroup writeback
- support FIEMAP_FLAG_XATTR for fibmap
- add F2FS_IOC_PRECACHE_EXTENTS to pre-cache extents
- add F2FS_IOC_{GET/SET}_PIN_FILE to pin LBAs for data blocks
- support inode creation time
Bug fixes:
- sysfile-based quota operations
- memory footprint accounting
- allow to write data on partial preallocation case
- fix deadlock case on fallocate
- fix to handle fill_super errors
- fix missing inode updates of fsync'ed file
- recover renamed file which was fsync'ed before
- drop inmemory pages in corner error case
- keep last_disk_size correctly
- recover missing i_inline flags during roll-forward
Various clean-up patches were added as well"
Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
5f9b3abb91 f2fs: support inode creation time
9fb0de1751 f2fs: rebuild sit page from sit info in mem
1062a0c018 f2fs: stop issuing discard if fs is readonly
fa043fae90 f2fs: clean up duplicated assignment in init_discard_policy
b007190234 f2fs: use GFP_F2FS_ZERO for cleanup
35b11839a1 f2fs: allow to recover node blocks given updated checkpoint
e56500860b f2fs: recover some i_inline flags
64aa9569a1 f2fs: correct removexattr behavior for null valued extended attribute
70b3a923da f2fs: drop page cache after fs shutdown
8069a0e983 f2fs: stop gc/discard thread after fs shutdown
bb924f7777 f2fs: hanlde error case in f2fs_ioc_shutdown
700b53f21e f2fs: split need_inplace_update
f31d52811c f2fs: fix to update last_disk_size correctly
eeb0118b83 f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
c1b74c9670 f2fs: clean up error path of fill_super
d5efd57e01 f2fs: avoid hungtask when GC encrypted block if io_bits is set
c4027d0843 f2fs: allow quota to use reserved blocks
18d267c273 f2fs: fix to drop all inmem pages correctly
4dca47531e f2fs: speed up defragment on sparse file
999f806a7c f2fs: support F2FS_IOC_PRECACHE_EXTENTS
84960fca96 f2fs: add an ioctl to disable GC for specific file
292c8e1cfd f2fs: prevent newly created inode from being dirtied incorrectly
58b1f5b0fc f2fs: support FIEMAP_FLAG_XATTR
6afa9a94d0 f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock
10f4a4140b f2fs: check node page again in write end io
b203c58dfd f2fs: fix to caclulate required free section correctly
d49132d45c f2fs: handle newly created page when revoking inmem pages
2ce6b9d816 f2fs: add resgid and resuid to reserve root blocks
f53dcf6799 f2fs: implement cgroup writeback support
1338f376d5 f2fs: remove unused pend_list_tag
d4f19f6266 f2fs: avoid high cpu usage in discard thread
b78e9302e2 f2fs: make local functions static
62438ba87b f2fs: add reserved blocks for root user
06a366757f f2fs: check segment type in __f2fs_replace_block
4c6bc4be37 f2fs: update inode info to inode page for new file
591b336387 f2fs: show precise # of blocks that user/root can use
b242d7edc5 f2fs: clean up unneeded declaration
87b8168e9e f2fs: continue to do direct IO if we only preallocate partial blocks
2b4d859bd9 f2fs: enable quota at remount from r to w
54bf13a0ad f2fs: skip stop_checkpoint for user data writes
25ef3006ba f2fs: fix missing error number for xattr operation
cff2c7fe41 f2fs: recover directory operations by fsync
e2bb618a0a f2fs: return error during fill_super
8a2c11d865 f2fs: fix an error case of missing update inode page
cd38d5ada5 f2fs: fix potential hangtask in f2fs_trace_pid
e81cafbeba f2fs: no need return value in restore summary process
04d44000d6 f2fs: use unlikely for release case
925d0933d8 f2fs: don't return value in truncate_data_blocks_range
f7986c416d f2fs: clean up f2fs_map_blocks
e4f5e26cda f2fs: clean up hash codes
1f994d4708 f2fs: fix error handling in fill_super
e7db649b5f f2fs: spread f2fs_k{m,z}alloc
5d4e487b99 f2fs: inject fault to kvmalloc
8b33886c37 f2fs: inject fault to kzalloc
d946807987 f2fs: remove a redundant conditional expression
3bc01114a3 f2fs: apply write hints to select the type of segment for direct write
c80f019591 f2fs: switch to fscrypt_prepare_setattr()
bb8b850365 f2fs: switch to fscrypt_prepare_lookup()
9ab470eaf8 f2fs: switch to fscrypt_prepare_rename()
aeaac517a1 f2fs: switch to fscrypt_prepare_link()
101c6a96ad f2fs: switch to fscrypt_file_open()
6d025237a1 f2fs: remove repeated f2fs_bug_on
b01e03d724 f2fs: remove an excess variable
e1f9be2f7c f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem
e5c7c86010 f2fs: remove unused parameter
f130dbb98a f2fs: still write data if preallocate only partial blocks
47ee9b2598 f2fs: introduce sysfs readdir_ra to readahead inode block in readdir
55e2f89181 f2fs: fix concurrent problem for updating free bitmap
e1398f6554 f2fs: remove unneeded memory footprint accounting
2d69561135 f2fs: no need to read nat block if nat_block_bitmap is set
4dd2d07338 f2fs: reserve nid resource for quota sysfile
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Due to the merge of 4.4.116, there is a build error in f2fs due to
inode_nohighmem() being defined twice. This patch removes the f2fs-only
instance of the function as it's no longer needed.
Bug: 72320324
Change-Id: If14f1e167498bceb2e434420181923952f7748ba
Cc: Jaegeuk Kim <jaegeuk@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Cherry-picked from origin/upstream-f2fs-stable-linux-4.4.y:
ba1ade7101 fscrypt: resolve some cherry-pick bugs
9e32f17d24 fscrypt: move to generic async completion
4ecacbed6e crypto: introduce crypto wait for async op
42d89da82b fscrypt: lock mutex before checking for bounce page pool
2286508d17 fscrypt: new helper function - fscrypt_prepare_setattr()
5cbdd42ad2 fscrypt: new helper function - fscrypt_prepare_lookup()
a31feba5c1 fscrypt: new helper function - fscrypt_prepare_rename()
95efafb623 fscrypt: new helper function - fscrypt_prepare_link()
2b4b4f98dd fscrypt: new helper function - fscrypt_file_open()
8c815f381c fscrypt: new helper function - fscrypt_require_key()
272e435025 fscrypt: remove unneeded empty fscrypt_operations structs
1034eeec51 fscrypt: remove ->is_encrypted()
32c0d3ae9d fscrypt: switch from ->is_encrypted() to IS_ENCRYPTED()
a4781dd1f1 fs, fscrypt: add an S_ENCRYPTED inode flag
ff0a3dbc93 fscrypt: clean up include file mess
bc4a61c60b fscrypt: fix dereference of NULL user_key_payload
a53dc7e005 fscrypt: make ->dummy_context() return bool
Change-Id: I461d742adc7b77177df91429a1fd9c8624a698d6
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
Pull f2fs updates from Jaegeuk Kim:
"In this round, we introduce sysfile-based quota support which is
required for Android by default. In addition, we allow that users are
able to reserve some blocks in runtime to mitigate performance drops
in low free space.
Enhancements:
- assign proper data segments according to write_hints given by user
- issue cache_flush on dirty devices only among multiple devices
- exploit cp_error flag and add more faults to enhance fault
injection test
- conduct more readaheads during f2fs_readdir
- add a range for discard commands
Bug fixes:
- fix zero stat->st_blocks when inline_data is set
- drop crypto key and free stale memory pointer while evict_inode is
failing
- fix some corner cases in free space and segment management
- fix wrong last_disk_size
This series includes lots of clean-ups and code enhancement in terms
of xattr operations, discard/flush command control. In addition, it
adds versatile debugfs entries to monitor f2fs status"
Cherry-picked from origin/upstream-f2fs-stable-linux-4.4.y:
56a07b0705 f2fs: deny accessing encryption policy if encryption is off
c394842e26 f2fs: inject fault in inc_valid_node_count
9262922510 f2fs: fix to clear FI_NO_PREALLOC
e6cfc5de2d f2fs: expose quota information in debugfs
c4cd2efe83 f2fs: separate nat entry mem alloc from nat_tree_lock
48c72b4c8c f2fs: validate before set/clear free nat bitmap
baf9275a4b f2fs: avoid opened loop codes in __add_ino_entry
47af6c72d9 f2fs: apply write hints to select the type of segments for buffered write
ac98191605 f2fs: introduce scan_curseg_cache for cleanup
ca28e9670e f2fs: optimize the way of traversing free_nid_bitmap
460688b59e f2fs: keep scanning until enough free nids are acquired
0186182c0c f2fs: trace checkpoint reason in fsync()
5d4b6efcfd f2fs: keep isize once block is reserved cross EOF
3c8f767e13 f2fs: avoid race in between GC and block exchange
4423778adf f2fs: save a multiplication for last_nid calculation
3e3b405575 f2fs: fix summary info corruption
44889e4879 f2fs: remove dead code in update_meta_page
55c7b9595b f2fs: remove unneeded semicolon
8b92814117 f2fs: don't bother with inode->i_version
42c7c71824 f2fs: check curseg space before foreground GC
c5470498e5 f2fs: use rw_semaphore to protect SIT cache
82750d346a f2fs: support quota sys files
26dfec49b2 f2fs: add quota_ino feature infra
ddb8e2ae98 f2fs: optimize __update_nat_bits
f46ae958c7 f2fs: modify for accurate fggc node io stat
c713fdb5a2 Revert "f2fs: handle dirty segments inside refresh_sit_entry"
873ec505cb f2fs: add a function to move nid
ae66786296 f2fs: export SSR allocation threshold
90c28a18d2 f2fs: give correct trimmed blocks in fstrim
5612922fb0 f2fs: support bio allocation error injection
583b7a274c f2fs: support get_page error injection
09a073cc8c f2fs: add missing sysfs description
e945474a9c f2fs: support soft block reservation
b7b2e629b6 f2fs: handle error case when adding xattr entry
7368e30495 f2fs: support flexible inline xattr size
ada4061e19 f2fs: show current cp state
5b8ff1301a f2fs: add missing quota_initialize
46d4a691f0 f2fs: show # of dirty segments via sysfs
fc13f9d7ce f2fs: stop all the operations by cp_error flag
91bea0c391 f2fs: remove several redundant assignments
807486c795 f2fs: avoid using timespec
03b1cb0bb4 f2fs: fix to correct no_fggc_candidate
5c15033cea Revert "f2fs: return wrong error number on f2fs_quota_write"
5f5f593222 f2fs: remove obsolete pointer for truncate_xattr_node
032a690682 f2fs: retry ENOMEM for quota_read|write
171b638fc4 f2fs: limit # of inmemory pages
83ed7a615f f2fs: update ctx->pos correctly when hitting hole in directory
4d6e68be25 f2fs: relocate readahead codes in readdir()
c8be47b540 f2fs: allow readdir() to be interrupted
2b903fe94c f2fs: trace f2fs_readdir
bb0db666d4 f2fs: trace f2fs_lookup
40d6250f04 f2fs: skip searching non-exist range in truncate_hole
8e84f379df f2fs: expose some sectors to user in inline data or dentry case
cb98f70dea f2fs: avoid stale fi->gdirty_list pointer
5562a3c539 f2fs/crypto: drop crypto key at evict_inode only
85853e7e38 f2fs: fix to avoid race when accessing last_disk_size
0c47a892d5 f2fs: Fix bool initialization/comparison
68e801abc5 f2fs: give up CP_TRIMMED_FLAG if it drops discards
df74eacb20 f2fs: trace f2fs_remove_discard
bd502c6e3e f2fs: reduce cmd_lock coverage in __issue_discard_cmd
a34ab5ca4f f2fs: split discard policy
1e65afd14d f2fs: wrap discard policy
684447dad1 f2fs: support issuing/waiting discard in range
27eaad0938 f2fs: fix to flush multiple device in checkpoint
08bb9d68d5 f2fs: enhance multiple device flush
9c2526ac2e f2fs: fix to show ino management cache size correctly
814b463d26 f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush
f555b0a117 f2fs: obsolete ALLOC_NID_LIST list
75d3164ae1 f2fs: convert inline data for direct I/O & FI_NO_PREALLOC
4de0ceb6b7 f2fs: allow readpages with NULL file pointer
322a45d172 f2fs: show flush list status in sysfs
6d625a93b4 f2fs: introduce read_xattr_block
8ea6e1c327 f2fs: introduce read_inline_xattr
dbce11e9ee Revert "f2fs: reuse nids more aggressively"
131bc9f6b7 Revert "f2fs: node segment is prior to data segment selected victim"
Change-Id: I93b9cd867b859a667a448b39299ff44a2b841b8c
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
commit d41519a69b35b10af7fda867fb9100df24fdf403 upstream.
On sparc, if we have an alloca() like situation, as is the case with
SHASH_DESC_ON_STACK(), we can end up referencing deallocated stack
memory. The result can be that the value is clobbered if a trap
or interrupt arrives at just the right instruction.
It only occurs if the function ends up returning a value from that
alloca() area and that value can be placed into the return value
register using a single instruction.
For example, in lib/libcrc32c.c:crc32c() we end up with a return
sequence like:
return %i7+8
lduw [%o5+16], %o0 ! MEM[(u32 *)__shash_desc.1_10 + 16B],
%o5 holds the base of the on-stack area allocated for the shash
descriptor. But the return released the stack frame and the
register window.
So if an interrupt arrives between 'return' and 'lduw', then
the value read at %o5+16 can be corrupted.
Add a data compiler barrier to work around this problem. This is
exactly what the gcc fix will end up doing as well, and it absolutely
should not change the code generated for other cpus (unless gcc
on them has the same bug :-)
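The shape of the workaround can be sketched in user space as follows. This
is only an illustration under assumptions: the macro mirrors the usual
GCC/Clang form of a data compiler barrier, while checksum_on_stack() and
its toy hash are hypothetical stand-ins for a function that returns a value
held in an on-stack descriptor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Data compiler barrier: the empty asm takes 'ptr' as an input and
 * clobbers memory, so the compiler may not move memory accesses across
 * it. GCC/Clang extended asm syntax.
 */
#define barrier_data(ptr) __asm__ __volatile__("" : : "r"(ptr) : "memory")

/* Hypothetical stand-in for a hash computed into an on-stack descriptor. */
static uint32_t checksum_on_stack(const unsigned char *data, size_t len)
{
	struct {
		uint32_t pad[4];
		uint32_t ctx;	/* the result lives in stack memory */
	} desc;
	uint32_t ret;
	size_t i;

	memset(&desc, 0, sizeof(desc));
	for (i = 0; i < len; i++)
		desc.ctx = desc.ctx * 31u + data[i];

	/* Load the value first, then fence, then return the local copy. */
	ret = desc.ctx;
	barrier_data(&desc);
	return ret;
}
```

The point of the ordering is that the load from the stack area is forced to
complete before the barrier, so it cannot be scheduled after the
instruction that releases the stack frame.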
With crucial insight from Eric Sandeen.
Reported-by: Anatoly Pugachev <matorola@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As Ju Hyung Park reported:
"When 'fstrim' is called for manual trim, a BUG() can be triggered
randomly with this patch.
I'm seeing this issue on both x86 Desktop and arm64 Android phone.
On x86 Desktop, this was caused during Ubuntu boot-up. I have a
cronjob installed which calls 'fstrim -v /' during boot. On arm64
Android, this was caused during GC looping with 1ms gc_min_sleep_time
& gc_max_sleep_time."
The root cause of this issue is that f2fs_wait_discard_bios may only be
used by f2fs_put_super: during put_super there can be no other
referrers, so it can ignore a discard entry's reference count when
removing the entry. In any other caller we can hit the bug_on in
__remove_discard_cmd, because another issuer may have taken a reference
on the discard entry.
Thread A                                Thread B
- issue_discard_thread
                                        - f2fs_ioc_fitrim
                                         - f2fs_trim_fs
                                          - f2fs_wait_discard_bios
                                           - __issue_discard_cmd
                                            - __submit_discard_cmd
 - __wait_discard_cmd
  - dc->ref++
  - __wait_one_discard_bio
                                           - __wait_discard_cmd
                                            - __remove_discard_cmd
                                             - f2fs_bug_on(sbi, dc->ref)
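The invariant behind the race above can be sketched as follows. This is a
simplified, single-threaded model and not the actual fix: the struct fields
and both function names are hypothetical, and real code would protect the
entry with a lock.

```c
#include <stdbool.h>

/* Simplified discard command; only the fields needed for the sketch. */
struct discard_cmd {
	int ref;	/* number of waiters currently pinning the entry */
	bool done;	/* the discard bio has completed */
	bool removed;	/* stands in for freeing the entry */
};

/* Removal is only legal once nobody references the entry. */
static bool try_remove_discard_cmd(struct discard_cmd *dc)
{
	if (!dc->done || dc->ref > 0)
		return false;	/* leave it for the last holder */
	dc->removed = true;
	return true;
}

/* A waiter pins the entry, waits, then drops its reference. */
static void wait_one_discard(struct discard_cmd *dc)
{
	dc->ref++;
	/* ... real code would wait for bio completion here ... */
	dc->done = true;
	dc->ref--;
	try_remove_discard_cmd(dc);
}
```

A caller that ignores ref, as is safe only at put_super time, would remove
the entry while the waiter in Thread A still holds it.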
Change-Id: I8fb5c8215e6222ae853e7781218d5084e1f11166
Fixes: 969d1b180d987c2be02de890d0fff0f66a0e80de
Reported-by: Ju Hyung Park <qkrwngud825@gmail.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(cherry picked from commit 638164a2718f337ea224b747cf5977ef143166a4)
This is cherry-picked from upstream-f2fs-stable-linux-4.4.y.
Changes include:
commit c7fd9e2b4a ("f2fs: hurry up to issue discard after io interruption")
commit 603dde3965 ("f2fs: fix to show correct discard_granularity in sysfs")
...
commit 565f0225f9 ("f2fs: factor out discard command info into discard_cmd_control")
commit c4cc29d19e ("f2fs: remove batched discard in f2fs_trim_fs")
Change-Id: Icd8a85ac0c19a8aa25cd2591a12b4e9b85bdf1c5
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
commit b01a92019cac30398ef75b560d2668b399f4e393 upstream.
This patch simply cleans up the names for flush/discard commands.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 599a09b2c1ac222e6aad0c22515d1ccde7c3b702 upstream.
This patch adds a mirror for the nat version bitmap, and uses it to detect
in-memory bitmap corruption which may be caused by bit-transition of
cache or memory overflow.
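The mirroring idea can be sketched as below. This is a minimal model under
assumptions: the struct, the bitmap size, and the helper names are
hypothetical and do not reflect the layout used by f2fs.

```c
#include <stdbool.h>
#include <string.h>

#define BITMAP_BYTES 8	/* arbitrary size for the sketch */

struct nat_bitmaps {
	unsigned char bitmap[BITMAP_BYTES];	/* working copy */
	unsigned char mirror[BITMAP_BYTES];	/* shadow copy */
};

/* Every legitimate update is applied to both copies. */
static void nat_set_bit(struct nat_bitmaps *nb, unsigned int nr)
{
	unsigned char mask = (unsigned char)(1u << (nr % 8));

	nb->bitmap[nr / 8] |= mask;
	nb->mirror[nr / 8] |= mask;
}

/* A mismatch means one copy was changed behind our back. */
static bool nat_bitmap_consistent(const struct nat_bitmaps *nb)
{
	return memcmp(nb->bitmap, nb->mirror, BITMAP_BYTES) == 0;
}
```

A stray write or bit flip that touches only one copy makes the comparison
fail, which is what allows the corruption to be detected.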
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 5fe457430e554a2f5188f13c1a2e36ad845640c5 upstream.
This patch introduces a new flag indicating that an inode is committing an
atomic write, so that the atomic write status of the inode is kept during
the commit. GC can then skip pages of an atomic write inode, which avoids
mixing randomly GCed data into the current transaction and preserves
transaction isolation.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 25290fa5591d81767713db304e0d567bf991786f upstream.
If there is no candidate to submit discard command during f2fs_trim_fs, let's
return without checkpoint.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 4e6a8d9b224f886362ea6e8f6046b541437c944f upstream.
This patch relaxes async discard commands so that checkpoint no longer
waits for their end_io.
Instead of waiting during checkpoint, the wait is done when the discarded
blocks are actually reused.
Test on initial partition of nvme drive.
# time fstrim /mnt/test
Before : 6.158s
After : 4.822s
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 26a28a0c1eb756ba18bfb1f93309c4b4406b9cd9 upstream.
This patch shows the maximum number of atomic operations that are in
progress concurrently.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 0a595ebaaa6b53a2226d3fee2a2fd616ea5ba378 upstream.
This patch implements IO alignment by filling dummy blocks in DATA and NODE
write bios. If we can guarantee, for example, 32KB or 64KB for such IOs,
we can eliminate the underlying dummy page problem, in which the FTL writes
dummy pages in order to close partially written MLC or TLC pages.
Note that,
- it requires "-o mode=lfs".
- IO size should be a power of 2 and must not exceed BIO_MAX_PAGES, 256.
- read IO is still 4KB.
- a checkpoint is done at fsync, if a dummy NODE page was written.
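The size constraint and the padding arithmetic can be sketched as follows.
This is an illustration under assumptions: both helper names are
hypothetical, the IO size is counted in 4KB pages, and only the
BIO_MAX_PAGES value of 256 comes from the note above.

```c
#include <stdbool.h>

#define BIO_MAX_PAGES 256	/* limit quoted in the commit message */

/* True if 'pages' per aligned IO is a power of two within the bio limit. */
static bool io_size_valid(unsigned int pages)
{
	if (pages == 0 || pages > BIO_MAX_PAGES)
		return false;
	return (pages & (pages - 1)) == 0;
}

/*
 * Number of dummy blocks needed to pad 'nr_blocks' up to the next
 * multiple of 'pages'. 'pages' must satisfy io_size_valid().
 */
static unsigned int dummy_blocks_needed(unsigned int nr_blocks,
					unsigned int pages)
{
	return (pages - (nr_blocks & (pages - 1))) & (pages - 1);
}
```

For a 32KB IO unit (8 pages of 4KB), a write of 5 blocks would be padded
with 3 dummy blocks, while a write of 8 blocks needs none.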
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 275b66b09e85cf0520dc610dd89706952751a473 upstream.
This patch is based on commit 275b66b09e85 (f2fs: support async discard).
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 70fd76140a6cb63262bd47b68d57b42e889c10ee upstream.
This patch backported ("block,fs: use REQ_* flags directly")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 5eba8c5d1fb3af28b2073ba5228d4998196c1bcc upstream.
f2fs_sync_file()                remount_ro
 - f2fs_readonly
                                 - destroy_flush_cmd_control
 - f2fs_issue_flush
   - no fcc pointer!
So, this patch doesn't free fcc in this case, but just stop its kernel thread
which sends flush commands.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 204706c7accfabb67b97eef9f9a28361b6201199 upstream.
This reverts commit 1beba1b3a953107c3ff5448ab4e4297db4619c76.
The percpu_counter doesn't provide atomicity on a single core and consumes
more DRAM. That causes fs_mark test failures due to ENOMEM.
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 26787236b36660baf4d136281d40b5bb33a570ec upstream.
If a file needs to keep its i_size by fallocate, we need to turn off auto
recovery during roll-forward recovery.
This will resolve the below scenario.
1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync"
3. md5sum /mnt/f2fs/file;
4. godown /mnt/f2fs/
5. umount /mnt/f2fs/
6. mount -t f2fs /dev/sdx /mnt/f2fs
7. md5sum /mnt/f2fs/file
Reported-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 97dd26ad834739d4e4ea35fd7ab5f92824de4cbb upstream.
If i_size is not aligned to the f2fs's block size, we should not skip inode
update during fsync.
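The alignment test can be sketched as below. This is a simplified model:
may_skip_inode_update() and its inode_dirty parameter are hypothetical and
only illustrate where the check sits, and a 4KB block size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define F2FS_BLKSIZE 4096u	/* assumed f2fs block size in bytes */

/* True if i_size falls exactly on a block boundary. */
static bool i_size_block_aligned(uint64_t i_size)
{
	return (i_size & (F2FS_BLKSIZE - 1)) == 0;
}

/* Hypothetical decision helper: may fsync skip the inode update? */
static bool may_skip_inode_update(uint64_t i_size, bool inode_dirty)
{
	/* An unaligned i_size must always be written out. */
	if (!i_size_block_aligned(i_size))
		return false;
	return !inode_dirty;
}
```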
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 281518c694a5228d6c46fac83529fb3e2c331281 upstream.
For the two cases below, we can't guarantee data consistency:
a)
1. xfs_io "pwrite 0 4195328" "fsync"
2. xfs_io "pwrite 4195328 1024" "fdatasync"
3. godown
4. umount & mount
--> isize we updated before fdatasync won't be recovered
b)
1. xfs_io "pwrite -S 0xcc 0 4202496" "fsync"
2. xfs_io "fpunch 4194304 4096" "fdatasync"
3. godown
4. umount & mount
--> dnode we punched before fdatasync won't be recovered
The reason is that fdatasync is normally not aware of metadata
modifications in the file, e.g. isize changes or dnode updates, so in
->fsync we skip flushing node pages for the above cases, and the
fdatasynced data is lost during recovery.
We have now introduced a global DIRTY_META list in sbi for tracking
dirty inodes selectively, so in fdatasync we can choose whether to flush
nodes depending on the dirty state of the current inode in the list.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 04d47e673863c637a2b44ad34a558aeb5d0a727e upstream.
Thread A                Thread B                Thread C
- f2fs_create
 - f2fs_new_inode
  - f2fs_lock_op
   - alloc_nid
    alloc last nid
  - f2fs_unlock_op
                        - f2fs_create
                         - f2fs_new_inode
                          - f2fs_lock_op
                           - alloc_nid
                            as node count still not
                            be increased, we will
                            loop in alloc_nid
                                                - f2fs_write_node_pages
                                                 - f2fs_balance_fs_bg
                                                  - f2fs_sync_fs
                                                   - write_checkpoint
                                                    - block_operations
                                                     - f2fs_lock_all
 - f2fs_lock_op
While creating a new inode, we do not allocate and account a nid atomically,
so when almost no free nids are left, we may loop forever as in the stack
above.
In order to avoid that, reuse nm_i::available_nids for accounting free nids
and make nid allocation and counting atomic during node creation.
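The idea of checking and accounting in one step can be sketched as below.
This is an illustration only: it uses C11 atomics as a user-space stand-in
for the kernel's locking, and struct nm_info, reserve_nid(), and
release_nid() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct nm_info {
	atomic_uint available_nids;	/* free nids not yet reserved */
};

/*
 * Reserve one nid. The check and the decrement happen in a single
 * atomic step, so a caller either owns a nid or fails immediately
 * instead of looping while another thread holds the last one.
 */
static bool reserve_nid(struct nm_info *nm)
{
	unsigned int avail = atomic_load(&nm->available_nids);

	while (avail > 0) {
		if (atomic_compare_exchange_weak(&nm->available_nids,
						 &avail, avail - 1))
			return true;
		/* 'avail' now holds the refreshed value; retry. */
	}
	return false;
}

/* Give a reserved nid back, e.g. when inode creation fails later. */
static void release_nid(struct nm_info *nm)
{
	atomic_fetch_add(&nm->available_nids, 1);
}
```

With this shape, Thread B in the trace above would see zero available nids
and return an error rather than spin inside the locked region.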
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 36951b38d13ac7cce9fcf89e0e01c22ed0d05688 upstream.
Normally, while committing a checkpoint, we wait for all pages to be
written back, whether they hold data or metadata, so in a scenario where
lots of data IO is submitted along with metadata, we may suffer long
latency waiting for writeback during checkpoint.
In fact, we only care about the persistence of metadata pages, not data
pages, as file system consistency depends only on metadata. To avoid the
long latency in the above scenario, let's recognize and reference
metadata in submitted IOs, and wait for writeback only on metadata.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 7702bdbe505a22380dd958e2ee35124c7c414806 upstream.
If many threads hit has_not_enough_free_secs() in f2fs_balance_fs() at the same
time, all the threads would do FG_GC or BG_GC.
In this critical path, we totally don't need to do BG_GC at all.
Let's avoid that.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 3c62be17d4f562f43fe1d03b48194399caa35aa5 upstream.
This patch implements multiple devices support for f2fs.
Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
volume under one f2fs instance.
Internal block management is very simple, but we will modify block allocation
and background GC policy to boost IO speed by exploiting them according to
each device speed.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 6ae1be13e85f4c42c8ca371fda50ae39eebbfd96 upstream.
Now we don't need to be too careful about storage alignment for dio, since
its speed has become quite fast and we'd better avoid any misalignment first.
Revert: 38aa0889b2 (f2fs: align direct_io'ed data to section)
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 178053e2f1f9ccdb61ff6c2bd8644b53fc98e72e upstream.
With the zoned block device feature enabled, section discard
needs to do a zone reset for sections contained in sequential
zones, and a regular discard (if supported) for sections
stored in conventional zones. Avoid the need for a costly
report zones to obtain a section zone type when discarding it
by caching the types of the device zones in the super block
information. This cache is initialized at mount time for mounts
with the zoned block device feature enabled.
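A small sketch of the cached lookup described above, assuming the zone types
were collected once at mount time into a plain array. All names are
hypothetical; the in-kernel data structure may differ.

```c
#include <stdint.h>

enum zone_type { ZONE_CONVENTIONAL = 0, ZONE_SEQUENTIAL = 1 };
enum discard_op { OP_DISCARD, OP_ZONE_RESET };

/* Hypothetical cache filled once at mount time from a zone report. */
struct zone_cache {
    uint32_t blocks_per_zone; /* must be non-zero */
    uint32_t nr_zones;
    const uint8_t *types;     /* one enum zone_type value per zone */
};

/*
 * Pick the operation for discarding the section that contains blkaddr
 * without issuing a new zone report: sequential zones need a zone reset,
 * everything else gets a regular discard.
 */
static enum discard_op section_discard_op(const struct zone_cache *zc,
                                          uint32_t blkaddr)
{
    uint32_t zone = blkaddr / zc->blocks_per_zone;

    if (zone < zc->nr_zones && zc->types[zone] == ZONE_SEQUENTIAL)
        return OP_ZONE_RESET;
    return OP_DISCARD;
}
```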
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 96ba2decb4241aa2c6b61cfc8489d648769eff99 upstream.
Zone write pointer reset acts as discard for zoned block
devices. So if the zoned block device feature is enabled,
always declare that discard is enabled, even if the device
does not actually support the command.
For the same reason, prevent the use of the "nodiscard" mount
option.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 0bfd7a091c19132489a0f977b8dbf9f6b5ae0a1c upstream.
SMR stands for "Shingled Magnetic Recording" which makes sense
only for hard disk drives (spinning rust). The ZBC/ZAC standards
enable management of SMR disks, but solid state drives may also
support those standards. So rename the HMSMR feature to BLKZONED
to avoid HDD-centric terminology. For the same reason, rename
f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit ed6bd4b146527e7c6934e3582c47d7b857802676 upstream.
Report the error of f2fs_fill_dentries to ->iterate_shared; otherwise, when
an error occurs, the user may list only part of the dirents in the target
directory without any hint.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 35782b233f37e48ecc469d9c7232f3f6a7fad41a upstream.
This patch removes percpu_counter usage due to a performance regression in iozone.
Fixes: 523be8a6b3 ("f2fs: use percpu_counter for page counters")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 7c45729a4d6d1c90879e6c5c2df325c2f6db7191 upstream.
This is to avoid a no-free-segment bug during checkpoint caused by a large
number of dirty inodes.
The case was reported by Chao like this.
1. mount with lazytime option
2. fill 4k file until disk is full
3. sync filesystem
4. read all files in the image
5. umount
In this case, we actually don't need to flush dirty inode to inode page during
checkpoint.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 02027d42c3f747945f19111d3da2092ed2148ac8 upstream.
This is for backport only.
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 3a2ad5672bb36ee9c07bab97dadc8b0f70d391f4 upstream.
Let build_free_nids support sync/async methods. In the allocation flow of nids,
we use the synchronous method, so that we can avoid looping in alloc_nid when
free memory is low; in unblock_operations and f2fs_balance_fs_bg we use the
asynchronous method, where a low memory condition can interrupt us.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit b8559dc242d1d47dcf99660a4d6afded727e0cc0 upstream.
During free nid allocation, in order to do preallocation, we tag a free
nid entry as an allocated one but still leave it in the free nid list, so other
allocators who want to grab free nids need to traverse the free nid
list to look one up. This becomes overhead when multiple threads allocate
free nids intensively.
This patch splits the free nid list into two lists, {free,alloc}_nid_list, to
keep free nids and preallocated free nids separate. After that, the traversal
latency is gone. Besides, split nid_cnt for separate statistics.
Additionally, introduce __insert_nid_to_list and __remove_nid_from_list for
cleanup.
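The split can be sketched in userspace as two lists with one counter each.
This is only an illustration: the names below are hypothetical, the lists are
simple stacks, and locking is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical list states: truly free vs. preallocated. */
enum nid_state { NID_FREE, NID_ALLOC, NR_NID_STATE };

struct free_nid {
    uint32_t nid;
    enum nid_state state;
    struct free_nid *next;
};

/* One list head and one counter per state. */
struct nid_mgr {
    struct free_nid *list[NR_NID_STATE];
    unsigned int cnt[NR_NID_STATE];
};

/* Push an entry onto the list for the given state and count it. */
static void insert_nid(struct nid_mgr *m, struct free_nid *e,
                       enum nid_state st)
{
    e->state = st;
    e->next = m->list[st];
    m->list[st] = e;
    m->cnt[st]++;
}

/* Pop the head of the list for the given state, or return NULL if empty. */
static struct free_nid *remove_nid(struct nid_mgr *m, enum nid_state st)
{
    struct free_nid *e = m->list[st];

    if (!e)
        return NULL;
    m->list[st] = e->next;
    e->next = NULL;
    m->cnt[st]--;
    return e;
}

/*
 * Preallocate a nid: take the head of the free list and move it to the
 * alloc list, so later allocators never walk past preallocated entries.
 */
static struct free_nid *alloc_nid(struct nid_mgr *m)
{
    struct free_nid *e = remove_nid(m, NID_FREE);

    if (e)
        insert_nid(m, e, NID_ALLOC);
    return e;
}
```

Because preallocated entries leave the free list immediately, grabbing a free
nid is a constant-time head removal in this sketch.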
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: modify f2fs_bug_on to avoid needless branches]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 8508e44ae98622f841f5ef29d0bf3d5db4e0c1cc upstream.
We don't guarantee cp_addr is fixed by cp_version.
This is to sync with f2fs-tools.
Cc: stable@vger.kernel.org
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
commit 6332cd32c8290a80e929fc044dc5bdba77396e33 upstream.
If a user has no key under an encrypted dir, fscrypt gives digested dentries.
Previously, when looking up a dentry, f2fs only checked its hash value against
the first 4 bytes of the digested dentry, which didn't handle hash collisions fully.
This patch enhances the lookup to check the entire dentry bytes, as ext4 does.
Eric reported how to reproduce this issue by:
# seq -f "edir/abcdefghijklmnopqrstuvwxyz012345%.0f" 100000 | xargs touch
# find edir -type f | xargs stat -c %i | sort | uniq | wc -l
100000
# sync
# echo 3 > /proc/sys/vm/drop_caches
# keyctl new_session
# find edir -type f | xargs stat -c %i | sort | uniq | wc -l
99999
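The difference between the old and the new comparison can be sketched as
follows. The struct and function names are made up for this example and do not
mirror the fscrypt or f2fs data structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical lookup key: a 32-bit hash plus the full name bytes. */
struct name_key {
    uint32_t hash;
    const unsigned char *name;
    size_t len;
};

/* Old behaviour in this sketch: two names match if their hashes match. */
static bool match_hash_only(const struct name_key *a,
                            const struct name_key *b)
{
    return a->hash == b->hash;
}

/* New behaviour: the hash is a quick filter, then all bytes must be equal. */
static bool match_full(const struct name_key *a, const struct name_key *b)
{
    return a->hash == b->hash &&
           a->len == b->len &&
           memcmp(a->name, b->name, a->len) == 0;
}
```

Two distinct names that collide on the hash are treated as the same entry by
match_hash_only(), which is how one of the 100000 files above could be hidden
behind another; match_full() keeps them apart.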
Cc: <stable@vger.kernel.org>
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
(fixed f2fs_dentry_hash() to work even when the hash is 0)
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b53cf9815bb4744958d41f3795d5d5a1d365e2d upstream.
Filesystem encryption ostensibly supported revoking a keyring key that
had been used to "unlock" encrypted files, causing those files to become
"locked" again. This was, however, buggy for several reasons, the most
severe of which was that when key revocation happened to be detected for
an inode, its fscrypt_info was immediately freed, even while other
threads could be using it for encryption or decryption concurrently.
This could be exploited to crash the kernel or worse.
This patch fixes the use-after-free by removing the code which detects
the keyring key having been revoked, invalidated, or expired. Instead,
an encrypted inode that is "unlocked" now simply remains unlocked until
it is evicted from memory. Note that this is no worse than the case for
block device-level encryption, e.g. dm-crypt, and it still remains
possible for a privileged user to evict unused pages, inodes, and
dentries by running 'sync; echo 3 > /proc/sys/vm/drop_caches', or by
simply unmounting the filesystem. In fact, one of those actions was
already needed anyway for key revocation to work even somewhat sanely.
This change is not expected to break any applications.
In the future I'd like to implement a real API for fscrypt key
revocation that interacts sanely with ongoing filesystem operations ---
waiting for existing operations to complete and blocking new operations,
and invalidating and sanitizing key material and plaintext from the VFS
caches. But this is a hard problem, and for now this bug must be fixed.
This bug affected almost all versions of ext4, f2fs, and ubifs
encryption, and it was potentially reachable in any kernel configured
with encryption support (CONFIG_EXT4_ENCRYPTION=y,
CONFIG_EXT4_FS_ENCRYPTION=y, CONFIG_F2FS_FS_ENCRYPTION=y, or
CONFIG_UBIFS_FS_ENCRYPTION=y). Note that older kernels did not use the
shared fs/crypto/ code, but due to the potential security implications
of this bug, it may still be worthwhile to backport this fix to them.
Fixes: b7236e21d5 ("ext4 crypto: reorganize how we store keys in the inode")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Acked-by: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
different competitors
We use different page caches (normally the inode's page cache for R/W
and the meta inode's page cache for GC) to cache the same physical block,
which belongs to an encrypted inode. Writeback of these two page
caches should be exclusive, but currently we don't handle the writeback
state well, so there may be a potential race:
a)
kworker:                                f2fs_gc:
 - f2fs_write_data_pages
  - f2fs_write_data_page
   - do_write_data_page
    - write_data_page
     - f2fs_submit_page_mbio
(page#1 in inode's page cache was queued
in f2fs bio cache, and is ready to be
written to new blkaddr)
                                         - gc_data_segment
                                          - move_encrypted_block
                                           - pagecache_get_page
                                        (page#2 in meta inode's page cache
                                        was cached with the invalid data
                                        of physical block located in new
                                        blkaddr)
                                           - f2fs_submit_page_mbio
                                        (page#1 was submitted, later, page#2
                                        with invalid data will be submitted)
b)
f2fs_gc:
 - gc_data_segment
  - move_encrypted_block
   - f2fs_submit_page_mbio
(page#1 in meta inode's page cache was
queued in f2fs bio cache, and is ready
to be written to new blkaddr)
                                        user thread:
                                         - f2fs_write_begin
                                          - f2fs_submit_page_bio
                                        (we submit the request to block layer
                                        to update page#2 in inode's page cache
                                        with physical block located in new
                                        blkaddr, so here we may read garbage
                                        data from new blkaddr since GC hasn't
                                        written back page#1 yet)
This patch fixes the above potential race for encrypted inodes.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
After finishing building the free nid cache, we try to asynchronously read
ahead 4 more pages for the next reload; the count of readahead nid pages
is fixed.
In some cases, such as on an SMR drive, reading few sectors with a fixed count
each time we trigger RA may be inefficient, since we face
high seeking overhead, so we'd better let the user configure this
parameter from sysfs for specific workloads.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Now, we use ra_meta_pages to read as many contiguous physical blocks as
possible to improve the performance of subsequent reads. However, ra_meta_pages
uses a synchronous readahead approach by submitting bios with READ. As READ
has high priority, it cannot be used for preloading blocks, where it is
not certain when the RAed pages will be used.
This patch supports asynchronous readahead in ra_meta_pages by tagging the bio
with the READA flag in order to allow preloading.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In the recovery or checkpoint flow, we temporarily grab pages in the meta
inode's mapping for caching temporary data. The data in these pages is actually
not f2fs metadata, but we still tag them with the REQ_META flag. However,
a lower device like eMMC may apply some optimization to data of that type.
So in order to avoid wrong optimization, we'd better remove the flag
for temporary non-meta pages.
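A rough sketch of the flag selection, assuming a caller that knows whether a
page in the meta mapping holds real metadata or only temporary data. The flag
values and names are placeholders for illustration and do not match the
kernel's REQ_* definitions.

```c
#include <stdint.h>

/* Placeholder flag bits for this sketch; not the kernel's REQ_* values. */
#define SK_REQ_SYNC (1u << 0)
#define SK_REQ_META (1u << 1)

/* Hypothetical classification of pages cached in the meta inode's mapping. */
enum meta_area {
    AREA_META,      /* real f2fs metadata */
    AREA_META_TEMP  /* temporary data used during recovery or checkpoint */
};

/* Tag only real metadata as META so the device does not misoptimize. */
static uint32_t meta_io_flags(enum meta_area area)
{
    uint32_t flags = SK_REQ_SYNC | SK_REQ_META;

    if (area == AREA_META_TEMP)
        flags &= ~SK_REQ_META;
    return flags;
}
```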
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>