IS_ENCRYPTED() now gives the same information as
i_sb->s_cop->is_encrypted() but is more efficient, since IS_ENCRYPTED()
is just a simple flag check. Prepare to remove ->is_encrypted() by
switching all callers to IS_ENCRYPTED().
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Introduce a flag S_ENCRYPTED which can be set in ->i_flags to indicate
that the inode is encrypted using the fscrypt (fs/crypto/) mechanism.
Checking this flag will give the same information that
inode->i_sb->s_cop->is_encrypted(inode) currently does, but will be more
efficient. This will be useful for adding higher-level helper functions
for filesystems to use. For example we'll be able to replace this:
if (ext4_encrypted_inode(inode)) {
ret = fscrypt_get_encryption_info(inode);
if (ret)
return ret;
if (!fscrypt_has_encryption_key(inode))
return -ENOKEY;
}
with this:
ret = fscrypt_require_key(inode);
if (ret)
return ret;
... since we'll be able to retain the fast path for unencrypted files as
a single flag check, using an inline function. This wasn't possible
before because we'd have had to frequently call through the
->i_sb->s_cop->is_encrypted function pointer, even when the encryption
support was disabled or not being used.
Note: we don't define S_ENCRYPTED to 0 if CONFIG_FS_ENCRYPTION is
disabled because we want to continue to return an error if an encrypted
file is accessed without encryption support, rather than pretending that
it is unencrypted.
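The fast path described above can be sketched in userspace C. This is only an illustration under stated assumptions: `demo_inode`, `DEMO_S_ENCRYPTED`, `demo_get_encryption_info()` and `demo_require_key()` are hypothetical stand-ins, not the kernel's definitions, and the flag value is arbitrary.

```c
#include <errno.h>

#ifndef ENOKEY
#define ENOKEY 126	/* fallback for non-Linux hosts; value assumed from Linux */
#endif

/* Hypothetical flag bit; the real S_ENCRYPTED value is not taken from the source. */
#define DEMO_S_ENCRYPTED (1u << 14)

struct demo_inode {
	unsigned int i_flags;	/* carries DEMO_S_ENCRYPTED for encrypted inodes */
	int has_key;		/* nonzero once a key has been set up */
};

/* Slow path stand-in: only reached for encrypted inodes. */
static int demo_get_encryption_info(struct demo_inode *inode)
{
	(void)inode;
	return 0;	/* pretend the lookup itself did not fail */
}

static inline int demo_require_key(struct demo_inode *inode)
{
	int ret;

	/* Fast path: one flag test, no call through a function pointer. */
	if (!(inode->i_flags & DEMO_S_ENCRYPTED))
		return 0;

	ret = demo_get_encryption_info(inode);
	if (ret)
		return ret;
	if (!inode->has_key)
		return -ENOKEY;
	return 0;
}
```

Because the flag test is inlined, an unencrypted inode never reaches the out-of-line slow path in this sketch.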
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Acked-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Filesystems have to include different header files based on whether they
are compiled with encryption support or not. That's nasty and messy.
Instead, rationalise the headers so we have a single include fscrypt.h
and let it decide what internal implementation to include based on the
__FS_HAS_ENCRYPTION define. Filesystems set __FS_HAS_ENCRYPTION to 1
before including linux/fscrypt.h if they are built with encryption
support. Otherwise, they must set __FS_HAS_ENCRYPTION to 0.
Add guards to prevent fscrypt_supp.h and fscrypt_notsupp.h from being
directly included by filesystems.
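A rough userspace sketch of the 1/0 convention follows. All names are hypothetical: `DEMO_FS_HAS_ENCRYPTION` mirrors the role of __FS_HAS_ENCRYPTION, and the `#error` guard is an assumption about how a unified header could reject a missing definition, not a description of linux/fscrypt.h.

```c
/* In the filesystem's own source: always define the macro, to 1 or 0. */
#ifndef DEMO_FS_HAS_ENCRYPTION
#define DEMO_FS_HAS_ENCRYPTION 1
#endif

/* In the single shared header: pick the implementation from the value. */
#ifndef DEMO_FS_HAS_ENCRYPTION
#error "define DEMO_FS_HAS_ENCRYPTION to 1 or 0 before including this header"
#endif

#if DEMO_FS_HAS_ENCRYPTION
/* stands in for the "supported" implementation */
static inline int demo_fscrypt_supported(void) { return 1; }
#else
/* stands in for the "not supported" stubs */
static inline int demo_fscrypt_supported(void) { return 0; }
#endif
```

Testing the value with `#if` rather than `#ifdef` means a filesystem that forgets to define the macro can be caught, instead of silently getting the stub implementation.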
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[EB: use 1 and 0 rather than defined/undefined]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
When an fscrypt-encrypted file is opened, we request the file's master
key from the keyrings service as a logon key, then access its payload.
However, a revoked key has a NULL payload, and we failed to check for
this. request_key() *does* skip revoked keys, but there is still a
window where the key can be revoked before we acquire its semaphore.
Fix it by checking for a NULL payload, treating it like a key which was
already revoked at the time it was requested.
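The shape of the check can be sketched as below. This is a userspace illustration, not the patch itself: `demo_key` and `demo_use_master_key()` are hypothetical, and returning -EKEYREVOKED is an assumption about which error best matches "already revoked".

```c
#include <errno.h>
#include <stddef.h>

#ifndef EKEYREVOKED
#define EKEYREVOKED 128	/* fallback for non-Linux hosts; value assumed from Linux */
#endif

/* Hypothetical key object: a revoked key is modelled as a NULL payload. */
struct demo_key {
	const void *payload;
};

/*
 * In the kernel the payload would be read while holding the key's
 * semaphore; locking is omitted from this sketch.
 */
static int demo_use_master_key(const struct demo_key *key, const void **out)
{
	const void *payload = key->payload;

	/* The key may have been revoked after it was requested. */
	if (payload == NULL)
		return -EKEYREVOKED;

	*out = payload;
	return 0;
}
```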
Fixes: 88bd6ccdcd ("ext4 crypto: add encryption key management facilities")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: <stable@vger.kernel.org> [v4.1+]
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
This makes it consistent with ->is_encrypted(), ->empty_dir(), and
fscrypt_dummy_context_enabled().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
We need to clear the FI_NO_PREALLOC flag in the error path of
f2fs_file_write_iter; otherwise we lose the chance to preallocate blocks
all at once in a later write().
Fixes: dc91de78e5e1 ("f2fs: do not preallocate blocks which has wrong buffer")
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch splits out the memory allocation part in nat_entry to avoid lock
contention.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In flush_nat_entries, all dirty nats are flushed, and if their new
address isn't NULL_ADDR, their bitmaps are updated. The free_nid_count
of the bitmaps is then increased regardless of whether the nats were
already occupied before, which can lead to a wrong free_nid_count.
So this patch checks the status of the bits before actually
setting/clearing them.
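A minimal sketch of "check the bit before changing it" is shown below. The structure and function names are hypothetical and only model one NAT block's free-nid bitmap with a counter; they are not the f2fs data structures.

```c
#include <stdbool.h>

#define DEMO_NIDS_PER_BLOCK 64	/* illustrative size only */

struct demo_nat_block {
	bool free_bit[DEMO_NIDS_PER_BLOCK];	/* true: nid is free */
	unsigned int free_nid_count;		/* must equal the number of true bits */
};

static void demo_update_free_nid(struct demo_nat_block *blk,
				 unsigned int ofs, bool set)
{
	if (set) {
		/* Already marked free: touching the counter would overcount. */
		if (blk->free_bit[ofs])
			return;
		blk->free_bit[ofs] = true;
		blk->free_nid_count++;
	} else {
		/* Already marked in use: touching the counter would undercount. */
		if (!blk->free_bit[ofs])
			return;
		blk->free_bit[ofs] = false;
		blk->free_nid_count--;
	}
}
```

With the early returns, repeating the same update is harmless, so the counter stays equal to the number of set bits.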
Fixes: 586d1492f301 ("f2fs: skip scanning free nid bitmap of full NAT blocks")
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
__add_ino_entry now always succeeds: the ENOMEM failure case is already
handled by using the __GFP_NOFAIL flag, so the additional open-coded
retry loop here is unnecessary. Remove it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Write hints help F2FS determine which type of segment is selected
for buffered writes.
This patch implements the mapping from write hints to segment types
as shown below.
hints                 segment type
-----                 ------------
WRITE_LIFE_SHORT      CURSEG_HOT_DATA
WRITE_LIFE_EXTREME    CURSEG_COLD_DATA
others                CURSEG_WARM_DATA
The F2FS policy for hot/cold separation takes precedence over these hints,
and hints are not applied to in-place updates.
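The mapping in the table can be written as a small switch. The enums below are hypothetical stand-ins with placeholder values; only the three-way mapping itself comes from the description above, and the precedence of the existing hot/cold policy is not modelled.

```c
/* Hypothetical stand-ins for the kernel's write-hint and curseg constants. */
enum demo_write_hint {
	DEMO_WRITE_LIFE_NOT_SET,
	DEMO_WRITE_LIFE_NONE,
	DEMO_WRITE_LIFE_SHORT,
	DEMO_WRITE_LIFE_MEDIUM,
	DEMO_WRITE_LIFE_LONG,
	DEMO_WRITE_LIFE_EXTREME
};

enum demo_seg_type {
	DEMO_CURSEG_HOT_DATA,
	DEMO_CURSEG_WARM_DATA,
	DEMO_CURSEG_COLD_DATA
};

static enum demo_seg_type demo_hint_to_seg_type(enum demo_write_hint hint)
{
	switch (hint) {
	case DEMO_WRITE_LIFE_SHORT:
		return DEMO_CURSEG_HOT_DATA;	/* short-lived data: hot */
	case DEMO_WRITE_LIFE_EXTREME:
		return DEMO_CURSEG_COLD_DATA;	/* longest-lived data: cold */
	default:
		return DEMO_CURSEG_WARM_DATA;	/* every other hint: warm */
	}
}
```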
Signed-off-by: Hyunchul Lee <cheol.lee@lge.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Commit 4ac912427c42 ("f2fs: introduce free nid bitmap") copied code
from __build_free_nids() into scan_free_nid_bits(). The copies are
redundant, so introduce one common function, scan_curseg_cache, as a
cleanup.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We call scan_free_nid_bits only when there aren't many free nids
left, which means the marked bits in free_nid_bitmap are supposed
to be few; using find_next_bit_le is more efficient in that case.
According to my tests, using find_next_bit_le instead of
test_bit_le cuts the traversal time down to one third of the
original.
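Why a find-next-bit walk helps on a sparse bitmap can be sketched in userspace. The helpers below are hypothetical re-implementations in native bit order, not the kernel's find_next_bit_le/test_bit_le, and they make no claim about the measured speedup.

```c
#include <limits.h>
#include <stddef.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static void demo_set_bit(unsigned long *map, size_t nr)
{
	map[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

static int demo_test_bit(const unsigned long *map, size_t nr)
{
	return (map[nr / DEMO_BITS_PER_LONG] >> (nr % DEMO_BITS_PER_LONG)) & 1UL;
}

/* Index of the first set bit at or after start, or size if there is none. */
static size_t demo_find_next_bit(const unsigned long *map, size_t size,
				 size_t start)
{
	while (start < size) {
		unsigned long word = map[start / DEMO_BITS_PER_LONG] >>
				     (start % DEMO_BITS_PER_LONG);

		if (word == 0) {
			/* Nothing left in this word: skip to the next one. */
			start = (start / DEMO_BITS_PER_LONG + 1) * DEMO_BITS_PER_LONG;
			continue;
		}
		while (!(word & 1UL)) {
			word >>= 1;
			start++;
		}
		return start < size ? start : size;
	}
	return size;
}

/* Visits every bit, set or not. */
static size_t demo_count_by_test_bit(const unsigned long *map, size_t size)
{
	size_t i, n = 0;

	for (i = 0; i < size; i++)
		if (demo_test_bit(map, i))
			n++;
	return n;
}

/* Visits only set bits; empty words are skipped in one step. */
static size_t demo_count_by_find_next(const unsigned long *map, size_t size)
{
	size_t i, n = 0;

	for (i = demo_find_next_bit(map, size, 0); i < size;
	     i = demo_find_next_bit(map, size, i + 1))
		n++;
	return n;
}
```

Both counting loops return the same result; the second does work proportional to the number of words plus the number of set bits, which is the case the message describes.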
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In the current version, after scan_free_nid_bits, the scan is over if
nid_cnt[FREE_NID] != 0. In most cases, there are still free nids in the
free list during the scan, and scan_free_nid_bits usually can't increase
nid_cnt[FREE_NID]. As a result, __build_free_nids is called many times
without solving the shortage of free nids. This patch fixes that.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch slightly changes need_do_checkpoint to return detailed
info indicating why a checkpoint is needed, so that the caller can
print it in a trace message.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Without FADVISE_KEEP_SIZE_BIT, we try to recover the file size from
the last non-hole block. So in fallocate() we must set the
FADVISE_KEEP_SIZE_BIT flag as soon as we have preallocated a block
beyond EOF, rather than only after all preallocation has succeeded.
Otherwise, the file size will be incorrect because the flag is missing.
Simple testcase to reproduce this:
1. echo 2 > /sys/fs/f2fs/<device>/inject_type
2. echo 10 > /sys/fs/f2fs/<device>/inject_rate
3. run tests/generic/392
4. disable fault injection
5. do remount
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
During block exchange in {insert,collapse,move}_range, the page-block
mapping is unstable because mappings are being moved or recovered, so no
concurrent cache read should rely on that mapping, and no cache write
should interfere with the block exchange.
So this patch makes background GC aware of that.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Use a slightly easier way to calculate last_nid.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Sometimes, after running generic/270 of fstest, fsck reports that the
summary info and the actual position of a block address in a direct node
have become inconsistent.
The root cause is a race between __f2fs_replace_block and change_curseg,
as below:
Thread A                                Thread B
- __clone_blkaddrs
 - f2fs_replace_block
  - __f2fs_replace_block
   - segnoA = GET_SEGNO(sbi, blkaddrA);
   - type = se->type:=CURSEG_HOT_DATA
   - if (!IS_CURSEG(sbi, segnoA))
      type = CURSEG_WARM_DATA
                                        - allocate_data_block
                                         - allocate_segment
                                          - get_ssr_segment
                                          - change_curseg(segnoA, CURSEG_HOT_DATA)
   - change_curseg(segnoA, CURSEG_WARM_DATA)
    - reset_curseg
     - __set_sit_entry_type
      - change se->type from CURSEG_HOT_DATA to CURSEG_WARM_DATA
So in the end, the hot curseg is located in segnoA, but the type of segnoA
has become CURSEG_WARM_DATA.
Then if we invoke __f2fs_replace_block(blkaddrB, blkaddrA, true, false),
since blkaddrA is located in segnoA, we move the warm curseg to segnoA,
then change its summary cache and write it back to the summary block.
But segnoA is used by the hot curseg too; once that curseg moves or is
persisted, it overwrites the summary block content with its own stale
summary cache, resulting in an inconsistent state.
This patch fixes the issue by introducing a global curseg lock to avoid
the race between __f2fs_replace_block and change_curseg.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
After commit a468f0ef516f ("f2fs: use crc and cp version to determine
roll-forward recovery"), last caller of update_meta_page passing @src
with NULL is gone, so remove related dead code there.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
f2fs does not set the SB_I_VERSION flag, so the i_version will never
be incremented on write. It was recently changed to increment the
i_version on a quota write, which isn't necessary here.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
When we are close to triggering foreground GC and there are only a few
dirty metas, we can log those dirty metas in the space left in open
segments instead of triggering foreground GC.
With this patch, the total count of foreground GCs triggered by
test/generic/* of the fstest suite drops from 254 to 184.
So let's do the check before foreground GC anyway.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In some cases, users of this lock do not update the SIT cache while
holding it, so let's use an rw_semaphore instead of a mutex to improve
concurrent access.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch supports hidden quota files in the system, which will be used for
Android. It requires up-to-date f2fs-tools later than v1.9.0.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds quota_ino feature infra to be used for quota files.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Make three modifications to __update_nat_bits:
1. Take the code dealing with the nat with nid 0 out of the loop.
   Such a nat only needs to be dealt with once, at the beginning.
2. Use "nat_index == 0" instead of "start_nid == 0" to decide whether it's
   the first nat block.
   It's better not to assume @start_nid is the first nid of the nat block
   it's in.
3. Use "if (nat_blk->entries[i].block_addr != NULL_ADDR)" to explicitly
   check the value of block_addr.
   Using the constant keeps the code correct even if the value of
   NULL_ADDR changes.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
modify for accurate fggc node io stat
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This reverts commit 5e443818fa.
The commit should be reverted because the call sequence of the two parts
of code below must be kept:
a. Update sit information. It needs to be updated before segment
allocation, since a later allocation may trigger SSR, and SSR allocation
needs the latest valid block information of all segments.
b. Update segment status. It needs to be updated after segment allocation,
since we can skip updating the status of the currently open segment.
Fixes: 5e443818fa ("f2fs: handle dirty segments inside refresh_sit_entry")
Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: remove refresh_sit_entry function]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds a new function to move a nid from one state to another.
The move operation is heavily used; by adding a new function for it
we can cut down some branches in several flows.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch exports min_ssr_segments threshold in sysfs to let user
control triggering SSR allocation flexibly.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
We already support issuing discards in a specified range during fstrim.
The caller should be told the number of bytes successfully trimmed in
that range, rather than the bytes of invalid blocks scanned during
checkpoint.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds support for bio allocation error injection to simulate
out-of-memory test scenarios.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds support for get_page error injection to simulate
out-of-memory test scenarios.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
Some sysfs entries are missing descriptions in the documentation; add them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch extends the reserved_blocks sysfs interface to be a soft
threshold, which allows the user to configure it beyond the currently
available user space. It also introduces a new sysfs interface called
current_reserved_blocks, which shows the blocks that have already been
reserved.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch fixes recovering incomplete xattr entries remaining in inline xattr
and xattr block, caused by any kind of errors.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
In products, more and more features based on file encryption are being
introduced, and their demand for xattr space is increasing. However,
inline xattr has a fixed size of 200 bytes; once the inline xattr space
is full, newly added xattr data occupies an additional xattr block, which
may cost more space and cause a performance regression when persisting.
To resolve this issue, it's better to expand the inline xattr size
flexibly according to the user's requirement.
So this patch introduces a new filesystem feature, 'flexible inline
xattr', and a new mount option, 'inline_xattr_size=%u'. Once mkfs enables
the feature, the option can be used to make f2fs support a flexible
inline xattr size.
To support this feature, we add an extra attribute, i_inline_xattr_size,
to the inode layout, indicating how much space inline xattr borrows from
the block address mapping space in the inode layout. With this, we can
easily locate and store flexible-sized inline xattr data in the inode.
Inode disk layout:
  +----------------------+
  | .i_mode              |
  | ...                  |
  | .i_ext               |
  +----------------------+
  | .i_extra_isize       |
  | .i_inline_xattr_size |-----------+
  | ...                  |           |
  +----------------------+           |
  | .i_addr              |           |
  |  - block address or  |           |
  |  - inline data       |           |
  +----------------------+<---+      v
  |    inline xattr      |    +---inline xattr range
  +----------------------+<---+
  | .i_nid               |
  +----------------------+
  |   node_footer        |
  | (nid, ino, offset)   |
  +----------------------+
Note that we have to consider backward compatibility, since 200 bytes of
inline_data space have always been reserved, as reported by Sheng Yong.
Previously, inline data or directory always reserved 200 bytes in the inode
layout, even if inline_xattr was disabled. In order to keep inline_dentry's
structure for backward compatibility, we get the space back only from
inline_data.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Reported-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds calls to quota_initialize in f2fs_set_acl, f2fs_unlink,
and f2fs_rename.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch adds one sysfs entry to show # of dirty segments which can be
used for gc timing by user.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch uses the cp_error flag instead of RDONLY for quota off.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There are several assignments to variables that are redundant
as the values are never read when the variables are updated later
and so the redundant statements can be safely removed.
Cleans up clang warnings:
fs/f2fs/segment.c:923:19: warning: Value stored to 'p' during its initialization is never read
fs/f2fs/segment.c:2060:2: warning: Value stored to 'hint' is never read
fs/f2fs/segment.c:2353:2: warning: Value stored to 'start_block' is never read
fs/f2fs/segment.c:2354:2: warning: Value stored to 'end_block' is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
All uses of timespec are deprecated, and this one is not particularly
useful, as the documented method for converting seconds to jiffies
is to multiply by 'HZ'.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
There may be an extreme case as below:
With one segment per section and 100 segments in total, with a 10%
over-provision ratio in the f2fs partition, fggc_threshold will be rounded
down to 460 instead of 460.8 by the calculation below:
	sbi->fggc_threshold = div_u64((u64)(main_count - ovp_count) *
				BLKS_PER_SEC(sbi), (main_count - resv_count));
If section usage is as follows:
60 segments which contain 460 valid blocks
40 segments which contain 462 valid blocks
then the valid block count of every section is no less than
fggc_threshold, so none of them is chosen as a candidate because of the
rounded-down fggc_threshold.
Let's just soften the condition for choosing foreground GC candidates.
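The rounding can be reproduced with plain integer arithmetic. The inputs below are assumptions chosen only so that the exact quotient is 460.8 as in the example (512 blocks per section, ovp_count = 10, resv_count = 0). The two candidate checks are hypothetical illustrations of a strict and a softened comparison, not the actual f2fs code.

```c
#include <stdint.h>

/* Same shape as the div_u64() expression: the fraction is truncated. */
static uint64_t demo_fggc_threshold(uint64_t main_count, uint64_t ovp_count,
				    uint64_t resv_count, uint64_t blks_per_sec)
{
	return (main_count - ovp_count) * blks_per_sec /
	       (main_count - resv_count);
}

/* Hypothetical strict rule: a section at the threshold is rejected. */
static int demo_is_candidate_strict(uint64_t valid_blocks, uint64_t threshold)
{
	return valid_blocks < threshold;
}

/* Hypothetical softened rule: a section at the threshold is accepted. */
static int demo_is_candidate_soft(uint64_t valid_blocks, uint64_t threshold)
{
	return valid_blocks <= threshold;
}
```

With these assumed inputs, 90 * 512 / 100 truncates to 460, so under the strict rule neither the sections with 460 valid blocks nor those with 462 qualify, while the softened rule admits the former.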
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This reverts commit 4f31d26b0c17f2aae6a6afeb823a87e20671ab4b.
It turns out that we need to report error number if nothing was written.
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
This patch removes the obsolete parameter of truncate_xattr_node.
Suggested-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>