Commit graph

Sheng Yong
7aff5c69da f2fs: do not check F2FS_INLINE_DOTS in recover
Only a directory may have the F2FS_INLINE_DOTS flag, so there is no need to
check the flag in the recovery flow.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:45 -07:00
Sheng Yong
23d00b0287 f2fs: remove duplicated dquot_initialize and fix error handling
This patch removes the duplicated dquot_initialize() call in
recover_orphan_inode(), and fixes the error handling when dquot_initialize()
fails.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:43 -07:00
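
A minimal sketch of what the corrected error handling might look like, assuming
the orphan inode has already been looked up earlier in recover_orphan_inode();
variable names are illustrative:

	/* Initialize quota once; drop the inode reference cleanly on failure. */
	err = dquot_initialize(inode);
	if (err) {
		iput(inode);
		return err;
	}
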
Yunlei He
937f4ef79e f2fs: stop issue discard if something wrong with f2fs
v4->v5: move the data corruption check into __submit_discard_cmd() to control
the submitted discard IO more accurately; besides, increase the async thread's
wait time when data corruption is detected.

This patch stops the async thread and the umount process from issuing discard
commands when something is wrong with f2fs, similar to fstrim.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:41 -07:00
Chao Yu
a6d74bb282 f2fs: fix return value in f2fs_ioc_commit_atomic_write
In f2fs_ioc_commit_atomic_write(), if the file is volatile, return -EINVAL to
indicate the commit failure.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:40 -07:00
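
A minimal sketch of the check this describes, assuming the f2fs helper
f2fs_is_volatile_file(); illustrative, not the literal patch:

	/* Committing an atomic write makes no sense for a volatile file. */
	if (f2fs_is_volatile_file(inode))
		return -EINVAL;
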
Yunlei He
258489ec52 f2fs: allocate hot_data for atomic write more strictly
If a file is not typed as hot and has more dirty pages than the threshold (64)
before starting an atomic write, it may lose the hot flag.

v1->v2: also move setting the FI_ATOMIC_FILE flag to after flushing dirty
pages, so that dirty pages generated before the atomic write starts are not
written back in atomic mode.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:38 -07:00
Sheng Yong
aa857e0f3b f2fs: check if inmem_pages list is empty correctly
`cur' will never be NULL; we should check whether the inmem_pages list is empty
instead.

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:36 -07:00
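
A minimal sketch of the corrected check, assuming the inmem_pages list lives in
struct f2fs_inode_info (fi); illustrative only:

	/* The list itself tells us whether any in-memory pages remain. */
	if (list_empty(&fi->inmem_pages))
		return;
	cur = list_last_entry(&fi->inmem_pages, struct inmem_pages, list);
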
Chao Yu
9d77ded0a7 f2fs: fix race in between GC and atomic open
Thread					GC thread
- f2fs_ioc_start_atomic_write
 - get_dirty_pages
 - filemap_write_and_wait_range
					- f2fs_gc
					 - do_garbage_collect
					  - gc_data_segment
					   - move_data_page
					    - f2fs_is_atomic_file
					    - set_page_dirty
 - set_inode_flag(, FI_ATOMIC_FILE)

A dirty data page can still be generated by GC in the race condition shown in
the call stack above.

This patch adds fi->dio_rwsem[WRITE] in f2fs_ioc_start_atomic_write
to avoid such race.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:35 -07:00
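
A minimal sketch of how the lock might close the window, assuming
fi->dio_rwsem[WRITE] is the rwsem named above; the flush range and ordering are
illustrative, not the literal patch:

	/* Keep GC's set_page_dirty() out while we flush and switch to atomic mode. */
	down_write(&fi->dio_rwsem[WRITE]);
	ret = filemap_write_and_wait_range(inode->i_mapping, 0, LLONG_MAX);
	if (!ret)
		set_inode_flag(inode, FI_ATOMIC_FILE);
	up_write(&fi->dio_rwsem[WRITE]);
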
Zhikang Zhang
0d17eb90b5 f2fs: change le32 to le16 of f2fs_inode->i_extra_size
In struct f2fs_inode, the type of i_extra_size is __le16, so we should keep the
type consistent when using it.

Fixes: 704956ecf5bc ("f2fs: support inode checksum")
Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:33 -07:00
Zhikang Zhang
ea2813111f f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries
We should check cur_valid_map_mir and the block count to ensure that the
flushed raw_sit is correct.

Signed-off-by: Zhikang Zhang <zhangzhikang1@huawei.com>
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:31 -07:00
Chao Yu
9190cadf38 f2fs: correct return value of f2fs_trim_fs
Correct the return value in two cases:
- return -EINVAL if the end boundary is out of range.
- return -EIO if the fs needs an off-line check.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:30 -07:00
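
A minimal sketch of the intent; end_in_range() and fs_needs_fsck() are
illustrative placeholders, not real f2fs helpers:

	/* end_in_range() and fs_needs_fsck() are placeholders for illustration. */
	/* An out-of-range end boundary is a caller error, not an IO success. */
	if (!end_in_range(sbi, range))
		return -EINVAL;
	/* A filesystem that needs an off-line check must not be trimmed. */
	if (fs_needs_fsck(sbi))
		return -EIO;
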
Chao Yu
17f85d0708 f2fs: fix to show missing bits in FS_IOC_GETFLAGS
This patch fixes FS_IOC_GETFLAGS to show the missing encrypt/inline_data flags,
as ext4 does.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:28 -07:00
Chao Yu
3e90db63fc f2fs: remove unneeded F2FS_PROJINHERIT_FL
Now that F2FS_FL_USER_VISIBLE and F2FS_FL_USER_MODIFIABLE already include
F2FS_PROJINHERIT_FL, remove the redundant F2FS_PROJINHERIT_FL when using the
visible/modifiable flag macros.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:26 -07:00
Chao Yu
298032d4d4 f2fs: don't use GFP_ZERO for page caches
Related to https://lkml.org/lkml/2018/4/8/661

Sometimes we need to write metadata to a newly allocated block address. In that
case we allocate a zeroed page in an inner inode's address space, fill in
partial data, and leave the rest zeroed to represent initial state.

Two inner inodes (the meta inode and the node inode) set __GFP_ZERO. I have
checked both of them; for both we can avoid __GFP_ZERO and do the
initialization ourselves, to avoid unneeded/redundant zeroing from mm.

Cc: <stable@vger.kernel.org>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:25 -07:00
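
A minimal sketch of the idea, assuming the inner inode's mapping previously
requested __GFP_ZERO; both lines are illustrative and belong to different call
sites in practice:

	/* Stop asking the allocator for zeroed pages for this inner inode... */
	mapping_set_gfp_mask(inode->i_mapping, GFP_NOFS);
	/* ...and zero explicitly only where callers rely on initial state. */
	memset(page_address(page), 0, PAGE_SIZE);
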
Yunlei He
fdf61219dc f2fs: issue all big range discards in umount process
This patch changes max_requests to UINT_MAX, so that all big-range discards are
issued during umount.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:23 -07:00
Chao Yu
cd79eb2b5e f2fs: remove redundant block plug
For buffered IO we don't need a block plug to cache bios, and for direct IO the
generic f2fs_direct_IO path already adds a block plug, so let's remove the
redundant one in .write_iter.

As Yunlei described in his patch:

-f2fs_file_write_iter
  -blk_start_plug
    -__generic_file_write_iter
      ...
      -do_blockdev_direct_IO
        -blk_start_plug
          ...
        -blk_finish_plug
      ...
  -blk_finish_plug

which may cause a performance decrease on our platform.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:21 -07:00
Yunlong Song
ec034d0f14 f2fs: remove unmatched zero_user_segment when convert inline dentry
Since the layout of a regular dentry block differs from that of an inline
dentry block, zero_user_segment() starting at MAX_INLINE_DATA(dir) is not
correct for a regular dentry block. Besides, the bitmap has already been copied
and is in use, so there is no need to zero the page at all; just remove the
zero_user_segment() call.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:19 -07:00
Chao Yu
71aaced0e1 f2fs: introduce private inode status mapping
Previously, we used the generic FS_*_FL flags defined by the VFS to indicate
inode status for each bit of i_flags, so f2fs's flag definitions were tied to
the VFS ones, making it hard for f2fs to reuse bits it never used to indicate
new statuses.

To solve this issue, we introduce a private inode status mapping. Note that for
bits which have already been persisted to disk, we should never change their
definition; the remaining bits can be remapped for new statuses later.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-07-08 17:28:17 -07:00
Eric Biggers
e7724207f7 fscrypt: log the crypto algorithm implementations
Log the crypto algorithm driver name for each fscrypt encryption mode on
its first use, also showing a friendly name for the mode.

This will help people determine whether the expected implementations are
being used.  In some cases we've seen people do benchmarks and reject
using encryption for performance reasons, when in fact they used a much
slower implementation of AES-XTS than was possible on the hardware.  It
can make an enormous difference; e.g., AES-XTS on ARM is about 10x
faster with the crypto extensions (AES instructions) than without.

This also makes it more obvious which modes are being used, now that
fscrypt supports multiple combinations of modes.

Example messages (with default modes, on x86_64):

[   35.492057] fscrypt: AES-256-CTS-CBC using implementation "cts(cbc-aes-aesni)"
[   35.492171] fscrypt: AES-256-XTS using implementation "xts-aes-aesni"

Note: algorithms can be dynamically added to the crypto API, which can
result in different implementations being used at different times.  But
this is rare; for most users, showing the first will be good enough.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 14:24:29 -07:00
Herbert Xu
4cbda579cd crypto: api - Add crypto_type_has_alg helper
This patch adds the helper crypto_type_has_alg which is meant
to replace crypto_has_alg for new-style crypto types.  Rather
than hard-coding type/mask information they're now retrieved
from the crypto_type object.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-06-28 14:24:29 -07:00
Herbert Xu
b24dcaae87 crypto: skcipher - Add low-level skcipher interface
This patch allows skcipher algorithms and instances to be created
and registered with the crypto API.  They are accessible through
the top-level skcipher interface, along with ablkcipher/blkcipher
algorithms and instances.

This patch also introduces a new parameter called chunk size
which is meant for ciphers such as CTR and CTS which ostensibly
can handle arbitrary lengths, but still behave like block ciphers
in that you can only process a partial block at the very end.

For these ciphers the block size will continue to be set to 1
as it is now while the chunk size will be set to the underlying
block size.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-06-28 14:24:29 -07:00
Herbert Xu
a9146e4235 crypto: skcipher - Add helper to retrieve driver name
This patch adds the helper crypto_skcipher_driver_name which returns
the driver name of the alg object for a given tfm.  This is needed by
ecryptfs.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-06-28 14:24:29 -07:00
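
The helper is likely a thin wrapper along these lines (a sketch, not
necessarily the exact definition):

static inline const char *crypto_skcipher_driver_name(struct crypto_skcipher *tfm)
{
	/* The driver name lives on the underlying algorithm object. */
	return crypto_tfm_alg_driver_name(crypto_skcipher_tfm(tfm));
}
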
Herbert Xu
a0ca4bdf47 crypto: skcipher - Add default key size helper
While converting ecryptfs over to skcipher I found that it needs
to pick a default key size if one isn't given.  Rather than having
it poke into the guts of the algorithm to get max_keysize, let's
provide a helper that is meant to give a sane default (just in
case we ever get an algorithm that has no maximum key size).

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2018-06-28 14:24:29 -07:00
Eric Biggers
eb13e0b692 fscrypt: add Speck128/256 support
fscrypt currently only supports AES encryption.  However, many low-end
mobile devices have older CPUs that don't have AES instructions, e.g.
the ARMv8 Cryptography Extensions.  Currently, user data on such devices
is not encrypted at rest because AES is too slow, even when the NEON
bit-sliced implementation of AES is used.  Unfortunately, it is
infeasible to encrypt these devices at all when AES is the only option.

Therefore, this patch updates fscrypt to support the Speck block cipher,
which was recently added to the crypto API.  The C implementation of
Speck is not especially fast, but Speck can be implemented very
efficiently with general-purpose vector instructions, e.g. ARM NEON.
For example, on an ARMv7 processor, we measured the NEON-accelerated
Speck128/256-XTS at 69 MB/s for both encryption and decryption, while
AES-256-XTS with the NEON bit-sliced implementation was only 22 MB/s
encryption and 19 MB/s decryption.

There are multiple variants of Speck.  This patch only adds support for
Speck128/256, which is the variant with a 128-bit block size and 256-bit
key size -- the same as AES-256.  This is believed to be the most secure
variant of Speck, and it's only about 6% slower than Speck128/128.
Speck64/128 would be at least 20% faster because it has 20% fewer rounds, and
it can be even faster on CPUs that can't efficiently do the 64-bit
operations needed for Speck128.  However, Speck64's 64-bit block size is
not preferred security-wise.  ARM NEON also supports the needed 64-bit
operations even on 32-bit CPUs, resulting in Speck128 being fast enough
for our targeted use cases so far.

The chosen modes of operation are XTS for contents and CTS-CBC for
filenames.  These are the same modes of operation that fscrypt defaults
to for AES.  Note that as with the other fscrypt modes, Speck will not
be used unless userspace chooses to use it.  Nor are any of the existing
modes (which are all AES-based) being removed, of course.

We intentionally don't make CONFIG_FS_ENCRYPTION select
CONFIG_CRYPTO_SPECK, so people will have to enable Speck support
themselves if they need it.  This is because we shouldn't bloat the
FS_ENCRYPTION dependencies with every new cipher, especially ones that
aren't recommended for most users.  Moreover, CRYPTO_SPECK is just the
generic implementation, which won't be fast enough for many users; in
practice, they'll need to enable CRYPTO_SPECK_NEON to get acceptable
performance.

More details about our choice of Speck can be found in our patches that
added Speck to the crypto API, and the follow-on discussion threads.
We're planning a publication that explains the choice in more detail.
But briefly, we can't use ChaCha20 as we previously proposed, since it
would be insecure to use a stream cipher in this context, with potential
IV reuse during writes on f2fs and/or on wear-leveling flash storage.

We also evaluated many other lightweight and/or ARX-based block ciphers
such as Chaskey-LTS, RC5, LEA, CHAM, Threefish, RC6, NOEKEON, SPARX, and
XTEA.  However, all had disadvantages vs. Speck, such as insufficient
performance with NEON, much less published cryptanalysis, or an
insufficient security level.  Various design choices in Speck make it
perform better with NEON than competing ciphers while still having a
security margin similar to AES, and in the case of Speck128 also the
same available security levels.  Unfortunately, Speck does have some
political baggage attached -- it's an NSA designed cipher, and was
rejected from an ISO standard (though for context, as far as I know none
of the above-mentioned alternatives are ISO standards either).
Nevertheless, we believe it is a good solution to the problem from a
technical perspective.

Certain algorithms constructed from ChaCha or the ChaCha permutation,
such as MEM (Masked Even-Mansour) or HPolyC, may also meet our
performance requirements.  However, these are new constructions that
need more time to receive the cryptographic review and acceptance needed
to be confident in their security.  HPolyC hasn't been published yet,
and we are concerned that MEM makes stronger assumptions about the
underlying permutation than the ChaCha stream cipher does.  In contrast,
the XTS mode of operation is relatively well accepted, and Speck has
over 70 cryptanalysis papers.  Of course, these ChaCha-based algorithms
can still be added later if they become ready.

The best known attack on Speck128/256 is a differential cryptanalysis
attack on 25 of 34 rounds with 2^253 time complexity and 2^125 chosen
plaintexts, i.e. only marginally faster than brute force.  There is no
known attack on the full 34 rounds.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 13:24:35 -07:00
Eric Biggers
27a0e77380 fscrypt: only derive the needed portion of the key
Currently the key derivation function in fscrypt uses the master key
length as the amount of output key material to derive.  This works, but
it means we can waste time deriving more key material than is actually
used, e.g. most commonly, deriving 64 bytes for directories which only
take a 32-byte AES-256-CTS-CBC key.  It also forces us to validate that
the master key length is a multiple of AES_BLOCK_SIZE, which wouldn't
otherwise be necessary.

Fix it to only derive the needed length key.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 13:24:35 -07:00
Eric Biggers
f68a71fa8f fscrypt: separate key lookup from key derivation
Refactor the confusingly-named function 'validate_user_key()' into a new
function 'find_and_derive_key()' which first finds the keyring key, then
does the key derivation.  Among other benefits this avoids the strange
behavior we had previously where if key derivation failed for some
reason, then we would fall back to the alternate key prefix.  Now, we'll
only fall back to the alternate key prefix if a valid key isn't found.

This patch also improves the warning messages that are logged when the
keyring key's payload is invalid.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 13:24:33 -07:00
Eric Biggers
52359cf4fd fscrypt: use a common logging function
Use a common function for fscrypt warning and error messages so that all
the messages are consistently ratelimited, include the "fscrypt:"
prefix, and include the filesystem name if applicable.

Also fix up a few of the log messages to be more descriptive.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:21:30 -07:00
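
A minimal sketch of such a helper, assuming a printk-based, ratelimited
implementation; the details are illustrative:

void fscrypt_msg(struct super_block *sb, const char *level, const char *fmt, ...)
{
	static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
				      DEFAULT_RATELIMIT_BURST);
	struct va_format vaf;
	va_list args;

	/* Keep all fscrypt messages ratelimited and consistently prefixed. */
	if (!__ratelimit(&rs))
		return;

	va_start(args, fmt);
	vaf.fmt = fmt;
	vaf.va = &args;
	if (sb)
		printk("%sfscrypt (%s): %pV\n", level, sb->s_id, &vaf);
	else
		printk("%sfscrypt: %pV\n", level, &vaf);
	va_end(args);
}
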
Eric Biggers
ff8e7c745e fscrypt: remove internal key size constants
With one exception, the internal key size constants such as
FS_AES_256_XTS_KEY_SIZE are only used for the 'available_modes' array,
where they really only serve to obfuscate what the values are.  Also
some of the constants are unused, and the key sizes tend to be in the
names of the algorithms anyway.  In the past these values were also
misused, e.g. we used to have FS_AES_256_XTS_KEY_SIZE in places that
technically should have been FS_MAX_KEY_SIZE.

The exception is that FS_AES_128_ECB_KEY_SIZE is used for key
derivation.  But it's more appropriate to use
FS_KEY_DERIVATION_NONCE_SIZE for that instead.

Thus, just put the sizes directly in the 'available_modes' array.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:21:30 -07:00
Eric Biggers
7149dd4d39 fscrypt: remove unnecessary check for non-logon key type
We're passing 'key_type_logon' to request_key(), so the found key is
guaranteed to be of type "logon".  Thus, there is no reason to check
later that the key is really a "logon" key.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:21:27 -07:00
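
A minimal sketch of why the re-check is redundant; the surrounding code is
illustrative:

	key = request_key(&key_type_logon, description, NULL);
	if (IS_ERR(key))
		return PTR_ERR(key);
	/* Because the type was fixed at request time, key->type is
	 * guaranteed to be &key_type_logon here; no extra check needed. */
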
Eric Biggers
56446c9142 fscrypt: make fscrypt_operations.max_namelen an integer
Now ->max_namelen() is only called to limit the filename length when
adding NUL padding, and only for real filenames -- not symlink targets.
It also didn't give the correct length for symlink targets anyway since
it forgot to subtract 'sizeof(struct fscrypt_symlink_data)'.

Thus, change ->max_namelen from a function to a simple 'unsigned int'
that gives the filesystem's maximum filename length.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:19:55 -07:00
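
A minimal sketch of the resulting shape; the surrounding fields and the example
value are illustrative:

struct fscrypt_operations {
	/* ... */
	unsigned int max_namelen;	/* filesystem's maximum filename length,
					 * e.g. F2FS_NAME_LEN for f2fs */
};
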
Eric Biggers
f572a22ef9 fscrypt: drop empty name check from fname_decrypt()
fname_decrypt() is validating that the encrypted filename is nonempty.
However, earlier a stronger precondition was already enforced: the
encrypted filename must be at least 16 (FS_CRYPTO_BLOCK_SIZE) bytes.

Drop the redundant check for an empty filename.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:18:52 -07:00
Eric Biggers
0077eff1d2 fscrypt: drop max_namelen check from fname_decrypt()
fname_decrypt() returns an error if the input filename is longer than
the inode's ->max_namelen() as given by the filesystem.  But, this
doesn't actually make sense because the filesystem provided the input
filename in the first place, where it was subject to the filesystem's
limits.  And fname_decrypt() has no internal limit itself.

Thus, remove this unnecessary check.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:18:52 -07:00
Eric Biggers
3f7af9d27f fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info()
In fscrypt_setup_filename(), remove the unnecessary check for
fscrypt_get_encryption_info() returning EOPNOTSUPP.  There's no reason
to handle this error differently from any other.  I think there may have
been some confusion because the "notsupp" version of
fscrypt_get_encryption_info() returns EOPNOTSUPP -- but that's not
applicable from inside fs/crypto/.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:18:52 -07:00
Eric Biggers
52c51f7b7b fscrypt: don't clear flags on crypto transform
fscrypt is clearing the flags on the crypto_skcipher it allocates for
each inode.  But, this is unnecessary and may cause problems in the
future because it will even clear flags that are meant to be internal to
the crypto API, e.g. CRYPTO_TFM_NEED_KEY.

Remove the unnecessary flag clearing.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:18:52 -07:00
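
The kind of call being removed presumably looks like this (illustrative):

	/* Blindly clearing every flag would also wipe internal crypto API
	 * state such as CRYPTO_TFM_NEED_KEY, so the call is dropped. */
	crypto_skcipher_clear_flags(tfm, ~0);
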
Eric Biggers
89b7fb8298 fscrypt: remove stale comment from fscrypt_d_revalidate()
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:18:52 -07:00
Eric Biggers
d56de4e926 fscrypt: remove error messages for skcipher_request_alloc() failure
skcipher_request_alloc() can only fail due to lack of memory, and in
that case the memory allocator will have already printed a detailed
error message.  Thus, remove the redundant error messages from fscrypt.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:18:52 -07:00
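
A minimal sketch of the resulting call site; the tfm variable is illustrative:

	req = skcipher_request_alloc(tfm, GFP_NOFS);
	if (!req)
		return -ENOMEM;	/* the allocator already printed the details */
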
Eric Biggers
f68d3b84ae fscrypt: remove unnecessary NULL check when allocating skcipher
crypto_alloc_skcipher() returns an ERR_PTR() on failure, not NULL.
Remove the unnecessary check for NULL.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:18:52 -07:00
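
A minimal sketch of the correct error handling; the algorithm name is just an
example:

	tfm = crypto_alloc_skcipher("xts(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);	/* failure is an ERR_PTR, never NULL */
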
Eric Biggers
fb10231825 fscrypt: clean up after fscrypt_prepare_lookup() conversions
Now that all filesystems have been converted to use
fscrypt_prepare_lookup(), we can remove the fscrypt_set_d_op() and
fscrypt_set_encrypted_dentry() functions as well as un-export
fscrypt_d_ops.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:18:52 -07:00
Eric Biggers
39b1444906 fscrypt: use unbound workqueue for decryption
Improve fscrypt read performance by switching the decryption workqueue
from bound to unbound.  With the bound workqueue, when multiple bios
completed on the same CPU, they were decrypted on that same CPU.  But
with the unbound queue, they are now decrypted in parallel on any CPU.

Although fscrypt read performance can be tough to measure due to the
many sources of variation, this change is most beneficial when
decryption is slow, e.g. on CPUs without AES instructions.  For example,
I timed tarring up encrypted directories on f2fs.  On x86 with AES-NI
instructions disabled, the unbound workqueue improved performance by
about 25-35%, using 1 to NUM_CPUs jobs with 4 or 8 CPUs available.  But
with AES-NI enabled, performance was unchanged to within ~2%.

I also did the same test on a quad-core ARM CPU using xts-speck128-neon
encryption.  There, performance was usually about 10% better with the
unbound workqueue, bringing it closer to the unencrypted speed.

The unbound workqueue may be worse in some cases due to worse locality,
but I think it's still the better default.  dm-crypt uses an unbound
workqueue by default too, so this change makes fscrypt match.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2018-06-28 12:18:23 -07:00
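
A minimal sketch of such a switch; the queue name and flag combination shown
are illustrative:

	/* WQ_UNBOUND lets completed bios be decrypted on any available CPU. */
	wq = alloc_workqueue("fscrypt_read_queue",
			     WQ_UNBOUND | WQ_HIGHPRI, num_online_cpus());
	if (!wq)
		return -ENOMEM;
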
Jaegeuk Kim
73450231ff f2fs: run fstrim asynchronously if runtime discard is on
We don't need to wait for a whole bunch of discard candidates in fstrim, since
runtime discard will issue them during idle time.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-06-04 21:18:51 -07:00
Chao Yu
85d2070f60 f2fs: turn down IO priority of discard from background
In order to avoid interfering with normal r/w IO, let's turn down the IO
priority of discards issued from the background.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 11:39:57 -07:00
Chao Yu
4738f527db f2fs: don't split checkpoint in fstrim
Now we issue discard asynchronously in a separate thread instead of during
checkpoint, so we no longer encounter long checkpoint latency caused by
handling a huge number of synchronous discard commands. Therefore we don't need
to split the checkpoint to trim in batches; merge it and obsolete the related
sysfs entry.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 11:39:54 -07:00
Jaegeuk Kim
31e2713935 f2fs: issue discard commands proactively in high fs utilization
At high utilization, e.g. over 80%, we don't expect a huge number of large
discard commands, but we do have many small pending discards, which affect FTL
GC a lot. Let's issue them in that case.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 11:39:51 -07:00
Jaegeuk Kim
70676ef736 f2fs: add fsync_mode=nobarrier for non-atomic files
For non-atomic files, this patch adds a nobarrier option, which doesn't issue
flush commands to the device.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 11:39:48 -07:00
Jaegeuk Kim
bb53d06b5f f2fs: let fstrim issue discard commands in lower priority
fstrim gathers a huge number of large discard commands and tries to issue them
without IO awareness, which results in long user-perceived IO latencies on
READ, WRITE, and FLUSH in UFS. We've observed that some commands take several
seconds due to the long discard latency.

This patch limits the maximum size to 2MB per candidate and checks IO
congestion when issuing them to disk.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-30 11:39:45 -07:00
Jaegeuk Kim
520a948618 f2fs: avoid fsync() failure caused by EAGAIN in writepage()
pageout() in MM translates EAGAIN, so it calls handle_write_error()
 -> mapping_set_error() -> set_bit(AS_EIO, ...).
 file_write_and_wait_range() will then see the EIO error, which is critical to
 the return value of fsync() and results in an atomic_write failure reported to
 the user.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-05-04 14:18:26 -07:00
Jaegeuk Kim
a44b418c31 f2fs: clear PageError on writepage - part 2
The read path tags some pages with PageError; when we later write those pages
back with valid contents, writepage should clear the bit, as ext4 does.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-26 10:54:49 -06:00
Jaegeuk Kim
f819874f58 f2fs: check cap_resource only for data blocks
This patch changes the rule to check cap_resource for data blocks, not inode or
node blocks, in order to avoid a SELinux denial.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-23 16:11:17 -06:00
Jaegeuk Kim
3e7a141175 Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"
This patch reverts the copied f2fs_set_page_dirty_nobuffer and goes back to the
generic function, for stability.

This reverts commit fe76b796fc5194cc3d57265002e3a748566d073f.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-23 16:11:14 -06:00
Jaegeuk Kim
070da80085 f2fs: clear PageError on writepage
The read path tags some pages with PageError; when we later write those pages
back with valid contents, writepage should clear the bit, as ext4 does.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-22 16:47:10 -06:00
Eric Biggers
dafecc032e f2fs: call unlock_new_inode() before d_instantiate()
xfstest generic/429 sometimes hangs on f2fs, caused by a thread being
unable to take a directory's i_rwsem for write in vfs_rmdir().  In the
test, one thread repeatedly creates and removes a directory, and other
threads repeatedly look up a file in the directory.  The bug is that
f2fs_mkdir() calls d_instantiate() before unlock_new_inode(), resulting
in the directory inode being exposed to lookups before it has been fully
initialized.  And with CONFIG_DEBUG_LOCK_ALLOC, unlock_new_inode()
reinitializes ->i_rwsem, corrupting its state when it is already held.

Fix it by calling unlock_new_inode() before d_instantiate().  This
matches what other filesystems do.

Fixes: 57397d86c6 ("f2fs: add inode operations for special inodes")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-19 22:04:50 -07:00
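
A minimal sketch of the corrected ordering in a mkdir-style path;
f2fs_add_link() and the error label are illustrative:

	err = f2fs_add_link(dentry, inode);
	if (err)
		goto out_fail;

	/* Finish new-inode initialization before the dentry becomes visible. */
	unlock_new_inode(inode);
	d_instantiate(dentry, inode);
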