Commit graph

256 commits

Author SHA1 Message Date
Jaegeuk Kim
35346d5c24 Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"
Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
commit 3e7a141175 ("Revert "f2fs: introduce f2fs_set_page_dirty_nobuffer"")

This patch reverts copied f2fs_set_page_dirty_nobuffer to use generic function
for stability.

This reverts commit fe76b796fc5194cc3d57265002e3a748566d073f.

Change-Id: I3d4728d894d1af41a2f1e30ebc375907abd5ffc8
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-23 23:09:36 -07:00
Jaegeuk Kim
b5390be7d0 f2fs: clear PageError on writepage
Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
commit 070da80085 ("f2fs: clear PageError on writepage")

This patch clears PageError in some pages tagged by read path, but when we
write the pages with valid contents, writepage should clear the bit likewise
ext4.

Change-Id: I7272074f2bb9c81fc43e37074b44e9d761756263
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-23 23:09:00 -07:00
Eric Biggers
0f4e0fa71f f2fs: refactor read path to allow multiple postprocessing steps
Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
  c18b4f60c8 ("f2fs: refactor read path to allow multiple postprocessing steps")

Currently f2fs's ->readpage() and ->readpages() assume that either the
data undergoes no postprocessing, or decryption only.  But with
fs-verity, there will be an additional authenticity verification step,
and it may be needed either by itself, or combined with decryption.

To support this, store a 'struct bio_post_read_ctx' in ->bi_private
which contains a work struct, a bitmask of postprocessing steps that are
enabled, and an indicator of the current step.  The bio completion
routine, if there was no I/O error, enqueues the first postprocessing
step.  When that completes, it continues to the next step.  Pages that
fail any postprocessing step have PageError set.  Once all steps have
completed, pages without PageError set are set Uptodate, and all pages
are unlocked.

Also replace f2fs_encrypted_file() with a new function
f2fs_post_read_required() in places like direct I/O and garbage
collection that really should be testing whether the file needs special
I/O processing, not whether it is encrypted specifically.

This may also be useful for other future f2fs features such as
compression.

Change-Id: I742be348b9dfc2113200bcc5366a84e978371a54
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-20 22:30:46 -07:00
Eric Biggers
f478c9bfbb fscrypt: allow synchronous bio decryption
Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:
  13890bed20 ("fscrypt: allow synchronous bio decryption")

Currently, fscrypt provides fscrypt_decrypt_bio_pages() which decrypts a
bio's pages asynchronously, then unlocks them afterwards.  But, this
assumes that decryption is the last "postprocessing step" for the bio,
so it's incompatible with additional postprocessing steps such as
authenticity verification after decryption.

Therefore, rename the existing fscrypt_decrypt_bio_pages() to
fscrypt_enqueue_decrypt_bio().  Then, add fscrypt_decrypt_bio() which
decrypts the pages in the bio synchronously without unlocking the pages,
nor setting them Uptodate; and add fscrypt_enqueue_decrypt_work(), which
enqueues work on the fscrypt_read_workqueue.  The new functions will be
used by filesystems that support both fscrypt and fs-verity.

Change-Id: I99f1c7bfb79381f6e4abf2b1f418776b19bd8e08
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2018-04-20 22:30:46 -07:00
Jaegeuk Kim
24a2d90393 f2fs/fscrypt: updates to v4.17-rc1
Pull f2fs update from Jaegeuk Kim:
 "In this round, we've mainly focused on performance tuning and critical
  bug fixes occurred in low-end devices. Sheng Yong introduced
  lost_found feature to keep missing files during recovery instead of
  thrashing them. We're preparing coming fsverity implementation. And,
  we've got more features to communicate with users for better
  performance. In low-end devices, some memory-related issues were
  fixed, and subtle race condtions and corner cases were addressed as
  well.

  Enhancements:
   - large nat bitmaps for more free node ids
   - add three block allocation policies to pass down write hints given by user
   - expose extension list to user and introduce hot file extension
   - tune small devices seamlessly for low-end devices
   - set readdir_ra by default
   - give more resources under gc_urgent mode regarding to discard and cleaning
   - introduce fsync_mode to enforce posix or not
   - nowait aio support
   - add lost_found feature to keep dangling inodes
   - reserve bits for future fsverity feature
   - add test_dummy_encryption for FBE

  Bug fixes:
   - don't use highmem for dentry pages
   - align memory boundary for bitops
   - truncate preallocated blocks in write errors
   - guarantee i_times on fsync call
   - clear CP_TRIMMED_FLAG correctly
   - prevent node chain loop during recovery
   - avoid data race between atomic write and background cleaning
   - avoid unnecessary selinux violation warnings on resgid option
   - GFP_NOFS to avoid deadlock in quota and read paths
   - fix f2fs_skip_inode_update to allow i_size recovery

  In addition to the above, there are several minor bug fixes and clean-ups"

Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:

42bf67fc54 f2fs: remain written times to update inode during fsync
6cb5aa02bf f2fs: make assignment of t->dentry_bitmap more readable
a8d07f1f9c f2fs: truncate preallocated blocks in error case
86444d6006 f2fs: fix a wrong condition in f2fs_skip_inode_update
db2188a687 f2fs: reserve bits for fs-verity
ee2e74b3f0 f2fs: Add a segment type check in inplace write
0192e0a450 f2fs: no need to initialize zero value for GFP_F2FS_ZERO
49338842e9 f2fs: don't track new nat entry in nat set
d6a69d5e65 f2fs: clean up with F2FS_BLK_ALIGN
2c8834a7a2 f2fs: check blkaddr more accuratly before issue a bio
6ab573a9d9 f2fs: Set GF_NOFS in read_cache_page_gfp while doing f2fs_quota_read
7419dcb8be f2fs: introduce a new mount option test_dummy_encryption
9321e22c03 f2fs: introduce F2FS_FEATURE_LOST_FOUND feature
8a57196158 f2fs: release locks before return in f2fs_ioc_gc_range()
739ace131c f2fs: align memory boundary for bitops
4c55abe4f8 f2fs: remove unneeded set_cold_node()
30654507e0 f2fs: add nowait aio support
d909e94106 f2fs: wrap all options with f2fs_sb_info.mount_opt
5738be52b3 f2fs: Don't overwrite all types of node to keep node chain
0bdeb167c8 f2fs: introduce mount option for fsync mode
6bc490f0ee f2fs: fix to restore old mount option in ->remount_fs
0c9c3e0344 f2fs: wrap sb_rdonly with f2fs_readonly
6c6611223a f2fs: avoid selinux denial on CAP_SYS_RESOURCE
076a6f32fe f2fs: support hot file extension
58edcdbca6 f2fs: fix to avoid race in between atomic write and background GC
1e0aeb0af9 f2fs: do gc in greedy mode for whole range if gc_urgent mode is set
10b2d001d6 f2fs: issue discard aggressively in the gc_urgent mode
a5052f32b9 f2fs: set readdir_ra by default
1aa536a624 f2fs: add auto tuning for small devices
0ffdffc8f1 f2fs: add mount option for segment allocation policy
b798298912 f2fs: don't stop GC if GC is contended
766d232169 f2fs: expose extension_list sysfs entry
98b329de50 f2fs: fix to set KEEP_SIZE bit in f2fs_zero_range
4d409fa334 f2fs: introduce sb_lock to make encrypt pwsalt update exclusive
1f6bac14c1 f2fs: remove redundant initialization of pointer 'p'
946aefc754 f2fs: flush cp pack except cp pack 2 page at first
e5081a52ac f2fs: clean up f2fs_sb_has_xxx functions
a292477154 f2fs: remove redundant check of page type when submit bio
190e64a819 f2fs: fix to handle looped node chain during recovery
889d980876 f2fs: handle quota for orphan inodes
92b12bb1a2 f2fs: support passing down write hints to block layer with F2FS policy
22fa74c2b0 f2fs: support passing down write hints given by users to block layer
180900373e f2fs: fix to clear CP_TRIMMED_FLAG
0671fae134 f2fs: support large nat bitmap
eceb943d5d f2fs: fix to check extent cache in f2fs_drop_extent_tree
2e2a339c98 f2fs: restrict inline_xattr_size configuration
41dda11641 f2fs: fix heap mode to reset it back
39575737bb f2fs: fix potential corruption in area before F2FS_SUPER_OFFSET
7e0e7995ee fscrypt: fix build with pre-4.6 gcc versions
31d3279a4f fscrypt: fix up fscrypt_fname_encrypted_size() for internal use
82bec88856 fscrypt: define fscrypt_fname_alloc_buffer() to be for presented names
168a907828 fscrypt: calculate NUL-padding length in one place only
042ae9f4cf fscrypt: move fscrypt_symlink_data to fscrypt_private.h
f9550c24c2 fscrypt: remove fscrypt_fname_usr_to_disk()
7ac4756a24 f2fs: switch to fscrypt_get_symlink()
6b76f58e24 f2fs: switch to fscrypt ->symlink() helper functions
fd457d2c4e fscrypt: new helper function - fscrypt_get_symlink()
a1cdacb7ae fscrypt: new helper functions for ->symlink()
7f43602f4d fscrypt: trim down fscrypt.h includes
d9cadc11bd fscrypt: move fscrypt_is_dot_dotdot() to fs/crypto/fname.c
e6fe930580 fscrypt: move fscrypt_valid_enc_modes() to fscrypt_private.h
efefa434f4 fscrypt: move fscrypt_operations declaration to fscrypt_supp.h
7ed178bc8a fscrypt: split fscrypt_dummy_context_enabled() into supp/notsupp versions
3f16e09dad fscrypt: move fscrypt_ctx declaration to fscrypt_supp.h
8216a0b51a fscrypt: move fscrypt_info_cachep declaration to fscrypt_private.h
dfe0b3b1b6 fscrypt: move fscrypt_control_page() to supp/notsupp headers
3a2c791778 fscrypt: move fscrypt_has_encryption_key() to supp/notsupp headers

Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2018-04-12 09:58:05 -07:00
Jaegeuk Kim
56ee1e8179 f2fs: updates on v4.16-rc1
Pull f2fs updates from Jaegeuk Kim:
 "In this round, we've followed up to support some generic features such
  as cgroup, block reservation, linking fscrypt_ops, delivering
  write_hints, and some ioctls. And, we could fix some corner cases in
  terms of power-cut recovery and subtle deadlocks.

  Enhancements:
   - bitmap operations to handle NAT blocks
   - readahead to improve readdir speed
   - switch to use fscrypt_*
   - apply write hints for direct IO
   - add reserve_root=%u,resuid=%u,resgid=%u to reserve blocks for root/uid/gid
   - modify b_avail and b_free to consider root reserved blocks
   - support cgroup writeback
   - support FIEMAP_FLAG_XATTR for fibmap
   - add F2FS_IOC_PRECACHE_EXTENTS to pre-cache extents
   - add F2FS_IOC_{GET/SET}_PIN_FILE to pin LBAs for data blocks
   - support inode creation time

  Bug fixs:
   - sysfile-based quota operations
   - memory footprint accounting
   - allow to write data on partial preallocation case
   - fix deadlock case on fallocate
   - fix to handle fill_super errors
   - fix missing inode updates of fsync'ed file
   - recover renamed file which was fsycn'ed before
   - drop inmemory pages in corner error case
   - keep last_disk_size correctly
   - recover missing i_inline flags during roll-forward

  Various clean-up patches were added as well"

Cherry-pick from origin/upstream-f2fs-stable-linux-4.4.y:

5f9b3abb91 f2fs: support inode creation time
9fb0de1751 f2fs: rebuild sit page from sit info in mem
1062a0c018 f2fs: stop issuing discard if fs is readonly
fa043fae90 f2fs: clean up duplicated assignment in init_discard_policy
b007190234 f2fs: use GFP_F2FS_ZERO for cleanup
35b11839a1 f2fs: allow to recover node blocks given updated checkpoint
e56500860b f2fs: recover some i_inline flags
64aa9569a1 f2fs: correct removexattr behavior for null valued extended attribute
70b3a923da f2fs: drop page cache after fs shutdown
8069a0e983 f2fs: stop gc/discard thread after fs shutdown
bb924f7777 f2fs: hanlde error case in f2fs_ioc_shutdown
700b53f21e f2fs: split need_inplace_update
f31d52811c f2fs: fix to update last_disk_size correctly
eeb0118b83 f2fs: kill F2FS_INLINE_XATTR_ADDRS for cleanup
c1b74c9670 f2fs: clean up error path of fill_super
d5efd57e01 f2fs: avoid hungtask when GC encrypted block if io_bits is set
c4027d0843 f2fs: allow quota to use reserved blocks
18d267c273 f2fs: fix to drop all inmem pages correctly
4dca47531e f2fs: speed up defragment on sparse file
999f806a7c f2fs: support F2FS_IOC_PRECACHE_EXTENTS
84960fca96 f2fs: add an ioctl to disable GC for specific file
292c8e1cfd f2fs: prevent newly created inode from being dirtied incorrectly
58b1f5b0fc f2fs: support FIEMAP_FLAG_XATTR
6afa9a94d0 f2fs: fix to cover f2fs_inline_data_fiemap with inode_lock
10f4a4140b f2fs: check node page again in write end io
b203c58dfd f2fs: fix to caclulate required free section correctly
d49132d45c f2fs: handle newly created page when revoking inmem pages
2ce6b9d816 f2fs: add resgid and resuid to reserve root blocks
f53dcf6799 f2fs: implement cgroup writeback support
1338f376d5 f2fs: remove unused pend_list_tag
d4f19f6266 f2fs: avoid high cpu usage in discard thread
b78e9302e2 f2fs: make local functions static
62438ba87b f2fs: add reserved blocks for root user
06a366757f f2fs: check segment type in __f2fs_replace_block
4c6bc4be37 f2fs: update inode info to inode page for new file
591b336387 f2fs: show precise # of blocks that user/root can use
b242d7edc5 f2fs: clean up unneeded declaration
87b8168e9e f2fs: continue to do direct IO if we only preallocate partial blocks
2b4d859bd9 f2fs: enable quota at remount from r to w
54bf13a0ad f2fs: skip stop_checkpoint for user data writes
25ef3006ba f2fs: fix missing error number for xattr operation
cff2c7fe41 f2fs: recover directory operations by fsync
e2bb618a0a f2fs: return error during fill_super
8a2c11d865 f2fs: fix an error case of missing update inode page
cd38d5ada5 f2fs: fix potential hangtask in f2fs_trace_pid
e81cafbeba f2fs: no need return value in restore summary process
04d44000d6 f2fs: use unlikely for release case
925d0933d8 f2fs: don't return value in truncate_data_blocks_range
f7986c416d f2fs: clean up f2fs_map_blocks
e4f5e26cda f2fs: clean up hash codes
1f994d4708 f2fs: fix error handling in fill_super
e7db649b5f f2fs: spread f2fs_k{m,z}alloc
5d4e487b99 f2fs: inject fault to kvmalloc
8b33886c37 f2fs: inject fault to kzalloc
d946807987 f2fs: remove a redundant conditional expression
3bc01114a3 f2fs: apply write hints to select the type of segment for direct write
c80f019591 f2fs: switch to fscrypt_prepare_setattr()
bb8b850365 f2fs: switch to fscrypt_prepare_lookup()
9ab470eaf8 f2fs: switch to fscrypt_prepare_rename()
aeaac517a1 f2fs: switch to fscrypt_prepare_link()
101c6a96ad f2fs: switch to fscrypt_file_open()
6d025237a1 f2fs: remove repeated f2fs_bug_on
b01e03d724 f2fs: remove an excess variable
e1f9be2f7c f2fs: fix lock dependency in between dio_rwsem & i_mmap_sem
e5c7c86010 f2fs: remove unused parameter
f130dbb98a f2fs: still write data if preallocate only partial blocks
47ee9b2598 f2fs: introduce sysfs readdir_ra to readahead inode block in readdir
55e2f89181 f2fs: fix concurrent problem for updating free bitmap
e1398f6554 f2fs: remove unneeded memory footprint accounting
2d69561135 f2fs: no need to read nat block if nat_block_bitmap is set
4dd2d07338 f2fs: reserve nid resource for quota sysfile

Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2018-02-22 19:19:42 +00:00
Jaegeuk Kim
353c16247d f2fs: updates on 4.15-rc1
Pull f2fs updates from Jaegeuk Kim:
 "In this round, we introduce sysfile-based quota support which is
  required for Android by default. In addition, we allow that users are
  able to reserve some blocks in runtime to mitigate performance drops
  in low free space.

 Enhancements:
  - assign proper data segments according to write_hints given by user
  - issue cache_flush on dirty devices only among multiple devices
  - exploit cp_error flag and add more faults to enhance fault
    injection test
  - conduct more readaheads during f2fs_readdir
  - add a range for discard commands

Bug fixes:
 - fix zero stat->st_blocks when inline_data is set
 - drop crypto key and free stale memory pointer while evict_inode is
   failing
 - fix some corner cases in free space and segment management
 - fix wrong last_disk_size

This series includes lots of clean-ups and code enhancement in terms
of xattr operations, discard/flush command control. In addition, it
adds versatile debugfs entries to monitor f2fs status"

Cherry-picked from origin/upstream-f2fs-stable-linux-4.4.y:

56a07b0705 f2fs: deny accessing encryption policy if encryption is off
c394842e26 f2fs: inject fault in inc_valid_node_count
9262922510 f2fs: fix to clear FI_NO_PREALLOC
e6cfc5de2d f2fs: expose quota information in debugfs
c4cd2efe83 f2fs: separate nat entry mem alloc from nat_tree_lock
48c72b4c8c f2fs: validate before set/clear free nat bitmap
baf9275a4b f2fs: avoid opened loop codes in __add_ino_entry
47af6c72d9 f2fs: apply write hints to select the type of segments for buffered write
ac98191605 f2fs: introduce scan_curseg_cache for cleanup
ca28e9670e f2fs: optimize the way of traversing free_nid_bitmap
460688b59e f2fs: keep scanning until enough free nids are acquired
0186182c0c f2fs: trace checkpoint reason in fsync()
5d4b6efcfd f2fs: keep isize once block is reserved cross EOF
3c8f767e13 f2fs: avoid race in between GC and block exchange
4423778adf f2fs: save a multiplication for last_nid calculation
3e3b405575 f2fs: fix summary info corruption
44889e4879 f2fs: remove dead code in update_meta_page
55c7b9595b f2fs: remove unneeded semicolon
8b92814117 f2fs: don't bother with inode->i_version
42c7c71824 f2fs: check curseg space before foreground GC
c5470498e5 f2fs: use rw_semaphore to protect SIT cache
82750d346a f2fs: support quota sys files
26dfec49b2 f2fs: add quota_ino feature infra
ddb8e2ae98 f2fs: optimize __update_nat_bits
f46ae958c7 f2fs: modify for accurate fggc node io stat
c713fdb5a2 Revert "f2fs: handle dirty segments inside refresh_sit_entry"
873ec505cb f2fs: add a function to move nid
ae66786296 f2fs: export SSR allocation threshold
90c28a18d2 f2fs: give correct trimmed blocks in fstrim
5612922fb0 f2fs: support bio allocation error injection
583b7a274c f2fs: support get_page error injection
09a073cc8c f2fs: add missing sysfs description
e945474a9c f2fs: support soft block reservation
b7b2e629b6 f2fs: handle error case when adding xattr entry
7368e30495 f2fs: support flexible inline xattr size
ada4061e19 f2fs: show current cp state
5b8ff1301a f2fs: add missing quota_initialize
46d4a691f0 f2fs: show # of dirty segments via sysfs
fc13f9d7ce f2fs: stop all the operations by cp_error flag
91bea0c391 f2fs: remove several redundant assignments
807486c795 f2fs: avoid using timespec
03b1cb0bb4 f2fs: fix to correct no_fggc_candidate
5c15033cea Revert "f2fs: return wrong error number on f2fs_quota_write"
5f5f593222 f2fs: remove obsolete pointer for truncate_xattr_node
032a690682 f2fs: retry ENOMEM for quota_read|write
171b638fc4 f2fs: limit # of inmemory pages
83ed7a615f f2fs: update ctx->pos correctly when hitting hole in directory
4d6e68be25 f2fs: relocate readahead codes in readdir()
c8be47b540 f2fs: allow readdir() to be interrupted
2b903fe94c f2fs: trace f2fs_readdir
bb0db666d4 f2fs: trace f2fs_lookup
40d6250f04 f2fs: skip searching non-exist range in truncate_hole
8e84f379df f2fs: expose some sectors to user in inline data or dentry case
cb98f70dea f2fs: avoid stale fi->gdirty_list pointer
5562a3c539 f2fs/crypto: drop crypto key at evict_inode only
85853e7e38 f2fs: fix to avoid race when accessing last_disk_size
0c47a892d5 f2fs: Fix bool initialization/comparison
68e801abc5 f2fs: give up CP_TRIMMED_FLAG if it drops discards
df74eacb20 f2fs: trace f2fs_remove_discard
bd502c6e3e f2fs: reduce cmd_lock coverage in __issue_discard_cmd
a34ab5ca4f f2fs: split discard policy
1e65afd14d f2fs: wrap discard policy
684447dad1 f2fs: support issuing/waiting discard in range
27eaad0938 f2fs: fix to flush multiple device in checkpoint
08bb9d68d5 f2fs: enhance multiple device flush
9c2526ac2e f2fs: fix to show ino management cache size correctly
814b463d26 f2fs: drop FI_UPDATE_WRITE tag after f2fs_issue_flush
f555b0a117 f2fs: obsolete ALLOC_NID_LIST list
75d3164ae1 f2fs: convert inline data for direct I/O & FI_NO_PREALLOC
4de0ceb6b7 f2fs: allow readpages with NULL file pointer
322a45d172 f2fs: show flush list status in sysfs
6d625a93b4 f2fs: introduce read_xattr_block
8ea6e1c327 f2fs: introduce read_inline_xattr
dbce11e9ee Revert "f2fs: reuse nids more aggressively"
131bc9f6b7 Revert "f2fs: node segment is prior to data segment selected victim"

Change-Id: I93b9cd867b859a667a448b39299ff44a2b841b8c
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2017-11-27 13:08:17 -08:00
Jaegeuk Kim
13f002354d f2fs: catch up to v4.14-rc1
This is cherry-picked from upstrea-f2fs-stable-linux-4.4.y.

Changes include:

commit c7fd9e2b4a ("f2fs: hurry up to issue discard after io interruption")
commit 603dde3965 ("f2fs: fix to show correct discard_granularity in sysfs")
...
commit 565f0225f9 ("f2fs: factor out discard command info into discard_cmd_control")
commit c4cc29d19e ("f2fs: remove batched discard in f2fs_trim_fs")

Change-Id: Icd8a85ac0c19a8aa25cd2591a12b4e9b85bdf1c5
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2017-10-03 17:19:14 -07:00
Chao Yu
dc8b8cea1e f2fs: introduce FI_ATOMIC_COMMIT
commit 5fe457430e554a2f5188f13c1a2e36ad845640c5 upstream.

This patch introduces a new flag to indicate inode status of doing atomic
write committing, so that, we can keep atomic write status for inode
during atomic committing, then we can skip GCing pages of atomic write inode,
that avoids random GCed datas being mixed with current transaction, so
isolation of transaction can be kept.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-03 15:20:50 +00:00
Chao Yu
7129702a48 f2fs: clean up with list_{first, last}_entry
commit 939afa943c5290a3b92f01612a792af17bc98115 upstream.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-03 13:49:36 +00:00
Jaegeuk Kim
b3fcb70064 f2fs: support IO alignment for DATA and NODE writes
commit 0a595ebaaa6b53a2226d3fee2a2fd616ea5ba378 upstream.

This patch implements IO alignment by filling dummy blocks in DATA and NODE
write bios. If we can guarantee, for example, 32KB or 64KB for such the IOs,
we can eliminate underlying dummy page problem which FTL conducts in order to
close MLC or TLC partial written pages.

Note that,
 - it requires "-o mode=lfs".
 - IO size should be power of 2, not exceed BIO_MAX_PAGES, 256.
 - read IO is still 4KB.
 - do checkpoint at fsync, if dummy NODE page was written.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-03 13:45:50 +00:00
Jaegeuk Kim
8ef4f0ca7b f2fs: add submit_bio tracepoint
commit 554b5125f5cfca6653461fd52bad24d4ef35ec29 upstream.

This patch adds final submit_bio() tracepoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-03 13:45:13 +00:00
Yunlei He
ff9199293b f2fs: add a case of no need to read a page in write begin
commit 746e2403927efbd7c7f2e796314e3cfb3cfabaa4 upstream.

If the range we write cover the whole valid data in the last page,
we do not need to read it.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
[Jaegeuk Kim: nullify the remaining area (fix: xfstests/f2fs/001)]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-03 13:39:12 +00:00
Jaegeuk Kim
dc45fd9e28 f2fs: resolve op and op_flags confilcts
commit 70fd76140a6cb63262bd47b68d57b42e889c10ee upstream.

This patch backported ("block,fs: use REQ_* flags directly")

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-10-03 13:35:45 +00:00
Chao Yu
9a82dd23e4 f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
commit 0002b61bdaac732bcff364a18f5bd57c95def0a5 upstream.

We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:37:00 -07:00
Chao Yu
9e266223b3 f2fs: don't wait writeback for datas during checkpoint
commit 36951b38d13ac7cce9fcf89e0e01c22ed0d05688 upstream.

Normally, while committing checkpoint, we will wait on all pages to be
writebacked no matter the page is data or metadata, so in scenario where
there are lots of data IO being submitted with metadata, we may suffer
long latency for waiting writeback during checkpoint.

Indeed, we only care about persistence for pages with metadata, but not
pages with data, as file system consistent are only related to metadate,
so in order to avoid encountering long latency in above scenario, let's
recognize and reference metadata in submitted IOs, wait writeback only
for metadatas.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:36:50 -07:00
Jaegeuk Kim
8b2c7581e8 f2fs: fix redundant block allocation
commit c040ff9d69fd1d782fe577ba9e35c1f5798158ae upstream.

In direct_IO path of f2fs_file_write_iter(),
1. f2fs_preallocate_blocks(F2FS_GET_BLOCK_PRE_DIO)
   -> allocate LBA X
2. f2fs_direct_IO()
   -> return 0;

Then,
f2fs_write_data_page() will allocate another LBA X+1.

This makes EIO triggered by HM-SMR.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:13:29 -07:00
Jaegeuk Kim
bd8e41540a f2fs: use err for f2fs_preallocate_blocks
commit a7de608691f766cd148971a71d4f13aa1692d4c8 upstream.

This patch has no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:13:08 -07:00
Jaegeuk Kim
07f010798c f2fs: support multiple devices
commit 3c62be17d4f562f43fe1d03b48194399caa35aa5 upstream.

This patch implements multiple devices support for f2fs.
Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
volume under one f2fs instance.

Internal block management is very simple, but we will modify block allocation
and background GC policy to boost IO speed by exploiting them accoording to
each device speed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:13:03 -07:00
Jaegeuk Kim
f9baf967bd f2fs: allow dio read for LFS mode
commit e57e9ae5b179a6b243c42bf6d9549d1595c27089 upstream.

We can allow dio reads for LFS mode, while doing buffered writes for dio writes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:40 -07:00
Jaegeuk Kim
00e5a211f9 f2fs: revert segment allocation for direct IO
commit 6ae1be13e85f4c42c8ca371fda50ae39eebbfd96 upstream.

Now we don't need to be too much careful about storage alignment for dio, since
its speed becomes quite fast and we'd better avoid any misalignment first.

Revert: 38aa0889b2 (f2fs: align direct_io'ed data to section)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:37 -07:00
Damien Le Moal
22bbc1efdb f2fs: Use generic zoned block device terminology
commit 0bfd7a091c19132489a0f977b8dbf9f6b5ae0a1c upstream.

SMR stands for "Shingled Magnetic Recording" which makes sense
only for hard disk drives (spinning rust). The ZBC/ZAC standards
enable management of SMR disks, but solid state drives may also
support those standards. So rename the HMSMR feature to BLKZONED
to avoid a HDD centric terminology. For the same reason, rename
f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:34 -07:00
Arnd Bergmann
b3441f8c71 f2fs: hide a maybe-uninitialized warning
commit 230436b3ef3fd7d4a1da19edf5e87bb2d74e0fc2 upstream.

gcc is unsure about the use of last_ofs_in_node, which might happen
without a prior initialization:

fs/f2fs//git/arm-soc/fs/f2fs/data.c: In function ‘f2fs_map_blocks’:
fs/f2fs/data.c:799:54: warning: ‘last_ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) {

As pointed out by Chao Yu, the code is actually correct as 'prealloc'
is only set if the last_ofs_in_node has been set, the two always
get updated together.

This initializes last_ofs_in_node to dn.ofs_in_node for each
new dnode at the start of the 'next_block' loop, which at that
point is a correct initialization as well. I assume that compilers
that correctly track the contents of the variables and do not
warn about the condition also figure out that they can eliminate
the extra assignment here.

Fixes: 46008c6d4232 ("f2fs: support in batch multi blocks preallocation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:29 -07:00
Jaegeuk Kim
04030d21a7 f2fs: use BIO_MAX_PAGES for bio allocation
commit 664ba972df9b96942191db3068274cc1db899774 upstream.

We don't need to allocate bio partially in order to maximize sequential writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:06:10 -07:00
Chao Yu
9e3d0bf6d3 f2fs: be aware of extent beyond EOF in fiemap
commit 58736fa60f6ae659ac72da8b1580c308b47e8edd upstream.

f2fs can support fallocating blocks beyond file size without changing the
size, but ->fiemap of f2fs was restricted and can't detect these extents
fallocated past EOF, now relieve the restriction.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:54 -07:00
Chao Yu
332f40b43f f2fs: don't miss any f2fs_balance_fs cases
commit 6f2d8ed654bfa391854df4de854953f772a16a9d upstream.

In f2fs_map_blocks, let f2fs_balance_fs detects node page modification
with dn.node_changed to avoid miss some corner cases.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:53 -07:00
Chao Yu
75bb19d8b7 f2fs: give a chance to detach from dirty list
commit 933439c8f3474e329709b715b43b0b8168bbecf8 upstream.

If there is no dirty pages in inode, we should give a chance to detach
the inode from global dirty list, otherwise it needs to call another
unnecessary .writepages for detaching.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:04:52 -07:00
Jaegeuk Kim
c1286ff41c f2fs: backport from (4c1fad64 - Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs)
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 14:27:55 -07:00
Mohan Srinivasan
d854b68890 ANDROID: Refactor fs readpage/write tracepoints.
Refactor the fs readpage/write tracepoints to move the
inode->path lookup outside the tracepoint code, and pass a pointer
to the path into the tracepoint code instead. This is necessary
because the tracepoint code runs non-preemptible. Thanks to
Trilok Soni for catching this in 4.4.

Change-Id: I7486c5947918d155a30c61d6b9cd5027cf8fbe15
Signed-off-by: Mohan Srinivasan <srmohan@google.com>
2017-02-10 19:09:14 +00:00
Mohan Srinivasan
32cbbe5953 ANDROID: fs: FS tracepoints to track IO.
Adds tracepoints in ext4/f2fs/mpage to track readpages/buffered
write()s. This allows us to track files that are being read/written
to PIDs.

Change-Id: I26bd36f933108927d6903da04d8cb42fd9c3ef3d
Signed-off-by: Mohan Srinivasan <srmohan@google.com>
2016-09-20 21:33:24 +00:00
Jaegeuk Kim
67f8cf3cee f2fs: support fiemap for inline_data
There is a FIEMAP_EXTENT_INLINE_DATA, pointed out by Marc.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-20 11:33:21 -07:00
Jaegeuk Kim
1d373a0ef7 f2fs: flush dirty data for bmap
Users expect bmap will give allocated block addresses.
Let's play likewise ext4.

Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-20 11:33:11 -07:00
Chao Yu
08b39fbd59 f2fs crypto: fix racing of accessing encrypted page among
different competitors

Since we use different page cache (normally inode's page cache for R/W
and meta inode's page cache for GC) to cache the same physical block
which is belong to an encrypted inode. Writeback of these two page
cache should be exclusive, but now we didn't handle writeback state
well, so there may be potential racing problem:

a)
kworker:				f2fs_gc:
 - f2fs_write_data_pages
  - f2fs_write_data_page
   - do_write_data_page
    - write_data_page
     - f2fs_submit_page_mbio
(page#1 in inode's page cache was queued
in f2fs bio cache, and be ready to write
to new blkaddr)
					 - gc_data_segment
					  - move_encrypted_block
					   - pagecache_get_page
					(page#2 in meta inode's page cache
					was cached with the invalid datas
					of physical block located in new
					blkaddr)
					   - f2fs_submit_page_mbio
					(page#1 was submitted, later, page#2
					with invalid data will be submitted)

b)
f2fs_gc:
 - gc_data_segment
  - move_encrypted_block
   - f2fs_submit_page_mbio
(page#1 in meta inode's page cache was
queued in f2fs bio cache, and be ready
to write to new blkaddr)
					user thread:
					 - f2fs_write_begin
					  - f2fs_submit_page_bio
					(we submit the request to block layer
					to update page#2 in inode's page cache
					with physical block located in new
					blkaddr, so here we may read gabbage
					data from new blkaddr since GC hasn't
					writebacked the page#1 yet)

This patch fixes above potential racing problem for encrypted inode.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-13 09:52:34 -07:00
Chao Yu
b8c2940048 f2fs: add a tracepoint for f2fs_read_data_pages
This patch adds a tracepoint for f2fs_read_data_pages to trace when pages
are readahead by VFS.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 14:00:34 -07:00
Jaegeuk Kim
a56c7c6fb3 f2fs: set GFP_NOFS for grab_cache_page
For normal inodes, their pages are allocated with __GFP_FS, which can cause
filesystem calls when reclaiming memory.
This can incur a dead lock condition accordingly.

So, this patch addresses this problem by introducing
f2fs_grab_cache_page(.., bool for_write), which calls
grab_cache_page_write_begin() with AOP_FLAG_NOFS.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-12 13:38:03 -07:00
Jaegeuk Kim
a125702326 Revert "f2fs: do not skip dentry block writes"
The periodic checkpoint can resolve the previous issue.
So, now we can use this again to improve the reported performance regression:

https://lkml.org/lkml/2015/10/8/20

This reverts commit 15bec0ff5a9ba6d203178fa8772259df6207942a.
2015-10-12 13:38:02 -07:00
Jaegeuk Kim
90b803e6fb f2fs: do not skip dentry block writes
Previously, we skip dentry block writes when wbc is SYNC_NONE with no memory
pressure and the number of dirty pages is pretty small.

But, we didn't skip for normal data writes, which gives us not much big impact
on overall performance.
Moreover, by skipping some data writes, kworker falls into infinite loop to try
to write blocks, when many dir inodes have only one dentry block.

So, this patch removes skipping data writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:54 -07:00
Chao Yu
46c9e1413f f2fs: use correct flag in f2fs_map_blocks()
We introduce F2FS_GET_BLOCK_READ in commit e2b4e2bc88 ("f2fs: fix
incorrect mapping for bmap"), but forget to use this flag in the right
place, fix it.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Chao Yu
f9811703fe f2fs: fix to handle io error in ->direct_IO
Here is a oops reported as following message when testing generic/019 of
xfstest:

 ------------[ cut here ]------------
 kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882!
 invalid opcode: 0000 [#1] SMP
 Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4
nf_def
 CPU: 2 PID: 25441 Comm: fio Tainted: G           O    4.3.0-rc1+ #6
 Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013
 task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000
 RIP: 0010:[<ffffffffa0784981>]  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
 RSP: 0018:ffff8803fd61f918  EFLAGS: 00010246
 RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f
 RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300
 RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000
 R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000
 R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001
 FS:  00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0
 Stack:
  000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b
  0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000
  ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358
 Call Trace:
  [<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs]
  [<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs]
  [<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs]
  [<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs]
  [<ffffffff811510ae>] generic_file_direct_write+0xae/0x160
  [<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0
  [<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200
  [<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20
  [<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs]
  [<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs]
  [<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290
  [<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50
  [<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0
  [<ffffffff8120b913>] do_io_submit+0x373/0x630
  [<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0
  [<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20
  [<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a
 Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb
 RIP  [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
  RSP <ffff8803fd61f918>
 ---[ end trace 2e577d7f711ddb86 ]---

The reason is that: in the test of generic/019, we will trigger a manmade
IO error in block layer through debugfs, after that, prefree segment will
no longer be freed, because we always skip doing gc or checkpoint when
there occurs an IO error.

Meanwhile fio with aio engine generated a large number of direct IOs,
which continue allocating spaces in free segment until we run out of them,
eventually, results in panic in new_curseg as no more free segment was
found.

So, this patch changes to return EIO in direct_IO for this condition.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:52 -07:00
Chao Yu
973163fc0c f2fs: reorganize f2fs_map_blocks
In this patch, we try to reorganize f2fs_map_blocks to make block mapping
flow more clear by using following structure:

/* check status of mapping */

if (unmapped) {
	/* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */

	if (create) {
		/* write path, handle dio write case here */
		alloc_and_map;
	} else {
		/*
		 * handle read cases from all call paths:
		 *     1. generic read;
		 *     2. dio read;
		 *     3. fiemap;
		 *     4. bmap
		 */
	}
}

/* map buffer_header */

Besides, this patch handles the missing case correctly for dio write:
When we fail in __allocate_data_blocks, then in f2fs_map_blocks, we will
not allocate blocks correctly for preallocated blocks, but returning with
an unmapped buffer head, which will result in failure of dio write.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:51 -07:00
Chao Yu
9edcdabf36 f2fs: fix overflow of size calculation
We have potential overflow issue when calculating size of object, when
we left shift index with PAGE_CACHE_SHIFT bits, if type of index has only
32-bits space in 32-bit architecture, left shifting will incur overflow,
i.e:

pgoff_t index =  0xFFFFFFFF;
loff_t size = index << PAGE_CACHE_SHIFT;
size: 0xFFFFF000

So we should cast index with 64-bits type to avoid this issue.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-10-09 16:20:50 -07:00
Linus Torvalds
4c12ab7e5e Merge tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "The major work includes fixing and enhancing the existing extent_cache
  feature, which has been well settling down so far and now it becomes a
  default mount option accordingly.

  Also, this version newly registers a f2fs memory shrinker to reclaim
  several objects consumed by a couple of data structures in order to
  avoid memory pressures.

  Another new feature is to add ioctl(F2FS_GARBAGE_COLLECT) which
  triggers a cleaning job explicitly by users.

  Most of the other patches are to fix bugs occurred in the corner cases
  across the whole code area"

* tag 'for-f2fs-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (85 commits)
  f2fs: upset segment_info repair
  f2fs: avoid accessing NULL pointer in f2fs_drop_largest_extent
  f2fs: update extent tree in batches
  f2fs: fix to release inode correctly
  f2fs: handle f2fs_truncate error correctly
  f2fs: avoid unneeded initializing when converting inline dentry
  f2fs: atomically set inode->i_flags
  f2fs: fix wrong pointer access during try_to_free_nids
  f2fs: use __GFP_NOFAIL to avoid infinite loop
  f2fs: lookup neighbor extent nodes for merging later
  f2fs: split __insert_extent_tree_ret for readability
  f2fs: kill dead code in __insert_extent_tree
  f2fs: adjust showing of extent cache stat
  f2fs: add largest/cached stat in extent cache
  f2fs: fix incorrect mapping for bmap
  f2fs: add annotation for space utilization of regular/inline dentry
  f2fs: fix to update cached_en of extent tree properly
  f2fs: fix typo
  f2fs: check the node block address of newly allocated nid
  f2fs: go out for insert_inode_locked failure
  ...
2015-09-03 13:10:22 -07:00
Linus Torvalds
1081230b74 Merge branch 'for-4.3/core' of git://git.kernel.dk/linux-block
Pull core block updates from Jens Axboe:
 "This first core part of the block IO changes contains:

   - Cleanup of the bio IO error signaling from Christoph.  We used to
     rely on the uptodate bit and passing around of an error, now we
     store the error in the bio itself.

   - Improvement of the above from myself, by shrinking the bio size
     down again to fit in two cachelines on x86-64.

   - Revert of the max_hw_sectors cap removal from a revision again,
     from Jeff Moyer.  This caused performance regressions in various
     tests.  Reinstate the limit, bump it to a more reasonable size
     instead.

   - Make /sys/block/<dev>/queue/discard_max_bytes writeable, by me.
     Most devices have huge trim limits, which can cause nasty latencies
     when deleting files.  Enable the admin to configure the size down.
     We will look into having a more sane default instead of UINT_MAX
     sectors.

   - Improvement of the SGP gaps logic from Keith Busch.

   - Enable the block core to handle arbitrarily sized bios, which
     enables a nice simplification of bio_add_page() (which is an IO hot
     path).  From Kent.

   - Improvements to the partition io stats accounting, making it
     faster.  From Ming Lei.

   - Also from Ming Lei, a basic fixup for overflow of the sysfs pending
     file in blk-mq, as well as a fix for a blk-mq timeout race
     condition.

   - Ming Lin has been carrying Kents above mentioned patches forward
     for a while, and testing them.  Ming also did a few fixes around
     that.

   - Sasha Levin found and fixed a use-after-free problem introduced by
     the bio->bi_error changes from Christoph.

   - Small blk cgroup cleanup from Viresh Kumar"

* 'for-4.3/core' of git://git.kernel.dk/linux-block: (26 commits)
  blk: Fix bio_io_vec index when checking bvec gaps
  block: Replace SG_GAPS with new queue limits mask
  block: bump BLK_DEF_MAX_SECTORS to 2560
  Revert "block: remove artifical max_hw_sectors cap"
  blk-mq: fix race between timeout and freeing request
  blk-mq: fix buffer overflow when reading sysfs file of 'pending'
  Documentation: update notes in biovecs about arbitrarily sized bios
  block: remove bio_get_nr_vecs()
  fs: use helper bio_add_page() instead of open coding on bi_io_vec
  block: kill merge_bvec_fn() completely
  md/raid5: get rid of bio_fits_rdev()
  md/raid5: split bio for chunk_aligned_read
  block: remove split code in blkdev_issue_{discard,write_same}
  btrfs: remove bio splitting and merge_bvec_fn() calls
  bcache: remove driver private bio splitting code
  block: simplify bio_add_page()
  block: make generic_make_request handle arbitrarily sized bios
  blk-cgroup: Drop unlikely before IS_ERR(_OR_NULL)
  block: don't access bio->bi_error after bio_put()
  block: shrink struct bio down to 2 cache lines again
  ...
2015-09-02 13:10:25 -07:00
Chao Yu
e2b4e2bc88 f2fs: fix incorrect mapping for bmap
The test step is like below:
1. touch file
2. truncate -s $((1024*1024)) file
3. fallocate -o 0 -l $((1024*1024)) file
4. fibmap.f2fs file

Our result of fibmap.f2fs showed below is not correct:

file_pos   start_blk     end_blk        blks
       0    -937166132    -937166132           1
    4096    -937166132    -937166132           1
    8192    -937166132    -937166132           1
   12288    -937166132    -937166132           1
   16384    -937166132    -937166132           1
   20480    -937166132    -937166132           1
...
 1040384    -937166132    -937166132           1
 1044480    -937166132    -937166132           1

This is because f2fs_map_blocks will return with no error when meeting
a hole or preallocated block, the caller __get_data_block will map the
uninitialized variable value to bh->b_blocknr.

Unfortunately generic_block_bmap will neither check the return value of
get_data() nor check mapping info of buffer_head, result in returning
the random block address.

After fixing the issue, our result shows correctly:

file_pos   start_blk     end_blk        blks
       0           0           0         256

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-21 22:45:14 -07:00
Jaegeuk Kim
740432f835 f2fs: handle failed bio allocation
As the below comment of bio_alloc_bioset, f2fs can allocate multiple bios at the
same time. So, we can't guarantee that bio is allocated all the time.

"
 *   When @bs is not NULL, if %__GFP_WAIT is set then bio_alloc will always be
 *   able to allocate a bio. This is due to the mempool guarantees. To make this
 *   work, callers must never allocate more than 1 bio at a time from this pool.
 *   Callers that need to allocate more than 1 bio must always submit the
 *   previously allocated bio for IO before attempting to allocate a new one.
 *   Failure to do so can cause deadlocks under memory pressure.
"

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-20 09:00:09 -07:00
Kent Overstreet
b54ffb73ca block: remove bio_get_nr_vecs()
We can always fill up the bio now, no need to estimate the possible
size based on queue parameters.

Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
[hch: rebased and wrote a changelog]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lin <ming.l@ssi.samsung.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-08-13 12:32:04 -06:00
Chao Yu
decd36b6c4 f2fs: remove inmem radix tree
Previously, we use radix tree to index all registered page entries for
atomic file, but now we only use radix tree to see whether current page
is indexed or not, since the other user of radix tree is gone in commit
042b7816aa ("f2fs: remove unnecessary call to invalidate inmemory pages").

So in this patch, we try to use one more efficient way:
Introducing a macro ATOMIC_WRITTEN_PAGE, and setting it as page private
value to indicate page indexing status. By using this way, we can save
memory and lookup time.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-11 11:31:14 -07:00
Chao Yu
c15e8599ff f2fs: report EINVAL for unalignment direct IO
We run ltp testcase with f2fs and obtain a TFAIL in diotest4, the result in
detail is as fallow:

dio04

<<<test_start>>>
tag=dio04 stime=1432278894
cmdline="diotest4"
contacts=""
analysis=exit
<<<test_output>>>
diotest4    1  TPASS  :  Negative Offset
diotest4    2  TPASS  :  removed
diotest4    3  TFAIL  :  diotest4.c:129: write allows odd count.returns 1: Success
diotest4    4  TFAIL  :  diotest4.c:183: Odd count of read and write
diotest4    5  TPASS  :  Read beyond the file size
......

the result of ext4 with same environment:

dio04

<<<test_start>>>
tag=dio04 stime=1432259643
cmdline="diotest4"
contacts=""
analysis=exit
<<<test_output>>>
diotest4    1  TPASS  :  Negative Offset
diotest4    2  TPASS  :  removed
diotest4    3  TPASS  :  Odd count of read and write
diotest4    4  TPASS  :  Read beyond the file size
......

The reason is that when triggering DIO in f2fs, we will return zero value
in ->direct_IO if writer's buffer offset, file offset and transfer size is
not alignment to block size of filesystem, resulting in falling back into
buffered write instead of returning -EINVAL.

This patch fixes that problem by returning correct error number for above
case, and removing the judgement condition in check_direct_IO to make sure
the verification will be enabled for direct reader too.

Besides, Jaegeuk Kim pointed out that there is expectional cases we should
always make direct-io falling back into buffered write, such as dio in
encrypted file.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
[Chao Yu make small change and add detail description in commit message]
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-11 11:30:24 -07:00
Fan Li
759af1c9c1 f2fs: use extent cache to optimize f2fs_reserve_block
In some cases, we only need the block address when we call
f2fs_reserve_block,
other fields of struct dnode_of_data aren't necessary.
We can try extent cache first for such cases in order to speed up the
process.

Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 22:15:42 -07:00
Chao Yu
470f00e968 f2fs: fix to release inode page correctly
In following call path, we will pass a locked and referenced ipage
pointer to get_new_data_page:
 - init_inode_metadata
  - make_empty_dir
   - get_new_data_page

There are two exit paths in get_new_data_page when error occurs:
1) grab_cache_page fails, ipage will not be released;
2) f2fs_reserve_block fails, ipage will be released in callee.

So, it's not consistent for error handling in get_new_data_page.

For f2fs_reserve_block, it's not very easy to change the rule
of error handling, since it's already complicated.

Here we deside to choose an easy way to fix this issue:
If any error occur in get_new_data_page, we will ensure releasing
ipage in this function.

The same issue is in f2fs_convert_inline_dir, fix that too.

Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2015-08-05 08:08:23 -07:00