Commit graph

569807 commits

Author SHA1 Message Date
Jaegeuk Kim
5b80a5e2be f2fs: detect wrong layout
commit 2040fce83fe17763b07c97c1f691da2bb85e4135 upstream.

Previous mkfs.f2fs allows small partition inappropriately, so f2fs should detect
that as well.

Refer this in f2fs-tools.

mkfs.f2fs: detect small partition by overprovision ratio and # of segments

Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:37:02 -07:00
Jaegeuk Kim
d1c2c35718 f2fs: call sync_fs when f2fs is idle
commit f455c8a5f0a24090e99249eb7280012376adec2c upstream.

The sync_fs in f2fs_balance_fs_bg must avoid interrupting current user requests.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:37:01 -07:00
Jaegeuk Kim
036ed1b8eb Revert "f2fs: use percpu_counter for # of dirty pages in inode"
commit 204706c7accfabb67b97eef9f9a28361b6201199 upstream.

This reverts commit 1beba1b3a953107c3ff5448ab4e4297db4619c76.

The perpcu_counter doesn't provide atomicity in single core and consume more
DRAM. That incurs fs_mark test failure due to ENOMEM.

Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:37:01 -07:00
Chao Yu
9a82dd23e4 f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
commit 0002b61bdaac732bcff364a18f5bd57c95def0a5 upstream.

We should use AOP_WRITEPAGE_ACTIVATE when we bypass writing pages.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Miao Xie <miaoxie@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:37:00 -07:00
Jaegeuk Kim
9f495d826b f2fs: do not activate auto_recovery for fallocated i_size
commit 26787236b36660baf4d136281d40b5bb33a570ec upstream.

If a file needs to keep its i_size by fallocate, we need to turn off auto
recovery during roll-forward recovery.

This will resolve the below scenario.

1. xfs_io -f /mnt/f2fs/file -c "pwrite 0 4096" -c "fsync"
2. xfs_io -f /mnt/f2fs/file -c "falloc -k 4096 4096" -c "fsync"
3. md5sum /mnt/f2fs/file;
4. godown /mnt/f2fs/
5. umount /mnt/f2fs/
6. mount -t f2fs /dev/sdx /mnt/f2fs
7. md5sum /mnt/f2fs/file

Reported-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:37:00 -07:00
Arnd Bergmann
55342cce61 f2fs: fix 32-bit build
commit 19c526515f6b998039d5d71fea879d255f173746 upstream.

The addition of multiple-device support broke CONFIG_BLK_DEV_ZONED
on 32-bit machines because of a 64-bit division:

fs/f2fs/f2fs.o: In function `__issue_discard_async':
extent_cache.c:(.text.__issue_discard_async+0xd4): undefined reference to `__aeabi_uldivmod'

Fortunately, bdev_zone_size() is guaranteed to return a power-of-two
number, so we can replace the % operator with a cheaper bit mask.

Fixes: 792b84b74b54 ("f2fs: support multiple devices")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:36:59 -07:00
Chao Yu
a680707a3f f2fs: fix incorrect free inode count in ->statfs
commit b08b12d2ddc85b977a0531470cf6a7158289aaaf upstream.

While calculating inode count that we can create at most in the left space,
we should consider space which data/node blocks occupied, since we create
data/node mixly in main area. So fix the wrong calculation in ->statfs.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:36:59 -07:00
Geliang Tang
23eb517756 f2fs: drop duplicate header timer.h
commit b4ceec29219e340178baa9c5f17bf97a42951cc8 upstream.

Drop duplicate header timer.h from segment.c.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:36:58 -07:00
Jaegeuk Kim
7ce8cbc7fa f2fs: fix wrong AUTO_RECOVER condition
commit 97dd26ad834739d4e4ea35fd7ab5f92824de4cbb upstream.

If i_size is not aligned to the f2fs's block size, we should not skip inode
update during fsync.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:36:58 -07:00
Jaegeuk Kim
7ab9a6acd0 f2fs: do not recover i_size if it's valid
commit 3a3a5ead7b6d2c9a29f493791ba23f264052db34 upstream.

If i_size is already valid during roll_forward recovery, we should not update
it according to the block alignment.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:36:57 -07:00
Chao Yu
8f5fcb8034 f2fs: fix fdatasync
commit 281518c694a5228d6c46fac83529fb3e2c331281 upstream.

For below two cases, we can't guarantee data consistence:

a)
1. xfs_io "pwrite 0 4195328" "fsync"
2. xfs_io "pwrite 4195328 1024" "fdatasync"
3. godown
4. umount & mount
--> isize we updated before fdatasync won't be recovered

b)
1. xfs_io "pwrite -S 0xcc 0 4202496" "fsync"
2. xfs_io "fpunch 4194304 4096" "fdatasync"
3. godown
4. umount & mount
--> dnode we punched before fdatasync won't be recovered

The reason is that normally fdatasync won't be aware of modification
of metadata in file, e.g. isize changing, dnode updating, so in ->fsync
we will skip flushing node pages for above cases, result in making
fdatasynced file being lost during recovery.

Currently we have introduced DIRTY_META global list in sbi for tracking
dirty inode selectively, so in fdatasync we can choose to flush nodes
depend on dirty state of current inode in the list.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:36:56 -07:00
Chao Yu
031017c6f9 f2fs: fix to account total free nid correctly
commit 04d47e673863c637a2b44ad34a558aeb5d0a727e upstream.

Thread A		Thread B		Thread C
- f2fs_create
 - f2fs_new_inode
  - f2fs_lock_op
   - alloc_nid
    alloc last nid
  - f2fs_unlock_op
			- f2fs_create
			 - f2fs_new_inode
			  - f2fs_lock_op
			   - alloc_nid
			    as node count still not
			    be increased, we will
			    loop in alloc_nid
						- f2fs_write_node_pages
						 - f2fs_balance_fs_bg
						  - f2fs_sync_fs
						   - write_checkpoint
						    - block_operations
						     - f2fs_lock_all
 - f2fs_lock_op

While creating new inode, we do not allocate and account nid atomically,
so that when there is almost no free nids left, we may encounter deadloop
like above stack.

In order to avoid that, reuse nm_i::available_nids for accounting free nids
and make nid allocation and counting being atomical during node creation.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:36:56 -07:00
Yunlei He
beaab6afb4 f2fs: fix an infinite loop when flush nodes in cp
commit d40a43af0a57a017eba9ad2679183791587ceb6a upstream.

Thread A			Thread B

- write_checkpoint
 - block_operations
   -blk_start_plug
    -sync_node_pages		- f2fs_do_sync_file
				 - fsync_node_pages
				  - f2fs_wait_on_page_writeback

Thread A wait for global F2FS_DIRTY_NODES decreased to zero,
it start a plug list, some requests have been added to this list.
Thread B lock one dirty node page, and wait this page write back.
But this page has been in plug list of thread A with PG_writeback flag.
Thread A keep on running and its plug list has no chance to finish,
so it seems a deadlock between cp and fsync path.

This patch add a wait on page write back before set node page dirty
to avoid this problem.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Pengyang Hou <houpengyang@huawei.com>
Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:36:55 -07:00
Chao Yu
9e266223b3 f2fs: don't wait writeback for datas during checkpoint
commit 36951b38d13ac7cce9fcf89e0e01c22ed0d05688 upstream.

Normally, while committing checkpoint, we will wait on all pages to be
writebacked no matter the page is data or metadata, so in scenario where
there are lots of data IO being submitted with metadata, we may suffer
long latency for waiting writeback during checkpoint.

Indeed, we only care about persistence for pages with metadata, but not
pages with data, as file system consistent are only related to metadate,
so in order to avoid encountering long latency in above scenario, let's
recognize and reference metadata in submitted IOs, wait writeback only
for metadatas.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:36:50 -07:00
Jaegeuk Kim
df3f20f12b f2fs: fix wrong written_valid_blocks counting
commit c79b7ff1d3c7710c23d8828a69d8cbc5597ad19f upstream.

Previously, written_valid_blocks was got by ckpt->valid_block_count. But if
the last checkpoint has some NEW_ADDR due to power-cut, we can get wrong value.
Fix it to get the number from actual written block count from sit entries.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:13:34 -07:00
Jaegeuk Kim
79ba046b11 f2fs: avoid BG_GC in f2fs_balance_fs
commit 7702bdbe505a22380dd958e2ee35124c7c414806 upstream.

If many threads hit has_not_enough_free_secs() in f2fs_balance_fs() at the same
time, all the threads would do FG_GC or BG_GC.
In this critical path, we totally don't need to do BG_GC at all.
Let's avoid that.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:13:31 -07:00
Jaegeuk Kim
8b2c7581e8 f2fs: fix redundant block allocation
commit c040ff9d69fd1d782fe577ba9e35c1f5798158ae upstream.

In direct_IO path of f2fs_file_write_iter(),
1. f2fs_preallocate_blocks(F2FS_GET_BLOCK_PRE_DIO)
   -> allocate LBA X
2. f2fs_direct_IO()
   -> return 0;

Then,
f2fs_write_data_page() will allocate another LBA X+1.

This makes EIO triggered by HM-SMR.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:13:29 -07:00
Jaegeuk Kim
bd8e41540a f2fs: use err for f2fs_preallocate_blocks
commit a7de608691f766cd148971a71d4f13aa1692d4c8 upstream.

This patch has no functional change.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:13:08 -07:00
Jaegeuk Kim
07f010798c f2fs: support multiple devices
commit 3c62be17d4f562f43fe1d03b48194399caa35aa5 upstream.

This patch implements multiple devices support for f2fs.
Given multiple devices by mkfs.f2fs, f2fs shows them entirely as one big
volume under one f2fs instance.

Internal block management is very simple, but we will modify block allocation
and background GC policy to boost IO speed by exploiting them accoording to
each device speed.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:13:03 -07:00
Jaegeuk Kim
f9baf967bd f2fs: allow dio read for LFS mode
commit e57e9ae5b179a6b243c42bf6d9549d1595c27089 upstream.

We can allow dio reads for LFS mode, while doing buffered writes for dio writes.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:40 -07:00
Jaegeuk Kim
00e5a211f9 f2fs: revert segment allocation for direct IO
commit 6ae1be13e85f4c42c8ca371fda50ae39eebbfd96 upstream.

Now we don't need to be too much careful about storage alignment for dio, since
its speed becomes quite fast and we'd better avoid any misalignment first.

Revert: 38aa0889b2 (f2fs: align direct_io'ed data to section)

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:37 -07:00
Yunlei He
5065008a77 f2fs: return directly if block has been removed from the victim
commit 206147112860418fa9363cf94ec2c172bdf4313a upstream.

If one block has been to written to a new place, just return
in move data process. This patch check it again with holding
page lock.

Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:35 -07:00
Chao Yu
e035bdfc8b Revert "f2fs: do not recover from previous remained wrong dnodes"
commit d47b8715953ad0cf82bb0a9d30d7b11d83bc9c11 upstream.

i_times of inode will be set with current system time which can be
configured through 'date', so it's not safe to judge dnode block as
garbage data or unchanged inode depend on i_times.

Now, we have used enhanced 'cp_ver + cp' crc method to verify valid
dnode block, so I expect recoverying invalid dnode is almost not
possible.

This reverts commit 807b1e1c8e08452948495b1a9985ab46d329e5c2.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:32 -07:00
Jaegeuk Kim
67609b1286 f2fs: remove checkpoint in f2fs_freeze
commit b4b9d34c855ef383402cd1acefb1e33a515ae5f5 upstream.

The generic freeze_super() calls sync_filesystems() before f2fs_freeze().
So, basically we don't need to do checkpoint in f2fs_freeze(). But, in xfs/068,
it triggers circular locking problem below due to gc_mutex for checkpoint.

======================================================
[ INFO: possible circular locking dependency detected ]
4.9.0-rc1+ #132 Tainted: G           OE
-------------------------------------------------------

1. wait for __sb_start_write() by

 [<ffffffff9845f353>] dump_stack+0x85/0xc2
 [<ffffffff980e80bf>] print_circular_bug+0x1cf/0x230
 [<ffffffff980eb4d0>] __lock_acquire+0x19e0/0x1bc0
 [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffff9826bdd0>] __sb_start_write+0x130/0x200
 [<ffffffffc08c7c3b>] ? f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffffc08c7c3b>] f2fs_drop_inode+0x9b/0x160 [f2fs]
 [<ffffffff98289991>] iput+0x171/0x2c0
 [<ffffffffc08cfccf>] f2fs_sync_inode_meta+0x3f/0xf0 [f2fs]
 [<ffffffffc08cfe04>] block_operations+0x84/0x110 [f2fs]
 [<ffffffffc08cff78>] write_checkpoint+0xe8/0xf20 [f2fs]
 [<ffffffff980e979d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffff9803e9d9>] ? sched_clock+0x9/0x10
 [<ffffffffc08c6de9>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffffc08c6df5>] f2fs_sync_fs+0x85/0x190 [f2fs]
 [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
 [<ffffffff982a4f90>] ? do_fsync+0x70/0x70
 [<ffffffff982a4fb0>] sync_fs_one_sb+0x20/0x30
 [<ffffffff9826ca3e>] iterate_supers+0xae/0x100
 [<ffffffff982a50b5>] sys_sync+0x55/0x90
 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

2. wait for sbi->gc_mutex by

 [<ffffffff980ebdcb>] lock_acquire+0x11b/0x220
 [<ffffffff989063d6>] mutex_lock_nested+0x76/0x3f0
 [<ffffffffc08c6de9>] f2fs_sync_fs+0x79/0x190 [f2fs]
 [<ffffffffc08c7a6c>] f2fs_freeze+0x1c/0x20 [f2fs]
 [<ffffffff9826b6ef>] freeze_super+0xcf/0x190
 [<ffffffff9827eebc>] do_vfs_ioctl+0x53c/0x6a0
 [<ffffffff9827f099>] SyS_ioctl+0x79/0x90
 [<ffffffff9890b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:30 -07:00
Jaegeuk Kim
0ec5bcd0fe f2fs: assign segments correctly for direct_io
commit bdb7d964c4fb73078ab21c20e8c7b7bf7e641bb6 upstream.

Previously, we assigned CURSEG_WARM_DATA for direct_io, but if we have two or
four logs, we do not use that type at all.
Let's fix it.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:27 -07:00
Chao Yu
b2942d459a f2fs: fix wrong i_atime recovery
commit 9f0552e078b8a25fcf9c3950a05ea1398616d2b9 upstream.

Shouldn't update in-memory i_atime with on-disk i_mtime of inode when
recovering inode.

Shuoran found this bug which is hidden for a long time, honour is belong
to him.

Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:25 -07:00
Chao Yu
c82c6b15a1 f2fs: record inode updating status correctly
commit 60dcedc9972d136fab1600c919dd32080bcc26f7 upstream.

We should record updating status of inode only for living inode, for those
unlinked inode it needs to clear its ino cache, otherwise after the ino
was been reused, it will cause unneeded node page writing during ->fsync.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:15 -07:00
Damien Le Moal
e96476298b f2fs: Trace reset zone events
commit 126606c7a99b32ba8265a51fab01533fe40c9ecc upstream.

Similarly to the regular discard, trace zone reset events.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:14 -07:00
Damien Le Moal
0573aa0678 f2fs: Reset sequential zones on zoned block devices
commit f46e8809e88d113ba096bcfc772b5182ce00b941 upstream.

When a zoned block device is mounted, discarding sections
contained in sequential zones must reset the zone write pointer.
For sections contained in conventional zones, the regular discard
is used if the drive supports it.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:12:09 -07:00
Damien Le Moal
018fc18e28 f2fs: Cache zoned block devices zone type
commit 178053e2f1f9ccdb61ff6c2bd8644b53fc98e72e upstream.

With the zoned block device feature enabled, section discard
need to do a zone reset for sections contained in sequential
zones, and a regular discard (if supported) for sections
stored in conventional zones. Avoid the need for a costly
report zones to obtain a section zone type when discarding it
by caching the types of the device zones in the super block
information. This cache is initialized at mount time for mounts
with the zoned block device feature enabled.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:49 -07:00
Damien Le Moal
d9d8c376e4 f2fs: Do not allow adaptive mode for host-managed zoned block devices
commit 3adc57e97792e4ac9f228bde802829e2e9840afe upstream.

The LFS mode is mandatory for host-managed zoned block devices as
update in place optimizations are not possible for segments in
sequential zones.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:44 -07:00
Damien Le Moal
4b1d4ef0b7 f2fs: Always enable discard for zoned blocks devices
commit 96ba2decb4241aa2c6b61cfc8489d648769eff99 upstream.

Zone write pointer reset acts as discard for zoned block
devices. So if the zoned block device feature is enabled,
always declare that discard is enabled, even if the device
does not actually support the command.
For the same reason, prevent the use the "nodicard" mount
option.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:41 -07:00
Damien Le Moal
ecc252e7a4 f2fs: Suppress discard warning message for zoned block devices
commit 0ab0299835738cd407569401da1fef4c97b4419c upstream.

For zoned block devices, discard is replaced by zone reset. So
do not warn if the device does not supports discard.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:39 -07:00
Damien Le Moal
1529b8f943 f2fs: Check zoned block feature for host-managed zoned block devices
commit d1b959c8770260b611b9a1f0c5e8b12b7cb5b9d2 upstream.

The F2FS_FEATURE_BLKZONED feature indicates that the drive was formatted
 with zone alignment optimization. This is optional for host-aware
devices, but mandatory for host-managed zoned block devices.
So check that the feature is set in this latter case.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:36 -07:00
Damien Le Moal
22bbc1efdb f2fs: Use generic zoned block device terminology
commit 0bfd7a091c19132489a0f977b8dbf9f6b5ae0a1c upstream.

SMR stands for "Shingled Magnetic Recording" which makes sense
only for hard disk drives (spinning rust). The ZBC/ZAC standards
enable management of SMR disks, but solid state drives may also
support those standards. So rename the HMSMR feature to BLKZONED
to avoid a HDD centric terminology. For the same reason, rename
f2fs_sb_mounted_hmsmr to f2fs_sb_mounted_blkzoned.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:34 -07:00
Damien Le Moal
97df49a0c3 f2fs: Add missing break in switch-case
commit 487df616dec33231c99294b906d720d256a2de16 upstream.

Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:28 -07:00
Jaegeuk Kim
a91b9fe273 f2fs: avoid infinite loop in the EIO case on recover_orphan_inodes
commit 099228000eff6b25e0f76b276043cd65cd4eba5a upstream.

This patch should fix an infinite loop case below.

F2FS-fs : inject IO error in f2fs_read_end_io+0xf3/0x120 [f2fs]
F2FS-fs (nvme0n1p1): recover_orphan_inode: orphan failed (ino=39ac1a), run fsck to fix.
...
[<ffffffffc0b11ede>] sync_meta_pages+0xae/0x270 [f2fs]
[<ffffffffc0b288dd>] ? flush_sit_entries+0x8d/0x960 [f2fs]
[<ffffffffc0b13801>] write_checkpoint+0x361/0xf20 [f2fs]
[<ffffffffb40e979d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffffc0b0a199>] ? f2fs_sync_fs+0x79/0x190 [f2fs]
[<ffffffffc0b0a1a5>] f2fs_sync_fs+0x85/0x190 [f2fs]
[<ffffffffc0b2560e>] f2fs_balance_fs_bg+0x7e/0x1c0 [f2fs]
[<ffffffffc0b216c4>] f2fs_write_node_pages+0x34/0x320 [f2fs]
[<ffffffffb41dff21>] do_writepages+0x21/0x30
[<ffffffffb429edb1>] __writeback_single_inode+0x61/0x760
[<ffffffffb490a937>] ? _raw_spin_unlock+0x27/0x40
[<ffffffffb42a0805>] writeback_single_inode+0xd5/0x190
[<ffffffffb42a0959>] write_inode_now+0x99/0xc0
[<ffffffffb4289a16>] iput+0x1f6/0x2c0
[<ffffffffc0b0e3be>] f2fs_fill_super+0xe0e/0x1300 [f2fs]
[<ffffffffb426c394>] ? sget_userns+0x4f4/0x530
[<ffffffffb426c692>] mount_bdev+0x182/0x1b0
[<ffffffffc0b0d5b0>] ? f2fs_commit_super+0x100/0x100 [f2fs]
[<ffffffffc0b0a375>] f2fs_mount+0x15/0x20 [f2fs]
[<ffffffffb426d038>] mount_fs+0x38/0x170
[<ffffffffb428ec9b>] vfs_kern_mount+0x6b/0x160
[<ffffffffb4291d9e>] do_mount+0x1be/0xd60
[<ffffffffb4291a57>] ? copy_mount_options+0xb7/0x220
[<ffffffffb4292c54>] SyS_mount+0x94/0xd0
[<ffffffffb490b345>] entry_SYSCALL_64_fastpath+0x23/0xc6

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:27 -07:00
Chao Yu
7d2eab1921 f2fs: report error of f2fs_fill_dentries
commit ed6bd4b146527e7c6934e3582c47d7b857802676 upstream.

Report error of f2fs_fill_dentries to ->iterate_shared, otherwise when
error ocurrs, user may just list part of dirents in target directory
without any hints.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:27 -07:00
Jaegeuk Kim
d4ec990d25 fs/crypto: catch up 4.9-rc6
commit d117b9acaeada0b243f31e0fe83e111fcc9a6644 upstream.

Merge tag 'ext4_for_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 fixes from Ted Ts'o:
"A security fix (so a maliciously corrupted file system image won't
panic the kernel) and some fixes for CONFIG_VMAP_STACK"

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:11:02 -07:00
Arnd Bergmann
b3441f8c71 f2fs: hide a maybe-uninitialized warning
commit 230436b3ef3fd7d4a1da19edf5e87bb2d74e0fc2 upstream.

gcc is unsure about the use of last_ofs_in_node, which might happen
without a prior initialization:

fs/f2fs//git/arm-soc/fs/f2fs/data.c: In function ‘f2fs_map_blocks’:
fs/f2fs/data.c:799:54: warning: ‘last_ofs_in_node’ may be used uninitialized in this function [-Wmaybe-uninitialized]
   if (prealloc && dn.ofs_in_node != last_ofs_in_node + 1) {

As pointed out by Chao Yu, the code is actually correct as 'prealloc'
is only set if the last_ofs_in_node has been set, the two always
get updated together.

This initializes last_ofs_in_node to dn.ofs_in_node for each
new dnode at the start of the 'next_block' loop, which at that
point is a correct initialization as well. I assume that compilers
that correctly track the contents of the variables and do not
warn about the condition also figure out that they can eliminate
the extra assignment here.

Fixes: 46008c6d4232 ("f2fs: support in batch multi blocks preallocation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:29 -07:00
Jaegeuk Kim
3f137dda70 f2fs: remove percpu_count due to performance regression
commit 35782b233f37e48ecc469d9c7232f3f6a7fad41a upstream.

This patch removes percpu_count usage due to performance regression in iozone.

Fixes: 523be8a6b3 ("f2fs: use percpu_counter for page counters")
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:25 -07:00
Jaegeuk Kim
a15f017e8a f2fs: make clean inodes when flushing inode page
commit 18340edc8da20b0d399eb25ba4bb631b27652f46 upstream.

This patch tries to make more clean inodes when flushing dirty inodes in
checkpoint.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:22 -07:00
Jaegeuk Kim
0ef31c7bfa f2fs: keep dirty inodes selectively for checkpoint
commit 7c45729a4d6d1c90879e6c5c2df325c2f6db7191 upstream.

This is to avoid no free segment bug during checkpoint caused by a number of
dirty inodes.

The case was reported by Chao like this.
1. mount with lazytime option
2. fill 4k file until disk is full
3. sync filesystem
4. read all files in the image
5. umount

In this case, we actually don't need to flush dirty inode to inode page during
checkpoint.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:18 -07:00
Jaegeuk Kim
dafac77e8d f2fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
commit 02027d42c3f747945f19111d3da2092ed2148ac8 upstream.

This is for backport only.

fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:08:05 -07:00
Jaegeuk Kim
04030d21a7 f2fs: use BIO_MAX_PAGES for bio allocation
commit 664ba972df9b96942191db3068274cc1db899774 upstream.

We don't need to allocate bio partially in order to maximize sequential writes.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:06:10 -07:00
Jaegeuk Kim
c01ce254c7 f2fs: declare static function for __build_free_nids
commit 3e7b5bbbef7f5eb8a19aa61b611c704bf8230937 upstream.

This patch avoids build warning.

Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:06:05 -07:00
Jaegeuk Kim
92c9dec342 f2fs: call f2fs_balance_fs for setattr
commit 15d04354555fdfe8005e1365009e349148fb5f90 upstream.

If inode becomes dirty, we need to check the # of dirty inodes whether or not
further checkpoint would be required.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:49 -07:00
Jaegeuk Kim
beb74f7757 f2fs: count dirty inodes to flush node pages during checkpoint
commit b9610bdfcbdbb6017802ec6d1e073f445c98157d upstream.

If there are a lot of dirty inodes, we need to flush all of them when doing
checkpoint. So, we need to count this for enough free space.

Reviewed-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:40 -07:00
Chao Yu
9f99694bb7 f2fs: avoid casted negative value as shrink count
commit 02110a4fd53164db7cce3bb2780dce4d6c4e058f upstream.

This patch makes sure it returns a positive value instead of a probable
casted negative value as shrink count.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:35 -07:00
Chao Yu
e07f457ef7 f2fs: don't interrupt free nids building during nid allocation
commit 3a2ad5672bb36ee9c07bab97dadc8b0f70d391f4 upstream.

Let build_free_nids support sync/async methods, in allocation flow of nids,
we use synchronuous method, so that we can avoid looping in alloc_nid when
free memory is low; in unblock_operations and f2fs_balance_fs_bg we use
asynchronuous method in where low memory condition can interrupt us.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2017-09-25 15:05:30 -07:00