6944da0a68 treewide: Use array_size in f2fs_kvzalloc()
f15443db99 treewide: Use array_size() in f2fs_kzalloc()
3ea03ea4bd treewide: Use array_size() in f2fs_kmalloc()
c41203299a overflow.h: Add allocation size calculation helpers
d400752f54 f2fs: fix to clear FI_VOLATILE_FILE correctly
853e7339b6 f2fs: let sync node IO interrupt async one
6a4540cf19 f2fs: don't change wbc->sync_mode
588ecdfd7d f2fs: fix to update mtime correctly
1ae5aadab1 fs: f2fs: insert space around that ':' and ', '
39ee53e223 fs: f2fs: add missing blank lines after declarations
d5b4710fcf fs: f2fs: changed variable type of offset "unsigned" to "loff_t"
c35da89531 f2fs: clean up symbol namespace
fcf37e16f3 f2fs: make set_de_type() static
5d1633aa10 f2fs: make __f2fs_write_data_pages() static
cc8093af7c f2fs: fix to avoid accessing cross the boundary
b7f5594670 f2fs: fix to let caller retry allocating block address
e48fcd8576 disable loading f2fs module on PAGE_SIZE > 4KB
02afc275a5 f2fs: fix error path of move_data_page
0291bd36d0 f2fs: don't drop dentry pages after fs shutdown
a1259450b6 f2fs: fix to avoid race during access gc_thread pointer
d2e0f2f786 f2fs: clean up with clear_radix_tree_dirty_tag
c74034518f f2fs: fix to don't trigger writeback during recovery
e72a2cca82 f2fs: clear discard_wake earlier
b25a1872e9 f2fs: let discard thread wait a little longer if dev is busy
b125dfb20d f2fs: avoid stucking GC due to atomic write
405909e7f5 f2fs: introduce sbi->gc_mode to determine the policy
1f62e4702a f2fs: keep migration IO order in LFS mode
c4408c2387 f2fs: fix to wait page writeback during revoking atomic write
9db5be4af8 f2fs: Fix deadlock in shutdown ioctl
ed74404955 f2fs: detect synchronous writeback more earlier
91e7d9d2dd mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
feb94dc829 ceph: use pagevec_lookup_range_nr_tag()
f3aa4a25b8 mm: add variant of pagevec_lookup_range_tag() taking number of pages
8914877e37 mm: use pagevec_lookup_range_tag() in write_cache_pages()
26778b87a0 mm: use pagevec_lookup_range_tag() in __filemap_fdatawait_range()
94f1b99298 nilfs2: use pagevec_lookup_range_tag()
160355d69f gfs2: use pagevec_lookup_range_tag()
564108e83a f2fs: use find_get_pages_tag() for looking up single page
6cf6fb8645 f2fs: simplify page iteration loops
a05d8a6a2b f2fs: use pagevec_lookup_range_tag()
18a4848ffd ext4: use pagevec_lookup_range_tag()
1c7be24f65 ceph: use pagevec_lookup_range_tag()
e25fadabb5 btrfs: use pagevec_lookup_range_tag()
bf9510b162 mm: implement find_get_pages_range_tag()
461247b21f f2fs: clean up with is_valid_blkaddr()
a5d0ccbc18 f2fs: fix to initialize min_mtime with ULLONG_MAX
9bb4d22cf5 f2fs: fix to let checkpoint guarantee atomic page persistence
cdcf2b3e25 f2fs: fix to initialize i_current_depth according to inode type
331ae0c25b Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc"
2494cc7c0b f2fs: don't drop any page on f2fs_cp_error() case
0037c639e6 f2fs: fix spelling mistake: "extenstion" -> "extension"
2bba5b8eb8 f2fs: enhance sanity_check_raw_super() to avoid potential overflows
9bb86b63dc f2fs: treat volatile file's data as hot one
2cf6459036 f2fs: introduce release_discard_addr() for cleanup
03279ce90b f2fs: fix potential overflow
f46eddc4da f2fs: rename dio_rwsem to i_gc_rwsem
bb01582453 f2fs: move mnt_want_write_file after range check
8bb9a8da75 f2fs: fix missing clear FI_NO_PREALLOC in some error case
cb38cc4e1d f2fs: enforce fsync_mode=strict for renamed directory
26bf4e8a96 f2fs: sanity check for total valid node blocks
78f8b0f46f f2fs: sanity check on sit entry
ab758ada22 f2fs: avoid bug_on on corrupted inode
1a5d1966c0 f2fs: give message and set need_fsck given broken node id
b025f6dfc0 f2fs: clean up commit_inmem_pages()
7aff5c69da f2fs: do not check F2FS_INLINE_DOTS in recover
23d00b0287 f2fs: remove duplicated dquot_initialize and fix error handling
937f4ef79e f2fs: stop issue discard if something wrong with f2fs
a6d74bb282 f2fs: fix return value in f2fs_ioc_commit_atomic_write
258489ec52 f2fs: allocate hot_data for atomic write more strictly
aa857e0f3b f2fs: check if inmem_pages list is empty correctly
9d77ded0a7 f2fs: fix race in between GC and atomic open
0d17eb90b5 f2fs: change le32 to le16 of f2fs_inode->i_extra_size
ea2813111f f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries
9190cadf38 f2fs: correct return value of f2fs_trim_fs
17f85d0708 f2fs: fix to show missing bits in FS_IOC_GETFLAGS
3e90db63fc f2fs: remove unneeded F2FS_PROJINHERIT_FL
298032d4d4 f2fs: don't use GFP_ZERO for page caches
fdf61219dc f2fs: issue all big range discards in umount process
cd79eb2b5e f2fs: remove redundant block plug
ec034d0f14 f2fs: remove unmatched zero_user_segment when convert inline dentry
71aaced0e1 f2fs: introduce private inode status mapping
e7724207f7 fscrypt: log the crypto algorithm implementations
4cbda579cd crypto: api - Add crypto_type_has_alg helper
b24dcaae87 crypto: skcipher - Add low-level skcipher interface
a9146e4235 crypto: skcipher - Add helper to retrieve driver name
a0ca4bdf47 crypto: skcipher - Add default key size helper
eb13e0b692 fscrypt: add Speck128/256 support
27a0e77380 fscrypt: only derive the needed portion of the key
f68a71fa8f fscrypt: separate key lookup from key derivation
52359cf4fd fscrypt: use a common logging function
ff8e7c745e fscrypt: remove internal key size constants
7149dd4d39 fscrypt: remove unnecessary check for non-logon key type
56446c9142 fscrypt: make fscrypt_operations.max_namelen an integer
f572a22ef9 fscrypt: drop empty name check from fname_decrypt()
0077eff1d2 fscrypt: drop max_namelen check from fname_decrypt()
3f7af9d27f fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info()
52c51f7b7b fscrypt: don't clear flags on crypto transform
89b7fb8298 fscrypt: remove stale comment from fscrypt_d_revalidate()
d56de4e926 fscrypt: remove error messages for skcipher_request_alloc() failure
f68d3b84ae fscrypt: remove unnecessary NULL check when allocating skcipher
fb10231825 fscrypt: clean up after fscrypt_prepare_lookup() conversions
39b1444906 fscrypt: use unbound workqueue for decryption
Change-Id: Ied79ecd97385c05ef26e6b7b24d250eee9ec4e47
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages. Just drop the argument.
Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Use new function for looking up pages since nr_pages argument from
pagevec_lookup_range_tag() is going away.
Link: http://lkml.kernel.org/r/20171009151359.31984-14-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We want only pages from given range in ceph_writepages_start(). Use
pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove
unnecessary code.
Link: http://lkml.kernel.org/r/20171009151359.31984-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
commit dd2bc473482eedc60c29cf00ad12568ce40ce511 upstream.
ceph_readpage() unlocks page prematurely prematurely in the case
that page is reading from fscache. Caller of readpage expects that
page is uptodate when it get unlocked. So page shoule get locked
by completion callback of fscache_read_or_alloc_pages()
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 93407472a21b82f39c955ea7787e5bc7da100642 upstream.
Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
branch.
This patch also fixes multiple checkpatch warnings: WARNING: Prefer
'unsigned int' to bare use of 'unsigned'
Thanks to Andrew Morton for suggesting more appropriate function instead
of macro.
[geliangtang@gmail.com: truncate: use i_blocksize()]
Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.
Let's introduce a helper function which makes the restriction explicit and
easier to track. This patch doesn't introduce any functional changes.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull Ceph update from Sage Weil:
"There are a few fixes for snapshot behavior with CephFS and support
for the new keepalive protocol from Zheng, a libceph fix that affects
both RBD and CephFS, a few bug fixes and cleanups for RBD from Ilya,
and several small fixes and cleanups from Jianpeng and others"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: improve readahead for file holes
ceph: get inode size for each append write
libceph: check data_len in ->alloc_msg()
libceph: use keepalive2 to verify the mon session is alive
rbd: plug rbd_dev->header.object_prefix memory leak
rbd: fix double free on rbd_dev->header_name
libceph: set 'exists' flag for newly up osd
ceph: cleanup use of ceph_msg_get
ceph: no need to get parent inode in ceph_open
ceph: remove the useless judgement
ceph: remove redundant test of head->safe and silence static analysis warnings
ceph: fix queuing inode to mdsdir's snaprealm
libceph: rename con_work() to ceph_con_workfn()
libceph: Avoid holding the zero page on ceph_msgr_slab_init errors
libceph: remove the unused macro AES_KEY_SIZE
ceph: invalidate dirty pages after forced umount
ceph: EIO all operations after forced umount
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
structs should be constant.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When readahead encounters file holes, osd reply returns error -ENOENT,
finish_read() skips adding pages to the the page cache. So readahead
does not work for file holes. The fix is adding zero pages to the
page cache when -ENOENT is returned.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
After forced umount, ceph_writepages_start() skips flushing dirty
pages. To make sure inode's reference count get dropped to zero,
we need to invalidate dirty pages.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
This patch makes try_get_cap_refs() and __do_request() check
if the file system was forced umount, and return -EIO if it was.
This patch also adds a helper function to drops dirty caps and
wakes up blocking operation.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Before a page get locked, someone else can write data to the page
and increase the i_size. So we should re-check the i_size after
pages are locked.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
GFP_NOFS memory allocation is required for page writeback path.
But there is no need to use GFP_NOFS in syscall path and readpage
path
Signed-off-by: Yan, Zheng <zyan@redhat.com>
In most cases that snap context is needed, we are holding
reference of CEPH_CAP_FILE_WR. So we can set ceph inode's
i_head_snapc when getting the CEPH_CAP_FILE_WR reference,
and make codes get snap context from i_head_snapc. This makes
the code simpler.
Another benefit of this change is that we can handle snap
notification more elegantly. Especially when snap context
is updated while someone else is doing write. The old queue
cap_snap code may set cap_snap's context to ether the old
context or the new snap context, depending on if i_head_snapc
is set. The new queue capp_snap code always set cap_snap's
context to the old snap context.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Cached_context in ceph_snap_realm is directly accessed by
uninline_data() and get_pool_perm(). This is racy in theory.
both uninline_data() and get_pool_perm() do not modify existing
object, they only create new object. So we can pass the empty
snap context to them. Unlike cached_context in ceph_snap_realm,
we do not need to protect the empty snap context.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Pull Ceph updates from Sage Weil:
"This time around we have a collection of CephFS fixes from Zheng
around MDS failure handling and snapshots, support for a new CRUSH
straw2 algorithm (to sync up with userspace) and several RBD cleanups
and fixes from Ilya, an error path leak fix from Taesoo, and then an
assorted collection of cleanups from others"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (28 commits)
rbd: rbd_wq comment is obsolete
libceph: announce support for straw2 buckets
crush: straw2 bucket type with an efficient 64-bit crush_ln()
crush: ensuring at most num-rep osds are selected
crush: drop unnecessary include from mapper.c
ceph: fix uninline data function
ceph: rename snapshot support
ceph: fix null pointer dereference in send_mds_reconnect()
ceph: hold on to exclusive caps on complete directories
libceph: simplify our debugfs attr macro
ceph: show non-default options only
libceph: expose client options through debugfs
libceph, ceph: split ceph_show_options()
rbd: mark block queue as non-rotational
libceph: don't overwrite specific con error msgs
ceph: cleanup unsafe requests when reconnecting is denied
ceph: don't zero i_wrbuffer_ref when reconnecting is denied
ceph: don't mark dirty caps when there is no auth cap
ceph: keep i_snap_realm while there are writers
libceph: osdmap.h: Add missing format newlines
...
When ceph_update_writeable_page fails (including -EAGAIN), it
unlocks (w/ unlock_page) the page but does not 'release'
(w/ page_cache_release) properly.
Upon error, properly set *pagep to NULL, indicating an error.
Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Pull Ceph changes from Sage Weil:
"On the RBD side, there is a conversion to blk-mq from Christoph,
several long-standing bug fixes from Ilya, and some cleanup from
Rickard Strandqvist.
On the CephFS side there is a long list of fixes from Zheng, including
improved session handling, a few IO path fixes, some dcache management
correctness fixes, and several blocking while !TASK_RUNNING fixes.
The core code gets a few cleanups and Chaitanya has added support for
TCP_NODELAY (which has been used on the server side for ages but we
somehow missed on the kernel client).
There is also an update to MAINTAINERS to fix up some email addresses
and reflect that Ilya and Zheng are doing most of the maintenance for
RBD and CephFS these days. Do not be surprised to see a pull request
come from one of them in the future if I am unavailable for some
reason"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
MAINTAINERS: update Ceph and RBD maintainers
libceph: kfree() in put_osd() shouldn't depend on authorizer
libceph: fix double __remove_osd() problem
rbd: convert to blk-mq
ceph: return error for traceless reply race
ceph: fix dentry leaks
ceph: re-send requests when MDS enters reconnecting stage
ceph: show nocephx_require_signatures and notcp_nodelay options
libceph: tcp_nodelay support
rbd: do not treat standalone as flatten
ceph: fix atomic_open snapdir
ceph: properly mark empty directory as complete
client: include kernel version in client metadata
ceph: provide seperate {inode,file}_operations for snapdir
ceph: fix request time stamp encoding
ceph: fix reading inline data when i_size > PAGE_SIZE
ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_close_sessions)
ceph: avoid block operation when !TASK_RUNNING (ceph_get_caps)
ceph: avoid block operation when !TASK_RUNNING (ceph_mdsc_sync)
rbd: fix error paths in rbd_dev_refresh()
...
when inode has inline data but its size > PAGE_SIZE (it was truncated
to larger size), previous direct read code return -EIO. This patch adds
code to return zeros for data whose offset > PAGE_SIZE.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Probably this code was syncing a lot more often then intended because
the do_sync variable wasn't set to zero.
Cc: stable@vger.kernel.org # v3.11+
Fixes: c62988ec09 ('ceph: avoid meaningless calling ceph_caps_revoking if sync_mode == WB_SYNC_ALL.')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ilya Dryomov <idryomov@redhat.com>
Before any data write, convert inline data to normal data and set
i_inline_version to CEPH_INLINE_NONE. The OSD request that saves
inline data to object contains 3 operations (CMPXATTR, WRITE and
SETXATTR). It compares a xattr named 'inline_version' to prevent
old data overwrites newer data.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
we can't use getattr to fetch inline data while holding Fr cap,
because it can cause deadlock. If we need to sync read inline data,
drop cap refs first, then use getattr to fetch inline data.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
we can't use getattr to fetch inline data after getting Fcr caps,
because it can cause deadlock. The solution is try bringing inline
data to page cache when not holding any cap, and hope the inline
data page is still there after getting the Fcr caps. If the page
is still there, pin it in page cache for later IO.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Request reply and cap message can contain inline data. add inline data
to the page cache if there is Fc cap.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
allow specifying position of extent operation in multi-operations
osd request. This is required for cephfs to convert inline data to
normal data (compare xattr, then write object).
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@redhat.com>
Both ceph_update_writeable_page and ceph_setattr will verify file size
with max size ceph supported.
There are two caller for ceph_update_writeable_page, ceph_write_begin and
ceph_page_mkwrite. For ceph_write_begin, we have already verified the size in
generic_write_checks of ceph_write_iter; for ceph_page_mkwrite, we have no
chance to change file size when mmap. Likewise we have already verified the size
in inode_change_ok when we call ceph_setattr.
So let's remove the redundant code for max file size verification.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Pull Ceph updates from Sage Weil:
"This has a mix of bug fixes and cleanups.
Alex's patch fixes a rare race in RBD. Ilya's patches fix an ENOENT
check when a second rbd image is mapped and a couple memory leaks.
Zheng fixes several issues with fragmented directories and multiple
MDSs. Josh fixes a spin/sleep issue, and Josh and Guangliang's
patches fix setting and unsetting RBD images read-only.
Naturally there are several other cleanups mixed in for good measure"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (23 commits)
rbd: only set disk to read-only once
rbd: move calls that may sleep out of spin lock range
rbd: add ioctl for rbd
ceph: use truncate_pagecache() instead of truncate_inode_pages()
ceph: include time stamp in every MDS request
rbd: fix ida/idr memory leak
rbd: use reference counts for image requests
rbd: fix osd_request memory leak in __rbd_dev_header_watch_sync()
rbd: make sure we have latest osdmap on 'rbd map'
libceph: add ceph_monc_wait_osdmap()
libceph: mon_get_version request infrastructure
libceph: recognize poolop requests in debugfs
ceph: refactor readpage_nounlock() to make the logic clearer
mds: check cap ID when handling cap export message
ceph: remember subtree root dirfrag's auth MDS
ceph: introduce ceph_fill_fragtree()
ceph: handle cap import atomically
ceph: pre-allocate ceph_cap struct for ceph_add_cap()
ceph: update inode fields according to issued caps
rbd: replace IS_ERR and PTR_ERR with PTR_ERR_OR_ZERO
...
Pull vfs updates from Al Viro:
"This the bunch that sat in -next + lock_parent() fix. This is the
minimal set; there's more pending stuff.
In particular, I really hope to get acct.c fixes merged this cycle -
we need that to deal sanely with delayed-mntput stuff. In the next
pile, hopefully - that series is fairly short and localized
(kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
iov_iter work. Most of prereqs for ->splice_write with sane locking
order are there and Kent's dio rewrite would also fit nicely on top of
this pile"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
lock_parent: don't step on stale ->d_parent of all-but-freed one
kill generic_file_splice_write()
ceph: switch to iter_file_splice_write()
shmem: switch to iter_file_splice_write()
nfs: switch to iter_splice_write_file()
fs/splice.c: remove unneeded exports
ocfs2: switch to iter_file_splice_write()
->splice_write() via ->write_iter()
bio_vec-backed iov_iter
optimize copy_page_{to,from}_iter()
bury generic_file_aio_{read,write}
lustre: get rid of messing with iovecs
ceph: switch to ->write_iter()
ceph_sync_direct_write: stop poking into iov_iter guts
ceph_sync_read: stop poking into iov_iter guts
new helper: copy_page_from_iter()
fuse: switch to ->write_iter()
btrfs: switch to ->write_iter()
ocfs2: switch to ->write_iter()
xfs: switch to ->write_iter()
...
Update the last pr_warning callsites in fs branch
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Sage Weil <sage@inktank.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If the return value of ceph_osdc_readpages() is not negative,
it is certainly greater than or equal to zero.
Remove the useless condition judgment and redundant braces.
Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
PAGE_CACHE_SIZE is unsigned long on all architectures, however size_t
is either unsigned int or unsigned long. Rather than change format
strings, cast PAGE_CACHE_SIZE to size_t to be in line with dout()s in
ceph_page_mkwrite().
Cc: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Currently, if one new page allocated into fscache in readpage(), however,
with no data read into due to error encountered during reading from OSDs,
the slot in fscache is not uncached. This patch fixes this.
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Milosz Tanski <milosz@adfin.com>
Adds cap check to the page fault handler. The check prevents page
fault handler from adding new page to the page cache while Fcb caps
are being revoked. This solves Fc revoking hang in multiple clients
mmap IO workload.
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Clean up if error occurred rather than going through normal process
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Yunchuan Wen <yunchuanwen@ubuntukylin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
If the length of data to be read in readpage() is exactly
PAGE_CACHE_SIZE, the original code does not flush d-cache
for data consistency after finishing reading. This patches fixes
this.
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
ceph_osdc_readpages() returns number of bytes read, currently,
the code only allocate full-zero page into fscache, this patch
fixes this.
Signed-off-by: Li Wang <liwang@ubuntukylin.com>
Reviewed-by: Milosz Tanski <milosz@adfin.com>
Reviewed-by: Sage Weil <sage@inktank.com>
Previous patch that allowed us to cleanup most of the issues with pages marked
as private_2 when calling ceph_readpages. However, there seams to be a case in
the error case clean up in start read that still trigers this from time to
time. I've only seen this one a couple times.
BUG: Bad page state in process petabucket pfn:335b82
page:ffffea000cd6e080 count:0 mapcount:0 mapping: (null) index:0x0
page flags: 0x200000000001000(private_2)
Call Trace:
[<ffffffff81563442>] dump_stack+0x46/0x58
[<ffffffff8112c7f7>] bad_page+0xc7/0x120
[<ffffffff8112cd9e>] free_pages_prepare+0x10e/0x120
[<ffffffff8112e580>] free_hot_cold_page+0x40/0x160
[<ffffffff81132427>] __put_single_page+0x27/0x30
[<ffffffff81132d95>] put_page+0x25/0x40
[<ffffffffa02cb409>] ceph_readpages+0x2e9/0x6f0 [ceph]
[<ffffffff811313cf>] __do_page_cache_readahead+0x1af/0x260
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
In some cases the ceph readapages code code bails without filling all the pages
already marked by fscache. When we return back to readahead code this causes
a BUG.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Adding support for fscache to the Ceph filesystem. This would bring it to on
par with some of the other network filesystems in Linux (like NFS, AFS, etc...)
In order to mount the filesystem with fscache the 'fsc' mount option must be
passed.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Following we will begin to add memcg dirty page accounting around
__set_page_dirty_{buffers,nobuffers} in vfs layer, so we'd better use vfs interface to
avoid exporting those details to filesystems.
Since vfs set_page_dirty() should be called under page lock, here we don't need elaborate
codes to handle racy anymore, and two WARN_ON() are added to detect such exceptions.
Thanks very much for Sage and Yan Zheng's coaching!
I tested it in a two server's ceph environment that one is client and the other is
mds/osd/mon, and run the following fsx test from xfstests:
./fsx 1MB -N 50000 -p 10000 -l 1048576
./fsx 10MB -N 50000 -p 10000 -l 10485760
./fsx 100MB -N 50000 -p 10000 -l 104857600
The fsx does lots of mmap-read/mmap-write/truncate operations and the tests completed
successfully without triggering any of WARN_ON.
Signed-off-by: Sha Zhengju <handai.szj@taobao.com>
Reviewed-by: Sage Weil <sage@inktank.com>
The invalidatepage code bails if it encounters a non-zero page offset. The
current logic that does is non-obvious with multiple if statements.
This should be logically and functionally equivalent.
Signed-off-by: Milosz Tanski <milosz@adfin.com>
Reviewed-by: Sage Weil <sage@inktank.com>