Commit graph

1177 commits

Author SHA1 Message Date
Jaegeuk Kim
8ddb600e03 Merge remote-tracking branch 'origin/upstream-f2fs-stable-linux-4.4.y' into android-4.4
6944da0a68 treewide: Use array_size in f2fs_kvzalloc()
f15443db99 treewide: Use array_size() in f2fs_kzalloc()
3ea03ea4bd treewide: Use array_size() in f2fs_kmalloc()
c41203299a overflow.h: Add allocation size calculation helpers
d400752f54 f2fs: fix to clear FI_VOLATILE_FILE correctly
853e7339b6 f2fs: let sync node IO interrupt async one
6a4540cf19 f2fs: don't change wbc->sync_mode
588ecdfd7d f2fs: fix to update mtime correctly
1ae5aadab1 fs: f2fs: insert space around that ':' and ', '
39ee53e223 fs: f2fs: add missing blank lines after declarations
d5b4710fcf fs: f2fs: changed variable type of offset "unsigned" to "loff_t"
c35da89531 f2fs: clean up symbol namespace
fcf37e16f3 f2fs: make set_de_type() static
5d1633aa10 f2fs: make __f2fs_write_data_pages() static
cc8093af7c f2fs: fix to avoid accessing cross the boundary
b7f5594670 f2fs: fix to let caller retry allocating block address
e48fcd8576 disable loading f2fs module on PAGE_SIZE > 4KB
02afc275a5 f2fs: fix error path of move_data_page
0291bd36d0 f2fs: don't drop dentry pages after fs shutdown
a1259450b6 f2fs: fix to avoid race during access gc_thread pointer
d2e0f2f786 f2fs: clean up with clear_radix_tree_dirty_tag
c74034518f f2fs: fix to don't trigger writeback during recovery
e72a2cca82 f2fs: clear discard_wake earlier
b25a1872e9 f2fs: let discard thread wait a little longer if dev is busy
b125dfb20d f2fs: avoid stucking GC due to atomic write
405909e7f5 f2fs: introduce sbi->gc_mode to determine the policy
1f62e4702a f2fs: keep migration IO order in LFS mode
c4408c2387 f2fs: fix to wait page writeback during revoking atomic write
9db5be4af8 f2fs: Fix deadlock in shutdown ioctl
ed74404955 f2fs: detect synchronous writeback more earlier
91e7d9d2dd mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
feb94dc829 ceph: use pagevec_lookup_range_nr_tag()
f3aa4a25b8 mm: add variant of pagevec_lookup_range_tag() taking number of pages
8914877e37 mm: use pagevec_lookup_range_tag() in write_cache_pages()
26778b87a0 mm: use pagevec_lookup_range_tag() in __filemap_fdatawait_range()
94f1b99298 nilfs2: use pagevec_lookup_range_tag()
160355d69f gfs2: use pagevec_lookup_range_tag()
564108e83a f2fs: use find_get_pages_tag() for looking up single page
6cf6fb8645 f2fs: simplify page iteration loops
a05d8a6a2b f2fs: use pagevec_lookup_range_tag()
18a4848ffd ext4: use pagevec_lookup_range_tag()
1c7be24f65 ceph: use pagevec_lookup_range_tag()
e25fadabb5 btrfs: use pagevec_lookup_range_tag()
bf9510b162 mm: implement find_get_pages_range_tag()
461247b21f f2fs: clean up with is_valid_blkaddr()
a5d0ccbc18 f2fs: fix to initialize min_mtime with ULLONG_MAX
9bb4d22cf5 f2fs: fix to let checkpoint guarantee atomic page persistence
cdcf2b3e25 f2fs: fix to initialize i_current_depth according to inode type
331ae0c25b Revert "f2fs: add ovp valid_blocks check for bg gc victim to fg_gc"
2494cc7c0b f2fs: don't drop any page on f2fs_cp_error() case
0037c639e6 f2fs: fix spelling mistake: "extenstion" -> "extension"
2bba5b8eb8 f2fs: enhance sanity_check_raw_super() to avoid potential overflows
9bb86b63dc f2fs: treat volatile file's data as hot one
2cf6459036 f2fs: introduce release_discard_addr() for cleanup
03279ce90b f2fs: fix potential overflow
f46eddc4da f2fs: rename dio_rwsem to i_gc_rwsem
bb01582453 f2fs: move mnt_want_write_file after range check
8bb9a8da75 f2fs: fix missing clear FI_NO_PREALLOC in some error case
cb38cc4e1d f2fs: enforce fsync_mode=strict for renamed directory
26bf4e8a96 f2fs: sanity check for total valid node blocks
78f8b0f46f f2fs: sanity check on sit entry
ab758ada22 f2fs: avoid bug_on on corrupted inode
1a5d1966c0 f2fs: give message and set need_fsck given broken node id
b025f6dfc0 f2fs: clean up commit_inmem_pages()
7aff5c69da f2fs: do not check F2FS_INLINE_DOTS in recover
23d00b0287 f2fs: remove duplicated dquot_initialize and fix error handling
937f4ef79e f2fs: stop issue discard if something wrong with f2fs
a6d74bb282 f2fs: fix return value in f2fs_ioc_commit_atomic_write
258489ec52 f2fs: allocate hot_data for atomic write more strictly
aa857e0f3b f2fs: check if inmem_pages list is empty correctly
9d77ded0a7 f2fs: fix race in between GC and atomic open
0d17eb90b5 f2fs: change le32 to le16 of f2fs_inode->i_extra_size
ea2813111f f2fs: check cur_valid_map_mir & raw_sit block count when flush sit entries
9190cadf38 f2fs: correct return value of f2fs_trim_fs
17f85d0708 f2fs: fix to show missing bits in FS_IOC_GETFLAGS
3e90db63fc f2fs: remove unneeded F2FS_PROJINHERIT_FL
298032d4d4 f2fs: don't use GFP_ZERO for page caches
fdf61219dc f2fs: issue all big range discards in umount process
cd79eb2b5e f2fs: remove redundant block plug
ec034d0f14 f2fs: remove unmatched zero_user_segment when convert inline dentry
71aaced0e1 f2fs: introduce private inode status mapping
e7724207f7 fscrypt: log the crypto algorithm implementations
4cbda579cd crypto: api - Add crypto_type_has_alg helper
b24dcaae87 crypto: skcipher - Add low-level skcipher interface
a9146e4235 crypto: skcipher - Add helper to retrieve driver name
a0ca4bdf47 crypto: skcipher - Add default key size helper
eb13e0b692 fscrypt: add Speck128/256 support
27a0e77380 fscrypt: only derive the needed portion of the key
f68a71fa8f fscrypt: separate key lookup from key derivation
52359cf4fd fscrypt: use a common logging function
ff8e7c745e fscrypt: remove internal key size constants
7149dd4d39 fscrypt: remove unnecessary check for non-logon key type
56446c9142 fscrypt: make fscrypt_operations.max_namelen an integer
f572a22ef9 fscrypt: drop empty name check from fname_decrypt()
0077eff1d2 fscrypt: drop max_namelen check from fname_decrypt()
3f7af9d27f fscrypt: don't special-case EOPNOTSUPP from fscrypt_get_encryption_info()
52c51f7b7b fscrypt: don't clear flags on crypto transform
89b7fb8298 fscrypt: remove stale comment from fscrypt_d_revalidate()
d56de4e926 fscrypt: remove error messages for skcipher_request_alloc() failure
f68d3b84ae fscrypt: remove unnecessary NULL check when allocating skcipher
fb10231825 fscrypt: clean up after fscrypt_prepare_lookup() conversions
39b1444906 fscrypt: use unbound workqueue for decryption

Change-Id: Ied79ecd97385c05ef26e6b7b24d250eee9ec4e47
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2018-07-30 17:25:04 -07:00
Jan Kara
91e7d9d2dd mm: remove nr_pages argument from pagevec_lookup_{,range}_tag()
All users of pagevec_lookup() and pagevec_lookup_range() now pass
PAGEVEC_SIZE as a desired number of pages.  Just drop the argument.

Link: http://lkml.kernel.org/r/20171009151359.31984-15-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-08 17:37:49 -07:00
Jan Kara
feb94dc829 ceph: use pagevec_lookup_range_nr_tag()
Use new function for looking up pages since nr_pages argument from
pagevec_lookup_range_tag() is going away.

Link: http://lkml.kernel.org/r/20171009151359.31984-14-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-08 17:37:17 -07:00
Jan Kara
1c7be24f65 ceph: use pagevec_lookup_range_tag()
We want only pages from given range in ceph_writepages_start().  Use
pagevec_lookup_range_tag() instead of pagevec_lookup_tag() and remove
unnecessary code.

Link: http://lkml.kernel.org/r/20171009151359.31984-4-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Cc: Ilya Dryomov <idryomov@gmail.com>
Cc: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2018-07-08 17:33:26 -07:00
Yan, Zheng
8c7c3d5b78 ceph: drop negative child dentries before try pruning inode's alias
commit 040d786032bf59002d374b86d75b04d97624005c upstream.

Negative child dentry holds reference on inode's alias, it makes
d_prune_aliases() do nothing.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-20 10:04:52 +01:00
Jeff Layton
da0345d723 ceph: unlock dangling spinlock in try_flush_caps()
commit 6c2838fbdedb9b72a81c931d49e56b229b6cdbca upstream.

sparse warns:

  fs/ceph/caps.c:2042:9: warning: context imbalance in 'try_flush_caps' - wrong count at exit

We need to exit this function with the lock unlocked, but a couple of
cases leave it locked.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-02 09:40:49 +01:00
Jeff Layton
c7a20ed295 ceph: clean up unsafe d_parent accesses in build_dentry_path
[ Upstream commit c6b0b656ca24ede6657abb4a2cd910fa9c1879ba ]

While we hold a reference to the dentry when build_dentry_path is
called, we could end up racing with a rename that changes d_parent.
Handle that situation correctly, by using the rcu_read_lock to
ensure that the parent dentry and inode stick around long enough
to safely check ceph_snap and ceph_ino.

Link: http://tracker.ceph.com/issues/18148
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:09:06 +02:00
Yan, Zheng
857d0b3dd7 ceph: fix readpage from fscache
commit dd2bc473482eedc60c29cf00ad12568ce40ce511 upstream.

ceph_readpage() unlocks page prematurely prematurely in the case
that page is reading from fscache. Caller of readpage expects that
page is uptodate when it get unlocked. So page shoule get locked
by completion callback of fscache_read_or_alloc_pages()

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-07 08:34:09 +02:00
Yan, Zheng
ba790013b5 ceph: fix race in concurrent readdir
commit 84583cfb973c4313955c6231cc9cb3772d280b15 upstream.

For a large directory, program needs to issue multiple readdir
syscalls to get all dentries. When there are multiple programs
read the directory concurrently. Following sequence of events
can happen.

 - program calls readdir with pos = 2. ceph sends readdir request
   to mds. The reply contains N1 entries. ceph adds these N1 entries
   to readdir cache.
 - program calls readdir with pos = N1+2. The readdir is satisfied
   by the readdir cache, N2 entries are returned. (Other program
   calls readdir in the middle, which fills the cache)
 - program calls readdir with pos = N1+N2+2. ceph sends readdir
   request to mds. The reply contains N3 entries and it reaches
   directory end. ceph adds these N3 entries to the readdir cache
   and marks directory complete.

The second readdir call does not update fi->readdir_cache_idx.
ceph add the last N3 entries to wrong places.

Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-27 15:06:09 -07:00
Fabian Frederick
044470266a fs: add i_blocksize()
commit 93407472a21b82f39c955ea7787e5bc7da100642 upstream.

Replace all 1 << inode->i_blkbits and (1 << inode->i_blkbits) in fs
branch.

This patch also fixes multiple checkpatch warnings: WARNING: Prefer
'unsigned int' to bare use of 'unsigned'

Thanks to Andrew Morton for suggesting more appropriate function instead
of macro.

[geliangtang@gmail.com: truncate: use i_blocksize()]
  Link: http://lkml.kernel.org/r/9c8b2cd83c8f5653805d43debde9fa8817e02fc4.1484895804.git.geliangtang@gmail.com
Link: http://lkml.kernel.org/r/1481319905-10126-1-git-send-email-fabf@skynet.be
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-14 13:16:24 +02:00
Yan, Zheng
04f522476a ceph: fix recursion between ceph_set_acl() and __ceph_setattr()
commit 8179a101eb5f4ef0ac9a915fcea9a9d3109efa90 upstream.

ceph_set_acl() calls __ceph_setattr() if the setacl operation needs
to modify inode's i_mode. __ceph_setattr() updates inode's i_mode,
then calls posix_acl_chmod().

The problem is that __ceph_setattr() calls posix_acl_chmod() before
sending the setattr request. The get_acl() call in posix_acl_chmod()
can trigger a getxattr request. The reply of the getxattr request
can restore inode's i_mode to its old value. The set_acl() call in
posix_acl_chmod() sees old value of inode's i_mode, so it calls
__ceph_setattr() again.

Cc: stable@vger.kernel.org # needs backporting for < 4.9
Link: http://tracker.ceph.com/issues/19688
Reported-by: Jerry Lee <leisurelysw24@gmail.com>
Signed-off-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
[luis: introduce __ceph_setattr() and make ceph_set_acl() call it, as
 suggested by Yan.]
Signed-off-by: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: “Yan, Zheng” <zyan@redhat.com>
2017-05-25 14:30:13 +02:00
Luis Henriques
bb7031c7e5 ceph: fix memory leak in __ceph_setxattr()
commit eeca958dce0a9231d1969f86196653eb50fcc9b3 upstream.

The ceph_inode_xattr needs to be released when removing an xattr.  Easily
reproducible running the 'generic/020' test from xfstests or simply by
doing:

  attr -s attr0 -V 0 /mnt/test && attr -r attr0 /mnt/test

While there, also fix the error path.

Here's the kmemleak splat:

unreferenced object 0xffff88001f86fbc0 (size 64):
  comm "attr", pid 244, jiffies 4294904246 (age 98.464s)
  hex dump (first 32 bytes):
    40 fa 86 1f 00 88 ff ff 80 32 38 1f 00 88 ff ff  @........28.....
    00 01 00 00 00 00 ad de 00 02 00 00 00 00 ad de  ................
  backtrace:
    [<ffffffff81560199>] kmemleak_alloc+0x49/0xa0
    [<ffffffff810f3e5b>] kmem_cache_alloc+0x9b/0xf0
    [<ffffffff812b157e>] __ceph_setxattr+0x17e/0x820
    [<ffffffff812b1c57>] ceph_set_xattr_handler+0x37/0x40
    [<ffffffff8111fb4b>] __vfs_removexattr+0x4b/0x60
    [<ffffffff8111fd37>] vfs_removexattr+0x77/0xd0
    [<ffffffff8111fdd1>] removexattr+0x41/0x60
    [<ffffffff8111fe65>] path_removexattr+0x75/0xa0
    [<ffffffff81120aeb>] SyS_lremovexattr+0xb/0x10
    [<ffffffff81564b20>] entry_SYSCALL_64_fastpath+0x13/0x94
    [<ffffffffffffffff>] 0xffffffffffffffff

Signed-off-by: Luis Henriques <lhenriques@suse.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-05-20 14:27:01 +02:00
Jeff Layton
05a9143edb ceph: remove req from unsafe list when unregistering it
commit df963ea8a082d31521a120e8e31a29ad8a1dc215 upstream.

There's no reason a request should ever be on a s_unsafe list but not
in the request tree.

Link: http://tracker.ceph.com/issues/18474
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-15 09:57:13 +08:00
Jeff Layton
62c3d36309 ceph: fix bad endianness handling in parse_reply_info_extra
commit 6df8c9d80a27cb587f61b4f06b57e248d8bc3f86 upstream.

sparse says:

    fs/ceph/mds_client.c:291:23: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:293:28: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:294:28: warning: restricted __le32 degrades to integer
    fs/ceph/mds_client.c:296:28: warning: restricted __le32 degrades to integer

The op value is __le32, so we need to convert it before comparing it.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: Sage Weil <sage@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-26 08:23:49 +01:00
Jan Kara
57c9cfdb61 posix_acl: Clear SGID bit when setting file permissions
commit 073931017b49d9458aa351605b43a7e34598caef upstream.

When file permissions are modified via chmod(2) and the user is not in
the owning group or capable of CAP_FSETID, the setgid bit is cleared in
inode_change_ok().  Setting a POSIX ACL via setxattr(2) sets the file
permissions as well as the new ACL, but doesn't clear the setgid bit in
a similar way; this allows to bypass the check in chmod(2).  Fix that.

References: CVE-2016-7097
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-31 04:13:58 -06:00
Nikolay Borisov
e382e130d4 ceph: fix error handling in ceph_read_iter
commit 0d7718f666be181fda1ba2d08f137d87c1419347 upstream.

In case __ceph_do_getattr returns an error and the retry_op in
ceph_read_iter is not READ_INLINE, then it's possible to invoke
__free_page on a page which is NULL, this naturally leads to a crash.
This can happen when, for example, a process waiting on a MDS reply
receives sigterm.

Fix this by explicitly checking whether the page is set or not.

Signed-off-by: Nikolay Borisov <kernel@kyup.com>
Reviewed-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-28 03:01:35 -04:00
Yan, Zheng
f0006eda8e ceph: fix race during filling readdir cache
commit af5e5eb574776cdf1b756a27cc437bff257e22fe upstream.

Readdir cache uses page cache to save dentry pointers. When adding
dentry pointers to middle of a page, we need to make sure the page
already exists. Otherwise the beginning part of the page will be
invalid pointers.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Cc: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:42 +02:00
Linus Torvalds
ca4ba96e02 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "There are several patches from Ilya fixing RBD allocation lifecycle
  issues, a series adding a nocephx_sign_messages option (and associated
  bug fixes/cleanups), several patches from Zheng improving the
  (directory) fsync behavior, a big improvement in IO for direct-io
  requests when striping is enabled from Caifeng, and several other
  small fixes and cleanups"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: clear msg->con in ceph_msg_release() only
  libceph: add nocephx_sign_messages option
  libceph: stop duplicating client fields in messenger
  libceph: drop authorizer check from cephx msg signing routines
  libceph: msg signing callouts don't need con argument
  libceph: evaluate osd_req_op_data() arguments only once
  ceph: make fsync() wait unsafe requests that created/modified inode
  ceph: add request to i_unsafe_dirops when getting unsafe reply
  libceph: introduce ceph_x_authorizer_cleanup()
  ceph: don't invalidate page cache when inode is no longer used
  rbd: remove duplicate calls to rbd_dev_mapping_clear()
  rbd: set device_type::release instead of device::release
  rbd: don't free rbd_dev outside of the release callback
  rbd: return -ENOMEM instead of pool id if rbd_dev_create() fails
  libceph: use local variable cursor instead of &msg->cursor
  libceph: remove con argument in handle_reply()
  ceph: combine as many iovec as possile into one OSD request
  ceph: fix message length computation
  ceph: fix a comment typo
  rbd: drop null test before destroy functions
2015-11-13 09:24:40 -08:00
Michal Hocko
c62d25556b mm, fs: introduce mapping_gfp_constraint()
There are many places which use mapping_gfp_mask to restrict a more
generic gfp mask which would be used for allocations which are not
directly related to the page cache but they are performed in the same
context.

Let's introduce a helper function which makes the restriction explicit and
easier to track.  This patch doesn't introduce any functional changes.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-11-06 17:50:42 -08:00
Ilya Dryomov
79dbd1baa6 libceph: msg signing callouts don't need con argument
We can use msg->con instead - at the point we sign an outgoing message
or check the signature on the incoming one, msg->con is always set.  We
wouldn't know how to sign a message without an associated session (i.e.
msg->con == NULL) and being able to sign a message using an explicitly
provided authorizer is of no use.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-11-02 23:37:45 +01:00
Yan, Zheng
68cd5b4b76 ceph: make fsync() wait unsafe requests that created/modified inode
If we get a unsafe reply for request that created/modified inode,
add the unsafe request to a list in the newly created/modified
inode. So we can make fsync() wait these unsafe requests.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:48 +01:00
Yan, Zheng
4c06ace81a ceph: add request to i_unsafe_dirops when getting unsafe reply
Previously we add request to i_unsafe_dirops when registering
request. So ceph_fsync() also waits for imcomplete requests.
This is unnecessary, ceph_fsync() only needs to wait unsafe
requests.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:48 +01:00
Yan, Zheng
5e804ac482 ceph: don't invalidate page cache when inode is no longer used
ceph_check_caps() invalidate page cache when inode is not used
by any open file. This behaviour is not friendly for workload
that repeatly read files.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:48 +01:00
Zhu, Caifeng
b5b98989dc ceph: combine as many iovec as possile into one OSD request
Both ceph_sync_direct_write and ceph_sync_read iterate iovec elements
one by one, send one OSD request for each iovec. This is sub-optimal,
We can combine serveral iovec into one page vector, and send an OSD
request for the whole page vector.

Signed-off-by: Zhu, Caifeng <zhucaifeng@unissoft-nj.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:47 +01:00
Arnd Bergmann
777d738a5e ceph: fix message length computation
create_request_message() computes the maximum length of a message,
but uses the wrong type for the time stamp: sizeof(struct timespec)
may be 8 or 16 depending on the architecture, while sizeof(struct
ceph_timespec) is always 8, and that is what gets put into the
message.

Found while auditing the uses of timespec for y2038 problems.

Fixes: b8e69066d8 ("ceph: include time stamp in every MDS request")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:47 +01:00
Geliang Tang
1291fb950f ceph: fix a comment typo
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-11-02 23:36:47 +01:00
Benjamin Coddington
4f6563677a Move locks API users to locks_lock_inode_wait()
Instead of having users check for FL_POSIX or FL_FLOCK to call the correct
locks API function, use the check within locks_lock_inode_wait().  This
allows for some later cleanup.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
2015-10-22 14:57:36 -04:00
Linus Torvalds
e013f74b60 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph update from Sage Weil:
 "There are a few fixes for snapshot behavior with CephFS and support
  for the new keepalive protocol from Zheng, a libceph fix that affects
  both RBD and CephFS, a few bug fixes and cleanups for RBD from Ilya,
  and several small fixes and cleanups from Jianpeng and others"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: improve readahead for file holes
  ceph: get inode size for each append write
  libceph: check data_len in ->alloc_msg()
  libceph: use keepalive2 to verify the mon session is alive
  rbd: plug rbd_dev->header.object_prefix memory leak
  rbd: fix double free on rbd_dev->header_name
  libceph: set 'exists' flag for newly up osd
  ceph: cleanup use of ceph_msg_get
  ceph: no need to get parent inode in ceph_open
  ceph: remove the useless judgement
  ceph: remove redundant test of head->safe and silence static analysis warnings
  ceph: fix queuing inode to mdsdir's snaprealm
  libceph: rename con_work() to ceph_con_workfn()
  libceph: Avoid holding the zero page on ceph_msgr_slab_init errors
  libceph: remove the unused macro AES_KEY_SIZE
  ceph: invalidate dirty pages after forced umount
  ceph: EIO all operations after forced umount
2015-09-11 12:33:03 -07:00
Kirill A. Shutemov
7cbea8dc01 mm: mark most vm_operations_struct const
With two exceptions (drm/qxl and drm/radeon) all vm_operations_struct
structs should be constant.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-10 13:29:01 -07:00
Yan, Zheng
438386853d ceph: improve readahead for file holes
When readahead encounters file holes, osd reply returns error -ENOENT,
finish_read() skips adding pages to the the page cache. So readahead
does not work for file holes. The fix is adding zero pages to the
page cache when -ENOENT is returned.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-09 09:52:29 +03:00
Yan, Zheng
55b0b31cbc ceph: get inode size for each append write
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-09 09:52:29 +03:00
Jianpeng Ma
5fdb1389e1 ceph: cleanup use of ceph_msg_get
Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08 23:14:29 +03:00
Jianpeng Ma
e36d571d70 ceph: no need to get parent inode in ceph_open
parent inode is needed in creating new inode case.  For ceph_open,
the target inode already exists.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08 23:14:29 +03:00
Jianpeng Ma
a43137f7b0 ceph: remove the useless judgement
err != 0 is already handled. So skip this.

Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08 23:14:29 +03:00
Brad Hubbard
1550d34e56 ceph: remove redundant test of head->safe and silence static analysis warnings
Signed-off-by: Brad Hubbard <bhubbard@redhat.com>
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08 23:14:29 +03:00
Yan, Zheng
23078637e0 ceph: fix queuing inode to mdsdir's snaprealm
During MDS failovers, MClientSnap message may cause kclient to move
some inodes from root directory's snaprealm to mdsdir's snaprealm
and queue snapshots for these inodes. For a FS has never created any
snapshot, both root directory's snaprealm and mdsdir's snaprealm
share the same snapshot contexts (both are ceph_empty_snapc). This
confuses ceph_put_wrbuffer_cap_refs(), make it unable to distinguish
snapshot buffers from head buffers.

The fix is do not use ceph_empty_snapc as snaprealm's cached context.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08 23:14:29 +03:00
Yan, Zheng
a341d4df87 ceph: invalidate dirty pages after forced umount
After forced umount, ceph_writepages_start() skips flushing dirty
pages. To make sure inode's reference count get dropped to zero,
we need to invalidate dirty pages.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08 23:14:28 +03:00
Yan, Zheng
48fec5d0a5 ceph: EIO all operations after forced umount
This patch makes try_get_cap_refs() and __do_request() check
if the file system was forced umount, and return -EIO if it was.
This patch also adds a helper function to drops dirty caps and
wakes up blocking operation.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-09-08 23:14:28 +03:00
Kees Cook
a068acf2ee fs: create and use seq_show_option for escaping
Many file systems that implement the show_options hook fail to correctly
escape their output which could lead to unescaped characters (e.g.  new
lines) leaking into /proc/mounts and /proc/[pid]/mountinfo files.  This
could lead to confusion, spoofed entries (resulting in things like
systemd issuing false d-bus "mount" notifications), and who knows what
else.  This looks like it would only be the root user stepping on
themselves, but it's possible weird things could happen in containers or
in other situations with delegated mount privileges.

Here's an example using overlay with setuid fusermount trusting the
contents of /proc/mounts (via the /etc/mtab symlink).  Imagine the use
of "sudo" is something more sneaky:

  $ BASE="ovl"
  $ MNT="$BASE/mnt"
  $ LOW="$BASE/lower"
  $ UP="$BASE/upper"
  $ WORK="$BASE/work/ 0 0
  none /proc fuse.pwn user_id=1000"
  $ mkdir -p "$LOW" "$UP" "$WORK"
  $ sudo mount -t overlay -o "lowerdir=$LOW,upperdir=$UP,workdir=$WORK" none /mnt
  $ cat /proc/mounts
  none /root/ovl/mnt overlay rw,relatime,lowerdir=ovl/lower,upperdir=ovl/upper,workdir=ovl/work/ 0 0
  none /proc fuse.pwn user_id=1000 0 0
  $ fusermount -u /proc
  $ cat /proc/mounts
  cat: /proc/mounts: No such file or directory

This fixes the problem by adding new seq_show_option and
seq_show_option_n helpers, and updating the vulnerable show_option
handlers to use them as needed.  Some, like SELinux, need to be open
coded due to unusual existing escape mechanisms.

[akpm@linux-foundation.org: add lost chunk, per Kees]
[keescook@chromium.org: seq_show_option should be using const parameters]
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Cc: J. R. Okajima <hooanon05g@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-04 16:54:41 -07:00
Yan, Zheng
fc927cd32f ceph: always re-send cap flushes when MDS recovers
commit e548e9b93d makes the kclient
only re-send cap flush once during MDS failover. If the kclient sends
a cap flush after MDS enters reconnect stage but before MDS recovers.
The kclient will skip re-sending the same cap flush when MDS recovers.

This causes problem for newly created inode. The MDS handles cap
flushes before replaying unsafe requests, so it's possible that MDS
find corresponding inode is missing when handling cap flush. The fix
is reverting to old behaviour: always re-send when MDS recovers

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-07-31 11:38:53 +03:00
Yan, Zheng
f6762cb2ca ceph: fix ceph_encode_locks_to_buffer()
posix locks should be in ctx->flc_posix list

Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2015-07-31 11:38:47 +03:00
Linus Torvalds
1dc51b8288 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull more vfs updates from Al Viro:
 "Assorted VFS fixes and related cleanups (IMO the most interesting in
  that part are f_path-related things and Eric's descriptor-related
  stuff).  UFS regression fixes (it got broken last cycle).  9P fixes.
  fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups".  The
  file_table locking rule change by Eric Dumazet is a rather big and
  fundamental update even if the patch isn't huge.   - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
  9p: cope with bogus responses from server in p9_client_{read,write}
  p9_client_write(): avoid double p9_free_req()
  9p: forgetting to cancel request on interrupted zero-copy RPC
  dax: bdev_direct_access() may sleep
  block: Add support for DAX reads/writes to block devices
  dax: Use copy_from_iter_nocache
  dax: Add block size note to documentation
  fs/file.c: __fget() and dup2() atomicity rules
  fs/file.c: don't acquire files->file_lock in fd_install()
  fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
  vfs: avoid creation of inode number 0 in get_next_ino
  namei: make set_root_rcu() return void
  make simple_positive() public
  ufs: use dir_pages instead of ufs_dir_pages()
  pagemap.h: move dir_pages() over there
  remove the pointless include of lglock.h
  fs: cleanup slight list_entry abuse
  xfs: Correctly lock inode when removing suid and file capabilities
  fs: Call security_ops->inode_killpriv on truncate
  fs: Provide function telling whether file_remove_privs() will do anything
  ...
2015-07-04 19:36:06 -07:00
Linus Torvalds
0c76c6ba24 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "We have a pile of bug fixes from Ilya, including a few patches that
  sync up the CRUSH code with the latest from userspace.

  There is also a long series from Zheng that fixes various issues with
  snapshots, inline data, and directory fsync, some simplification and
  improvement in the cap release code, and a rework of the caching of
  directory contents.

  To top it off there are a few small fixes and cleanups from Benoit and
  Hong"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
  rbd: use GFP_NOIO in rbd_obj_request_create()
  crush: fix a bug in tree bucket decode
  libceph: Fix ceph_tcp_sendpage()'s more boolean usage
  libceph: Remove spurious kunmap() of the zero page
  rbd: queue_depth map option
  rbd: store rbd_options in rbd_device
  rbd: terminate rbd_opts_tokens with Opt_err
  ceph: fix ceph_writepages_start()
  rbd: bump queue_max_segments
  ceph: rework dcache readdir
  crush: sync up with userspace
  crush: fix crash from invalid 'take' argument
  ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
  ceph: pre-allocate data structure that tracks caps flushing
  ceph: re-send flushing caps (which are revoked) in reconnect stage
  ceph: send TID of the oldest pending caps flush to MDS
  ceph: track pending caps flushing globally
  ceph: track pending caps flushing accurately
  libceph: fix wrong name "Ceph filesystem for Linux"
  ceph: fix directory fsync
  ...
2015-07-02 11:35:00 -07:00
Yan, Zheng
e1966b4944 ceph: fix ceph_writepages_start()
Before a page get locked, someone else can write data to the page
and increase the i_size. So we should re-check the i_size after
pages are locked.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 18:30:53 +03:00
Yan, Zheng
fdd4e15838 ceph: rework dcache readdir
Previously our dcache readdir code relies on that child dentries in
directory dentry's d_subdir list are sorted by dentry's offset in
descending order. When adding dentries to the dcache, if a dentry
already exists, our readdir code moves it to head of directory
dentry's d_subdir list. This design relies on dcache internals.
Al Viro suggests using ncpfs's approach: keeping array of pointers
to dentries in page cache of directory inode. the validity of those
pointers are presented by directory inode's complete and ordered
flags. When a dentry gets pruned, we clear directory inode's complete
flag in the d_prune() callback. Before moving a dentry to other
directory, we clear the ordered flag for both old and new directory.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:32 +03:00
Yan, Zheng
687265e5a8 ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
GFP_NOFS memory allocation is required for page writeback path.
But there is no need to use GFP_NOFS in syscall path and readpage
path

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng
f66fd9f095 ceph: pre-allocate data structure that tracks caps flushing
Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng
e548e9b93d ceph: re-send flushing caps (which are revoked) in reconnect stage
if flushing caps were revoked, we should re-send the cap flush in
client reconnect stage. This guarantees that MDS processes the cap
flush message before issuing the flushing caps to other client.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng
a2971c8ccb ceph: send TID of the oldest pending caps flush to MDS
According to this information, MDS can trim its completed caps flush
list (which is used to detect duplicated cap flush).

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00
Yan, Zheng
8310b08913 ceph: track pending caps flushing globally
So we know TID of the oldest pending caps flushing. Later patch will
send this information to MDS, so that MDS can trim its completed caps
flush list.

Tracking pending caps flushing globally also simplifies syncfs code.

Signed-off-by: Yan, Zheng <zyan@redhat.com>
2015-06-25 11:49:31 +03:00