[ Upstream commit 406dab8450ec76eca88a1af2fc15d18a2b36ca49 ]
Lock sequence IDs are bumped in decode_lock by calling
nfs_increment_seqid(). nfs_increment_sequid() does not use the
seqid_mutating_err() function fixed in commit 059aa7348241 ("Don't
increment lock sequence ID after NFS4ERR_MOVED").
Fixes: 059aa7348241 ("Don't increment lock sequence ID after ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Xuan Qi <xuan.qi@oracle.com>
Cc: stable@vger.kernel.org # v3.7+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 762674f86d0328d5dc923c966e209e1ee59663f2 upstream.
Donald Buczek reports that a nfs4 client incorrectly denies
execute access based on outdated file mode (missing 'x' bit).
After the mode on the server is 'fixed' (chmod +x) further execution
attempts continue to fail, because the nfs ACCESS call updates
the access parameter but not the mode parameter or the mode in
the inode.
The root cause is ultimately that the VFS is calling may_open()
before the NFS client has a chance to OPEN the file and hence revalidate
the access and attribute caches.
Al Viro suggests:
>>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
>>> that things will be caught by server anyway?
>>
>> That can work as long as we're guaranteed that everything that calls
>> inode_permission() with MAY_OPEN on a regular file will also follow up
>> with a vfs_open() or dentry_open() on success. Is this always the
>> case?
>
> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since
> it doesn't have ->tmpfile() instance anyway)
>
> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
>
> 3) in do_last(), followed on success by vfs_open()
>
> That's all. All calls of inode_permission() that get MAY_OPEN come from
> may_open(), and there's no other callers of that puppy.
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
Link: http://lkml.kernel.org/r/1451046656-26319-1-git-send-email-buczek@molgen.mpg.de
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c5fc09a1157a11dbe84e6421c3e0b37d05238cb upstream.
Donald Buczek reports that NFS clients can also report incorrect
results for access() due to lack of revalidation of attributes
before calling execute_ok().
Looking closely, it seems chdir() is afflicted with the same problem.
Fix is to ensure we call nfs_revalidate_inode_rcu() or
nfs_revalidate_inode() as appropriate before deciding to trust
execute_ok().
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Link: http://lkml.kernel.org/r/1451331530-3748-1-git-send-email-buczek@molgen.mpg.de
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed92d8c137b7794c2c2aa14479298b9885967607 upstream.
We're not taking into account that the space needed for the (variable
length) attr bitmap, with the result that we'd sometimes get a spurious
ERANGE when the ACL data got close to the end of a page.
Just add in an extra page to make sure.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6682c14bbe505a8b912c57faf544f866777ee48d upstream.
Bitmap and attrlen follow immediately after the op reply header. This
was an oversight from commit bf118a342f.
Consequences of this are just minor efficiency (extra calls to
xdr_shrink_bufhead).
Fixes: bf118a342f "NFSv4: include bitmap in nfsv4 get acl data"
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a974deee477af89411e0f80456bfb344ac433c98 upstream.
If we exit because the file access check failed, we currently
leak the struct nfs4_state. We need to attach it to the
open context before returning.
Fixes: 3efb972247 ("NFSv4: Refactor _nfs4_open_and_get_state..")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a430607b2ef7c3be090f88c71cfcb1b3988aa7c0 upstream.
Some nfsv4.0 servers may return a mode for the verifier following an open
with EXCLUSIVE4 createmode, but this does not mean the client should skip
setting the mode in the following SETATTR. It should only do that for
EXCLUSIVE4_1 or UNGAURDED createmode.
Fixes: 5334c5bdac ("NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cfd278c280f997cf2fe4662e0acab0fe465f637b upstream.
Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL 'ds',
then ds->ds_clp will also be non-NULL.
This is not necessasrily true in the case when the process received a fatal signal
while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
In that case ->ds_clp may not be set, and the devid may not recently have been marked
unavailable.
So add a test for ds_clp == NULL and return NULL in that case.
Fixes: c23266d532 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Olga Kornievskaia <aglo@umich.edu>
Acked-by: Adamson, Andy <William.Adamson@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79f687a3de9e3ba2518b4ea33f38ca6cbe9133eb upstream.
Ben Coddington reports that commit 311324ad17, by adding the function
nfs_dir_mapping_need_revalidate() that checks page cache validity on
each call to nfs_readdir() causes a performance regression when
the directory is being modified.
If the directory is changing while we're iterating through the directory,
POSIX does not require us to invalidate the page cache unless the user
calls rewinddir(). However, we still do want to ensure that we use
readdirplus in order to avoid a load of stat() calls when the user
is doing an 'ls -l' workload.
The fix should be to invalidate the page cache immediately when we're
setting the NFS_INO_ADVISE_RDPLUS bit.
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 311324ad17 ("NFS: Be more aggressive in using readdirplus...")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee284e35d8c71bf5d4d807eaff6f67a17134b359 upstream.
We must put the task to sleep while holding the inode->i_lock in order
to ensure atomicity with the test for NFS_LAYOUT_RETURN.
Fixes: 500d701f33 ("NFS41: make close wait for layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c0cf3ef5e0f47e385920450b245d22bead93e7ad upstream.
What matters when deciding if we should make a page uptodate is
not how much we _wanted_ to copy, but how much we actually have
copied. As it is, on architectures that do not zero tail on
short copy we can leave uninitialized data in page marked uptodate.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d55b352b01bc78fbc3d1bb650140668b87e58bf9 upstream.
A correct bugfix introduced a harmless warning that shows up with gcc-7:
fs/nfs/callback.c: In function 'nfs_callback_up':
fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds]
What happens here is that the 'minorversion == 0' check tells the
compiler that we assume minorversion can be something other than 0,
but when CONFIG_NFS_V4_1 is disabled that would be invalid and
result in an out-of-bounds access.
The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this
really can't happen, which makes the code slightly smaller and also
avoids the warning.
The bugfix that introduced the warning is marked for stable backports,
we want this one backported to the same releases.
Fixes: 98b0f80c2396 ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f807e5ae5597bd65a6fff684083e8eaa21f3fa7 upstream.
The caller of rpc_run_task also gets a reference that must be put.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 304020fe48c6c7fff8b5a38f382b54404f0f79d3 upstream.
If the file permissions change on the server, then we may not be able to
recover open state. If so, we need to ensure that we mark the file
descriptor appropriately.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa05c87f23efe417adc7ff9b4193b7201ec0dd79 upstream.
We must not allow the use of delegations that have been revoked or are
being returned.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: 869f9dfa4d ("NFSv4: Fix races between nfs_remove_bad_delegation()...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3f9e7239074613aa6bdafa4caf7c104fe1e7276 upstream.
If the delegation is revoked, then it can't be used for caching.
Fixes: 869f9dfa4d ("NFSv4: Fix races between nfs_remove_bad_delegation()...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c001c87a63aa2f35358e33eb05e45e4cbcb34f54 upstream.
We should always do a layoutcommit after commit to DS, except if
the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT.
Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73e6c5d854d3f7f75e8b46d3e54aeb5d83fe6b1f upstream.
According to the errata
https://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751
we should always send layout commit after a commit to DS.
Fixes: bc7d4b8fd0 ("nfs/filelayout: set layoutcommit...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4e187d83d88eeaba6252aac0a2ffe5eaa73a818 upstream.
Before commit 778be232a2 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.
Fixes: 778be232a2 ("NFS do not find client in NFSv4 ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b519d408ea32040b1c7e10b155a3ee9a36660947 upstream.
Ensure that we conform to the algorithm described in RFC5661, section
18.36.4 for when to bump the sequence id. In essence we do it for all
cases except when the RPC call timed out, or in case of the server returning
NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf0291dd2267a2b9a4cd74d65249553d11bb45d6 upstream.
According to RFC5661, the client is responsible for serialising
LAYOUTGET and LAYOUTRETURN to avoid ambiguity. Consider the case
where we send both in parallel.
Client Server
====== ======
LAYOUTGET(seqid=X)
LAYOUTRETURN(seqid=X)
LAYOUTGET return seqid=X+1
LAYOUTRETURN return seqid=X+2
Process LAYOUTRETURN
Forget layout stateid
Process LAYOUTGET
Set seqid=X+1
The client processes the layoutget/layoutreturn in the wrong order,
and since the result of the layoutreturn was to clear the only
existing layout segment, the client forgets the layout stateid.
When the LAYOUTGET comes in, it is treated as having a completely
new stateid, and so the client sets the wrong sequence id...
Fix is to check if there are outstanding LAYOUTGET requests
before we send the LAYOUTRETURN (note that LAYOUGET will already
wait if it sees an outstanding LAYOUTRETURN).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98b0f80c2396224bbbed81792b526e6c72ba9efa upstream.
On error, the callers expect us to return without bumping
nn->cb_users[].
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b88fa69eaa8649f11828158c7b65c4bcd886ebd5 upstream.
Ensure that the client conforms to the normative behaviour described in
RFC5661 Section 12.7.2: "If a client believes its lease has expired,
it MUST NOT send I/O to the storage device until it has validated its
lease."
So ensure that we wait for the lease to be validated before using
the layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 149a4fddd0a72d526abbeac0c8deaab03559836a upstream.
NFS doesn't expect requests with wb_bytes set to zero and may make
unexpected decisions about how to handle that request at the page IO layer.
Skip request creation if we won't have any wb_bytes in the request.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e547f2628327fec6afd2e03b46f113f614cca05b upstream.
Olga Kornievskaia reports that the following test fails to trigger
an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE.
fd0 = open(foo, RDRW) -- should be open on the wire for "both"
fd1 = open(foo, RDONLY) -- should be open on the wire for "read"
close(fd0) -- should trigger an open_downgrade
read(fd1)
close(fd1)
The issue is that we're missing a check for whether or not the current
state transitioned from an O_RDWR state as opposed to having transitioned
from a combination of O_RDONLY and O_WRONLY.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: cd9288ffae ("NFSv4: Fix another bug in the close/open_downgrade code")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d20cb71dbf3487f24549ede1a8e2d67579b4632e upstream.
In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
unconditional d_drop() after the ->open_context() had been removed. It had
been correct for success cases (there ->open_context() itself had been doing
dcache manipulations), but not for error ones. Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors. As it is, the case of O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() getting an error and proceeding to
call nfs_lookup(). On a hashed dentry, which would've instantly triggered
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be62a1a8fd116f5cd9e53726601f970e16e17558 upstream.
NFS may be used as lower layer of overlayfs and accessing f_path.dentry can
lead to a crash.
Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d9dfd8d741683347ee159d25f5b50c346a0df557 upstream.
In the case where d_add_unique() finds an appropriate alias to use it will
have already incremented the reference count. An additional dget() to swap
the open context's dentry is unnecessary and will leak a reference.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 275bb30786 ("NFSv4: Move dentry instantiation into the NFSv4-...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 082fa37d1351a41afc491d44a1d095cb8d919aa2 upstream.
We must not skip encoding the statistics, or the server will see an
XDR encoding error.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 361cad3c89070aeb37560860ea8bfc092d545adc upstream.
We've seen this in a packet capture - I've intermixed what I
think was going on. The fix here is to grab the so_lock sooner.
1964379 -> #1 open (for write) reply seqid=1
1964393 -> #2 open (for read) reply seqid=2
__nfs4_close(), state->n_wronly--
nfs4_state_set_mode_locked(), changes state->state = [R]
state->flags is [RW]
state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
1964398 -> #3 open (for write) call -> because close is already running
1964399 -> downgrade (to read) call seqid=2 (close of #1)
1964402 -> #3 open (for write) reply seqid=3
__update_open_stateid()
nfs_set_open_stateid_locked(), changes state->flags
state->flags is [RW]
state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
new sequence number is exposed now via nfs4_stateid_copy()
next step would be update_open_stateflags(), pending so_lock
1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1)
nfs4_close_prepare() gets so_lock and recalcs flags -> send close
1964405 -> downgrade (to read) call seqid=3 (close of #1 retry)
__update_open_stateid() gets so_lock
* update_open_stateflags() updates state->n_wronly.
nfs4_state_set_mode_locked() updates state->state
state->flags is [RW]
state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
* should have suppressed the preceding nfs4_close_prepare() from
sending open_downgrade
1964406 -> write call
1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry)
nfs_clear_open_stateid_locked()
state->flags is [R]
state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
1964409 -> write reply (fails, openmode)
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86fb449b07b8215443a30782dca5755d5b8b0577 upstream.
Jeff reports seeing an Oops in ff_layout_alloc_lseg. Turns out
copy+paste has played cruel tricks on a nested loop.
Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ade14a7df796d4e86bd9d181193c883a57b13db0 upstream.
If a NFSv4 client uses the cache_consistency_bitmask in order to
request only information about the change attribute, timestamps and
size, then it has not revalidated all attributes, and hence the
attribute timeout timestamp should not be updated.
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a093ceb053832c25b92f3cf26b957543c7baf9b upstream.
Since commit 2d8ae84fbc, nothing is bumping lo->plh_block_lgets in the
layoutreturn path, so it should not be touched in nfs4_layoutreturn_release
either.
Fixes: 2d8ae84fbc ("NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Stancek reported that I wrecked things for him by fixing things for
Vladimir :/
His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
should not be possible, however my previous patch made this possible by
unconditionally checking signal_pending().
We cannot use current->state as was done previously, because the
instruction after the store to that variable it can be changed. We must
instead pass the initial state along and use that.
Fixes: 68985633bc ("sched/wait: Fix signal handling in bit wait helpers")
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Chris Mason <clm@fb.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: tglx@linutronix.de
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 42cb14b110 ("mm: migrate dirty page without
clear_page_dirty_for_io etc") simplified the migration of a PageDirty
pagecache page: one stat needs moving from zone to zone and that's about
all.
It's convenient and safest for it to shift the PageDirty bit from old
page to new, just before updating the zone stats: before copying data
and marking the new PageUptodate. This is all done while both pages are
isolated and locked, just as before; and just as before, there's a
moment when the new page is visible in the radix_tree, but not yet
PageUptodate. What's new is that it may now be briefly visible as
PageDirty before it is PageUptodate.
When I scoured the tree to see if this could cause a problem anywhere,
the only places I found were in two similar functions __r4w_get_page():
which look up a page with find_get_page() (not using page lock), then
claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.
I'm not sure whether that was right before, but now it might be wrong
(on rare occasions): only claim the page is uptodate if PageUptodate.
Or perhaps the page in question could never be migratable anyway?
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Boaz Harrosh <ooo@electrozaur.com>
Cc: Benny Halevy <bhalevy@panasas.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The NFSv4.1 callback channel is currently broken because the receive
message will keep shrinking because the backchannel receive buffer size
never gets reset.
The easiest solution to this problem is instead of changing the receive
buffer, to rather adjust the copied request.
Fixes: 38b7631fbe ("nfs4: limit callback decoding to received bytes")
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
pnfs_layout_process will check the returned layout stateid against what
the kernel has in-core. If it turns out that the stateid we received is
older, then we should resend the LAYOUTGET instead of falling back to
MDS I/O.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org # 3.18+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If we pass in an empty nfs_fattr struct to nfs_update_inode, it will
(correctly) not update any of the attributes, but it then clears the
NFS_INO_INVALID_ATTR flag, which indicates that the attributes are
up to date. Don't clear the flag if the fattr struct has no valid
attrs to apply.
Reviewed-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If we get no post-op attributes back from a SETATTR operation, then no
attributes will of course be updated during the call to
nfs_update_inode.
We know however that the attributes are invalid at that point, since we
just changed some of them. At the very least, the ctime will be bogus.
If we get no post-op attributes back on the call, mark the attrcache
invalid to reflect that fact.
Reviewed-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
A truncated cb_compound request will cause the client to decode null or
data from a previous callback for nfs4.1 backchannel case, or uninitialized
data for the nfs4.0 case. This is because the path through
svc_process_common() advances the request's iov_base and decrements iov_len
without adjusting the overall xdr_buf's len field. That causes
xdr_init_decode() to set up the xdr_stream with an incorrect length in
nfs4_callback_compound().
Fixing this for the nfs4.1 backchannel case first requires setting the
correct iov_len and page_len based on the length of received data in the
same manner as the nfs4.0 case.
Then the request's xdr_buf length can be adjusted for both cases based upon
the remaining iov_len and page_len.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If clp->cl_cb_ident is zero, then nfs_cb_idr_remove_locked() skips removing
it when the nfs_client is freed. A decoding or server bug can then find
and try to put that first nfs_client which would lead to a crash.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: d687031265 ("nfs4client: convert to idr_alloc()")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When LAYOUTGET gets NFS4ERR_DELAY, we currently will wait 15s before
retrying the call. That is a _very_ long time, so add a timeout value to
struct nfs4_layoutget and pass nfs4_async_handle_error a pointer to it.
This allows the RPC engine to use a sliding delay window, instead of a
15s delay.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
NFS v4.2 operations can work outside of pNFS, so dprintk() output
shouldn't be placed under NFSDBG_PNFS.
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The NFS CLONE_RANGE defintion was wrong and thus never worked. Fix this
by simply using the btrfs ioctl defintion.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Originally CLONE didn't allow for intra-file clones, but we recently
updated the spec to support this feature which is also supported by
local Linux file systems.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Without this for example 64-bit binaries on typical amd64 distributions
would not be able to use ioctls on NFS. For now this only affects clones.
Additionally ->compat_ioctl is defined even for non-compat builds, so
get rid of the pointless ifdef.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>