commit 49686cbbb3ebafe42e63868222f269d8053ead00 upstream.
nfs_idmap_legacy_upcall() is supposed to be called with 'aux' pointing
to a 'struct idmap', via the call to request_key_with_auxdata() in
nfs_idmap_request_key().
However it can also be reached via the request_key() system call in
which case 'aux' will be NULL, causing a NULL pointer dereference in
nfs_idmap_prepare_pipe_upcall(), assuming that the key description is
valid enough to get that far.
Fix this by making nfs_idmap_legacy_upcall() negate the key if no
auxdata is provided.
As usual, this bug was found by syzkaller. A simple reproducer using
the command-line keyctl program is:
keyctl request2 id_legacy uid:0 '' @s
Fixes: 57e62324e4 ("NFS: Store the legacy idmapper result in the keyring")
Reported-by: syzbot+5dfdbcf7b3eb5912abbb@syzkaller.appspotmail.com
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Trond Myklebust <trondmy@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1b8d97b0a837beaf48a8449955b52c650a7114b4 upstream.
If some of the WRITE calls making up an O_DIRECT write syscall fail,
we neglect to commit, even if some of the WRITEs succeed.
We also depend on the commit code to free the reference count on the
nfs_page taken in the "if (request_commit)" case at the end of
nfs_direct_write_completion(). The problem was originally noticed
because ENOSPC's encountered partway through a write would result in a
closed file being sillyrenamed when it should have been unlinked.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7f1bda447c9bd48b415acedba6b830f61591601f upstream.
The commit list can get very large, and so we need a cond_resched()
in nfs_commit_release_pages() in order to ensure we don't hog the CPU
for excessive periods of time.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ba4a76f703ab7eb72941fdaac848502073d6e9ee upstream.
Currently when falling back to doing I/O through the MDS (via
pnfs_{read|write}_through_mds), the client frees the nfs_pgio_header
without releasing the reference taken on the dreq
via pnfs_generic_pg_{read|write}pages -> nfs_pgheader_init ->
nfs_direct_pgio_init. It then takes another reference on the dreq via
nfs_generic_pg_pgios -> nfs_pgheader_init -> nfs_direct_pgio_init and
as a result the requester will become stuck in inode_dio_wait. Once
that happens, other processes accessing the inode will become stuck as
well.
Ensure that pnfs_read_through_mds() and pnfs_write_through_mds() clean
up correctly by calling hdr->completion_ops->completion() instead of
calling hdr->release() directly.
This can be reproduced (sometimes) by performing "storage failover
takeover" commands on NetApp filer while doing direct I/O from a client.
This can also be reproduced using SystemTap to simulate a failure while
doing direct I/O from a client (from Dave Wysochanski
<dwysocha@redhat.com>):
stap -v -g -e 'probe module("nfs_layout_nfsv41_files").function("nfs4_fl_prepare_ds").return { $return=NULL; exit(); }'
Suggested-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Fixes: 1ca018d28d ("pNFS: Fix a memory leak when attempted pnfs fails")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 033853325fe3bdc70819a8b97915bd3bca41d3af ]
Currently client doesn't respect max sizes server returns in CREATE_SESSION.
nfs4_session_set_rwsize() gets called and server->rsize, server->wsize are 0
so they never get set to the sizes returned by the server.
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d803224c84be067754db7fa58a93f36f61566493 ]
On successful rename, the "old_dentry" is retained and is attached to
the "new_dir", so we need to call nfs_set_verifier() accordingly.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c6180a6237174f481dc856ed6e890d8196b6f0fb ]
If the server reboots multiple times, the client should rely on the
server to tell it that it cannot reclaim state as per section 9.6.3.4
in RFC7530 and section 8.4.2.1 in RFC5661.
Currently, the client is being to conservative, and is assuming that
if the server reboots while state recovery is in progress, then it must
ignore state that was not recovered before the reboot.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4b09ec4b14a168bf2c687e1f598140c3c11e9222 ]
I have reports of a crash that look like __fput() was called twice for
a NFSv4.0 file. It seems possible that the state manager could try to
reclaim a lock and take a reference on the fl->fl_file at the same time the
file is being released if, during the close(), a signal interrupts the wait
for outstanding IO while removing locks which then skips the removal
of that lock.
Since 83bfff23e9 ("nfs4: have do_vfs_lock take an inode pointer") has
removed the need to traverse fl->fl_file->f_inode in nfs4_lock_done(),
taking that reference is no longer necessary.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b688741cb06695312f18b730653d6611e1bad28d upstream.
For correct close-to-open semantics, NFS must validate
the change attribute of a directory (or file) on open.
Since commit ecf3d1f1aa ("vfs: kill FS_REVAL_DOT by adding a
d_weak_revalidate dentry op"), open() of "." or a path ending ".." is
not revalidated reliably (except when that direct is a mount point).
Prior to that commit, "." was revalidated using nfs_lookup_revalidate()
which checks the LOOKUP_OPEN flag and forces revalidation if the flag is
set.
Since that commit, nfs_weak_revalidate() is used for NFSv3 (which
ignores the flags) and nothing is used for NFSv4.
This is fixed by using nfs_lookup_verify_inode() in
nfs_weak_revalidate(). This does the revalidation exactly when needed.
Also, add a definition of .d_weak_revalidate for NFSv4.
The incorrect behavior is easily demonstrated by running "echo *" in
some non-mountpoint NFS directory while watching network traffic.
Without this patch, "echo *" sometimes doesn't produce any traffic.
With the patch it always does.
Fixes: ecf3d1f1aa ("vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op")
cc: stable@vger.kernel.org (3.9+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c05cefcc72416a37eba5a2b35f0704ed758a9145 upstream.
Before traversing a referral and performing a mount, the mounted-on
directory looks strange:
dr-xr-xr-x. 2 4294967294 4294967294 0 Dec 31 1969 dir.0
nfs4_get_referral is wiping out any cached attributes with what was
returned via GETATTR(fs_locations), but the bit mask for that
operation does not request any file attributes.
Retrieve owner and timestamp information so that the memcpy in
nfs4_get_referral fills in more attributes.
Changes since v1:
- Don't request attributes that the client unconditionally replaces
- Request only MOUNTED_ON_FILEID or FILEID attribute, not both
- encode_fs_locations() doesn't use the third bitmask word
Fixes: 6b97fd3da1 ("NFSv4: Follow a referral")
Suggested-by: Pradeep Thomas <pradeepthomas@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f02fee227e5f21981152850744a6084ff3fa94ee upstream.
The option was incorrectly masking off all other options.
Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 196639ebbe63a037fe9a80669140bd292d8bcd80 upstream.
The writeback code wants to send a commit after processing the pages,
which is why we want to delay releasing the struct path until after
that's done.
Also, the layout code expects that we do not free the inode before
we've put the layout segments in pnfs_writehdr_free() and
pnfs_readhdr_free()
Fixes: 919e3bd9a875 ("NFS: Ensure we commit after writeback is complete")
Fixes: 4714fb51fd ("nfs: remove pgio_header refcount, related cleanup")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8a9d6e964d318533ba3d2901ce153ba317c99a89 upstream.
The blocklayout code does not compile cleanly for a 32-bit sector_t,
and also has no reliable checks for devices sizes, which makes it
unsafe to use with a kernel that doesn't support large block devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 5c83746a0c ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1feb26162bee7b2f110facfec71b5c7bdbc7d14d upstream.
The client was freeing the nfs4_ff_layout_ds, but not the contained
nfs4_ff_ds_version array.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc89684c9a265828ce061037f1f79f4a68ccd3f7 upstream.
Since commit bafc9b754f ("vfs: More precise tests in d_invalidate")
in v3.18, a return of '0' from ->d_revalidate() will cause the dentry
to be invalidated even if it has filesystems mounted on or it or on a
descendant. The mounted filesystem is unmounted.
This means we need to be careful not to return 0 unless the directory
referred to truly is invalid. So -ESTALE or -ENOENT should invalidate
the directory. Other errors such a -EPERM or -ERESTARTSYS should be
returned from ->d_revalidate() so they are propagated to the caller.
A particular problem can be demonstrated by:
1/ mount an NFS filesystem using NFSv3 on /mnt
2/ mount any other filesystem on /mnt/foo
3/ ls /mnt/foo
4/ turn off network, or otherwise make the server unable to respond
5/ ls /mnt/foo &
6/ cat /proc/$!/stack # note that nfs_lookup_revalidate is in the call stack
7/ kill -9 $! # this results in -ERESTARTSYS being returned
8/ observe that /mnt/foo has been unmounted.
This patch changes nfs_lookup_revalidate() to only treat
-ESTALE from nfs_lookup_verify_inode() and
-ESTALE or -ENOENT from ->lookup()
as indicating an invalid inode. Other errors are returned.
Also nfs_check_inode_attributes() is changed to return -ESTALE rather
than -EIO. This is consistent with the error returned in similar
circumstances from nfs_update_inode().
As this bug allows any user to unmount a filesystem mounted on an NFS
filesystem, this fix is suitable for stable kernels.
Fixes: bafc9b754f ("vfs: More precise tests in d_invalidate")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 406dab8450ec76eca88a1af2fc15d18a2b36ca49 ]
Lock sequence IDs are bumped in decode_lock by calling
nfs_increment_seqid(). nfs_increment_sequid() does not use the
seqid_mutating_err() function fixed in commit 059aa7348241 ("Don't
increment lock sequence ID after NFS4ERR_MOVED").
Fixes: 059aa7348241 ("Don't increment lock sequence ID after ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Xuan Qi <xuan.qi@oracle.com>
Cc: stable@vger.kernel.org # v3.7+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 762674f86d0328d5dc923c966e209e1ee59663f2 upstream.
Donald Buczek reports that a nfs4 client incorrectly denies
execute access based on outdated file mode (missing 'x' bit).
After the mode on the server is 'fixed' (chmod +x) further execution
attempts continue to fail, because the nfs ACCESS call updates
the access parameter but not the mode parameter or the mode in
the inode.
The root cause is ultimately that the VFS is calling may_open()
before the NFS client has a chance to OPEN the file and hence revalidate
the access and attribute caches.
Al Viro suggests:
>>> Make nfs_permission() relax the checks when it sees MAY_OPEN, if you know
>>> that things will be caught by server anyway?
>>
>> That can work as long as we're guaranteed that everything that calls
>> inode_permission() with MAY_OPEN on a regular file will also follow up
>> with a vfs_open() or dentry_open() on success. Is this always the
>> case?
>
> 1) in do_tmpfile(), followed by do_dentry_open() (not reachable by NFS since
> it doesn't have ->tmpfile() instance anyway)
>
> 2) in atomic_open(), after the call of ->atomic_open() has succeeded.
>
> 3) in do_last(), followed on success by vfs_open()
>
> That's all. All calls of inode_permission() that get MAY_OPEN come from
> may_open(), and there's no other callers of that puppy.
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=109771
Link: http://lkml.kernel.org/r/1451046656-26319-1-git-send-email-buczek@molgen.mpg.de
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 5c5fc09a1157a11dbe84e6421c3e0b37d05238cb upstream.
Donald Buczek reports that NFS clients can also report incorrect
results for access() due to lack of revalidation of attributes
before calling execute_ok().
Looking closely, it seems chdir() is afflicted with the same problem.
Fix is to ensure we call nfs_revalidate_inode_rcu() or
nfs_revalidate_inode() as appropriate before deciding to trust
execute_ok().
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Link: http://lkml.kernel.org/r/1451331530-3748-1-git-send-email-buczek@molgen.mpg.de
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ed92d8c137b7794c2c2aa14479298b9885967607 upstream.
We're not taking into account that the space needed for the (variable
length) attr bitmap, with the result that we'd sometimes get a spurious
ERANGE when the ACL data got close to the end of a page.
Just add in an extra page to make sure.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6682c14bbe505a8b912c57faf544f866777ee48d upstream.
Bitmap and attrlen follow immediately after the op reply header. This
was an oversight from commit bf118a342f.
Consequences of this are just minor efficiency (extra calls to
xdr_shrink_bufhead).
Fixes: bf118a342f "NFSv4: include bitmap in nfsv4 get acl data"
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a974deee477af89411e0f80456bfb344ac433c98 upstream.
If we exit because the file access check failed, we currently
leak the struct nfs4_state. We need to attach it to the
open context before returning.
Fixes: 3efb972247 ("NFSv4: Refactor _nfs4_open_and_get_state..")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a430607b2ef7c3be090f88c71cfcb1b3988aa7c0 upstream.
Some nfsv4.0 servers may return a mode for the verifier following an open
with EXCLUSIVE4 createmode, but this does not mean the client should skip
setting the mode in the following SETATTR. It should only do that for
EXCLUSIVE4_1 or UNGAURDED createmode.
Fixes: 5334c5bdac ("NFS: Send attributes in OPEN request for NFS4_CREATE_EXCLUSIVE4_1")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cfd278c280f997cf2fe4662e0acab0fe465f637b upstream.
Various places assume that if nfs4_fl_prepare_ds() turns a non-NULL 'ds',
then ds->ds_clp will also be non-NULL.
This is not necessasrily true in the case when the process received a fatal signal
while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
In that case ->ds_clp may not be set, and the devid may not recently have been marked
unavailable.
So add a test for ds_clp == NULL and return NULL in that case.
Fixes: c23266d532 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Olga Kornievskaia <aglo@umich.edu>
Acked-by: Adamson, Andy <William.Adamson@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79f687a3de9e3ba2518b4ea33f38ca6cbe9133eb upstream.
Ben Coddington reports that commit 311324ad17, by adding the function
nfs_dir_mapping_need_revalidate() that checks page cache validity on
each call to nfs_readdir() causes a performance regression when
the directory is being modified.
If the directory is changing while we're iterating through the directory,
POSIX does not require us to invalidate the page cache unless the user
calls rewinddir(). However, we still do want to ensure that we use
readdirplus in order to avoid a load of stat() calls when the user
is doing an 'ls -l' workload.
The fix should be to invalidate the page cache immediately when we're
setting the NFS_INO_ADVISE_RDPLUS bit.
Reported-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 311324ad17 ("NFS: Be more aggressive in using readdirplus...")
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ee284e35d8c71bf5d4d807eaff6f67a17134b359 upstream.
We must put the task to sleep while holding the inode->i_lock in order
to ensure atomicity with the test for NFS_LAYOUT_RETURN.
Fixes: 500d701f33 ("NFS41: make close wait for layoutreturn")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c0cf3ef5e0f47e385920450b245d22bead93e7ad upstream.
What matters when deciding if we should make a page uptodate is
not how much we _wanted_ to copy, but how much we actually have
copied. As it is, on architectures that do not zero tail on
short copy we can leave uninitialized data in page marked uptodate.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d55b352b01bc78fbc3d1bb650140668b87e58bf9 upstream.
A correct bugfix introduced a harmless warning that shows up with gcc-7:
fs/nfs/callback.c: In function 'nfs_callback_up':
fs/nfs/callback.c:214:14: error: array subscript is outside array bounds [-Werror=array-bounds]
What happens here is that the 'minorversion == 0' check tells the
compiler that we assume minorversion can be something other than 0,
but when CONFIG_NFS_V4_1 is disabled that would be invalid and
result in an out-of-bounds access.
The added check for IS_ENABLED(CONFIG_NFS_V4_1) tells gcc that this
really can't happen, which makes the code slightly smaller and also
avoids the warning.
The bugfix that introduced the warning is marked for stable backports,
we want this one backported to the same releases.
Fixes: 98b0f80c2396 ("NFSv4.x: Fix a refcount leak in nfs_callback_up_net")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3f807e5ae5597bd65a6fff684083e8eaa21f3fa7 upstream.
The caller of rpc_run_task also gets a reference that must be put.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 304020fe48c6c7fff8b5a38f382b54404f0f79d3 upstream.
If the file permissions change on the server, then we may not be able to
recover open state. If so, we need to ensure that we mark the file
descriptor appropriately.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit aa05c87f23efe417adc7ff9b4193b7201ec0dd79 upstream.
We must not allow the use of delegations that have been revoked or are
being returned.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Fixes: 869f9dfa4d ("NFSv4: Fix races between nfs_remove_bad_delegation()...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3f9e7239074613aa6bdafa4caf7c104fe1e7276 upstream.
If the delegation is revoked, then it can't be used for caching.
Fixes: 869f9dfa4d ("NFSv4: Fix races between nfs_remove_bad_delegation()...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c001c87a63aa2f35358e33eb05e45e4cbcb34f54 upstream.
We should always do a layoutcommit after commit to DS, except if
the layout segment we're using has set FF_FLAGS_NO_LAYOUTCOMMIT.
Fixes: d67ae825a5 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 73e6c5d854d3f7f75e8b46d3e54aeb5d83fe6b1f upstream.
According to the errata
https://www.rfc-editor.org/errata_search.php?rfc=5661&eid=2751
we should always send layout commit after a commit to DS.
Fixes: bc7d4b8fd0 ("nfs/filelayout: set layoutcommit...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a4e187d83d88eeaba6252aac0a2ffe5eaa73a818 upstream.
Before commit 778be232a2 ("NFS do not find client in NFSv4
pg_authenticate"), the Linux callback server replied with
RPC_AUTH_ERROR / RPC_AUTH_BADCRED, instead of dropping the CB
request. Let's restore that behavior so the server has a chance to
do something useful about it, and provide a warning that helps
admins correct the problem.
Fixes: 778be232a2 ("NFS do not find client in NFSv4 ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b519d408ea32040b1c7e10b155a3ee9a36660947 upstream.
Ensure that we conform to the algorithm described in RFC5661, section
18.36.4 for when to bump the sequence id. In essence we do it for all
cases except when the RPC call timed out, or in case of the server returning
NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bf0291dd2267a2b9a4cd74d65249553d11bb45d6 upstream.
According to RFC5661, the client is responsible for serialising
LAYOUTGET and LAYOUTRETURN to avoid ambiguity. Consider the case
where we send both in parallel.
Client Server
====== ======
LAYOUTGET(seqid=X)
LAYOUTRETURN(seqid=X)
LAYOUTGET return seqid=X+1
LAYOUTRETURN return seqid=X+2
Process LAYOUTRETURN
Forget layout stateid
Process LAYOUTGET
Set seqid=X+1
The client processes the layoutget/layoutreturn in the wrong order,
and since the result of the layoutreturn was to clear the only
existing layout segment, the client forgets the layout stateid.
When the LAYOUTGET comes in, it is treated as having a completely
new stateid, and so the client sets the wrong sequence id...
Fix is to check if there are outstanding LAYOUTGET requests
before we send the LAYOUTRETURN (note that LAYOUGET will already
wait if it sees an outstanding LAYOUTRETURN).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98b0f80c2396224bbbed81792b526e6c72ba9efa upstream.
On error, the callers expect us to return without bumping
nn->cb_users[].
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b88fa69eaa8649f11828158c7b65c4bcd886ebd5 upstream.
Ensure that the client conforms to the normative behaviour described in
RFC5661 Section 12.7.2: "If a client believes its lease has expired,
it MUST NOT send I/O to the storage device until it has validated its
lease."
So ensure that we wait for the lease to be validated before using
the layout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 149a4fddd0a72d526abbeac0c8deaab03559836a upstream.
NFS doesn't expect requests with wb_bytes set to zero and may make
unexpected decisions about how to handle that request at the page IO layer.
Skip request creation if we won't have any wb_bytes in the request.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Reviewed-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e547f2628327fec6afd2e03b46f113f614cca05b upstream.
Olga Kornievskaia reports that the following test fails to trigger
an OPEN_DOWNGRADE on the wire, and only triggers the final CLOSE.
fd0 = open(foo, RDRW) -- should be open on the wire for "both"
fd1 = open(foo, RDONLY) -- should be open on the wire for "read"
close(fd0) -- should trigger an open_downgrade
read(fd1)
close(fd1)
The issue is that we're missing a check for whether or not the current
state transitioned from an O_RDWR state as opposed to having transitioned
from a combination of O_RDONLY and O_WRONLY.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Fixes: cd9288ffae ("NFSv4: Fix another bug in the close/open_downgrade code")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d20cb71dbf3487f24549ede1a8e2d67579b4632e upstream.
In "NFSv4: Move dentry instantiation into the NFSv4-specific atomic open code"
unconditional d_drop() after the ->open_context() had been removed. It had
been correct for success cases (there ->open_context() itself had been doing
dcache manipulations), but not for error ones. Only one of those (ENOENT)
got a compensatory d_drop() added in that commit, but in fact it should've
been done for all errors. As it is, the case of O_CREAT non-exclusive open
on a hashed negative dentry racing with e.g. symlink creation from another
client ended up with ->open_context() getting an error and proceeding to
call nfs_lookup(). On a hashed dentry, which would've instantly triggered
BUG_ON() in d_materialise_unique() (or, these days, its equivalent in
d_splice_alias()).
Tested-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit be62a1a8fd116f5cd9e53726601f970e16e17558 upstream.
NFS may be used as lower layer of overlayfs and accessing f_path.dentry can
lead to a crash.
Fix by replacing direct access of file->f_path.dentry with the
file_dentry() accessor, which will always return a native object.
Fixes: 4bacc9c923 ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Tested-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d9dfd8d741683347ee159d25f5b50c346a0df557 upstream.
In the case where d_add_unique() finds an appropriate alias to use it will
have already incremented the reference count. An additional dget() to swap
the open context's dentry is unnecessary and will leak a reference.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 275bb30786 ("NFSv4: Move dentry instantiation into the NFSv4-...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 082fa37d1351a41afc491d44a1d095cb8d919aa2 upstream.
We must not skip encoding the statistics, or the server will see an
XDR encoding error.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 361cad3c89070aeb37560860ea8bfc092d545adc upstream.
We've seen this in a packet capture - I've intermixed what I
think was going on. The fix here is to grab the so_lock sooner.
1964379 -> #1 open (for write) reply seqid=1
1964393 -> #2 open (for read) reply seqid=2
__nfs4_close(), state->n_wronly--
nfs4_state_set_mode_locked(), changes state->state = [R]
state->flags is [RW]
state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
1964398 -> #3 open (for write) call -> because close is already running
1964399 -> downgrade (to read) call seqid=2 (close of #1)
1964402 -> #3 open (for write) reply seqid=3
__update_open_stateid()
nfs_set_open_stateid_locked(), changes state->flags
state->flags is [RW]
state->state is [R], state->n_wronly == 0, state->n_rdonly == 1
new sequence number is exposed now via nfs4_stateid_copy()
next step would be update_open_stateflags(), pending so_lock
1964403 -> downgrade reply seqid=2, fails with OLD_STATEID (close of #1)
nfs4_close_prepare() gets so_lock and recalcs flags -> send close
1964405 -> downgrade (to read) call seqid=3 (close of #1 retry)
__update_open_stateid() gets so_lock
* update_open_stateflags() updates state->n_wronly.
nfs4_state_set_mode_locked() updates state->state
state->flags is [RW]
state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
* should have suppressed the preceding nfs4_close_prepare() from
sending open_downgrade
1964406 -> write call
1964408 -> downgrade (to read) reply seqid=4 (close of #1 retry)
nfs_clear_open_stateid_locked()
state->flags is [R]
state->state is [RW], state->n_wronly == 1, state->n_rdonly == 1
1964409 -> write reply (fails, openmode)
Signed-off-by: Andrew Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 86fb449b07b8215443a30782dca5755d5b8b0577 upstream.
Jeff reports seeing an Oops in ff_layout_alloc_lseg. Turns out
copy+paste has played cruel tricks on a nested loop.
Reported-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ade14a7df796d4e86bd9d181193c883a57b13db0 upstream.
If a NFSv4 client uses the cache_consistency_bitmask in order to
request only information about the change attribute, timestamps and
size, then it has not revalidated all attributes, and hence the
attribute timeout timestamp should not be updated.
Reported-by: Donald Buczek <buczek@molgen.mpg.de>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1a093ceb053832c25b92f3cf26b957543c7baf9b upstream.
Since commit 2d8ae84fbc, nothing is bumping lo->plh_block_lgets in the
layoutreturn path, so it should not be touched in nfs4_layoutreturn_release
either.
Fixes: 2d8ae84fbc ("NFSv4.1/pnfs: Remove redundant lo->plh_block_lgets...")
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Jan Stancek reported that I wrecked things for him by fixing things for
Vladimir :/
His report was due to an UNINTERRUPTIBLE wait getting -EINTR, which
should not be possible, however my previous patch made this possible by
unconditionally checking signal_pending().
We cannot use current->state as was done previously, because the
instruction after the store to that variable it can be changed. We must
instead pass the initial state along and use that.
Fixes: 68985633bc ("sched/wait: Fix signal handling in bit wait helpers")
Reported-by: Jan Stancek <jstancek@redhat.com>
Reported-by: Chris Mason <clm@fb.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Vladimir Murzin <vladimir.murzin@arm.com>
Tested-by: Chris Mason <clm@fb.com>
Reviewed-by: Paul Turner <pjt@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: tglx@linutronix.de
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: hpa@zytor.com
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>