We can be more aggressive about this, if we are clever and careful. This is subtle.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There are only 3 callers and quite a bit of that thing is executed
exactly in one of those. Just lift it there...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
What we want is to have non-counting references to children in
pagecache of parent directory, and avoid picking them after a child
has been freed. Fine, so let's just have ->d_prune() clear
parent's inode "has directory contents in page cache" flag.
That way we don't need ->d_fsdata for storing offsets, so we can
use it as a quick and dirty "is it referenced from page cache"
flag.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Pull vfs fixes from Al Viro:
"A couple of fixes - deadlock in CIFS and build breakage in cris serial
driver (resurfaced f_dentry in there)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Convert file->f_dentry->d_inode to file_inode()
fix deadlock in cifs_ioctl_clone()
Ensure that we deal correctly with the case where the server sends us a
newer instance of the same delegation. If the stateids match, but the
sequence numbers differ, then treat the new delegation as if it were
an atomic upgrade.
Signed-off-by: Trond Myklebust <Trond.Myklebust@primarydata.com>
Replace the current code with something that is a little closer to what
net/sunrpc/auth_unix.c uses.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Optimise the layout return on close code by ensuring that
1) Add a check for whether we hold a layout before taking any spinlocks
2) Only take the spin lock once
3) Use nfs_state->state to speed up open file checks
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Note, however, that we still serialise on the open stateid if the lock
stateid is unconfirmed. Hopefully that will not prove too much of a
burden for first time locks; it should leave the ability to parallelise
OPENs unchanged, since they no longer call the serialisation primitives.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Ensure that we test the lock stateid remained unchanged while we were
updating the VFS tracking of the byte range lock. Have the process
replay the lock to the server if we detect that was not the case.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
This patch ensures that the server cannot reorder our LOCK/LOCKU
requests if they are sent in parallel on the wire.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The original text in RFC3530 was terribly confusing since it conflated
lockowners and lock stateids. RFC3530bis clarifies that you must use
open_to_lock_owner when there is no lock state for that file+lockowner
combination.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When we update the lock stateid, we really do need to ensure that this is
done under the state->state_lock, and that we are indeed only updating
confirmed locks with a newer version of the same stateid.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Remove the serialisation of OPEN/OPEN_DOWNGRADE and CLOSE calls for the
case of NFSv4.1 and newer.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
When we relax the sequencing on the NFSv4.1 OPEN/CLOSE code, we will want
to use the value NULL to indicate that no sequencing is needed.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
If an OPEN RPC call races with a CLOSE or OPEN_DOWNGRADE so that it
updates the nfs_state structure before the CLOSE/OPEN_DOWNGRADE has
a chance to do so, then we know that the state->flags need to be
recalculated from scratch.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Pull btrfs fixes from Chris Mason:
"We have a few fixes in my for-linus branch.
Qu Wenruo's batch fix a regression between some our merge window pull
and the inode_cache feature. The rest are smaller bugs"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.
btrfs: Fix the bug that fs_info->pending_changes is never cleared.
btrfs: fix state->private cast on 32 bit machines
Btrfs: fix race deleting block group from space_info->ro_bgs list
Btrfs: fix incorrect freeing in scrub_stripe
btrfs: sync ioctl, handle errors after transaction start
If we are to remove the serialisation of OPEN/CLOSE, then we need to
ensure that the stateid sent as part of a CLOSE operation does not
change after we test the state in nfs4_close_prepare.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
The BKL is completely out of the picture in the lockd and sunrpc code
these days. Update the antiquated comments that refer to it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Someone with a weird time_t happened to notice this, it shouldn't really
manifest till 2038. It may not be our ownly year-2038 problem.
Reported-by: Aaron Pace <Aaron.Pace@alcatel-lucent.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
In order to ensure that filenames are not released before the audit
subsystem is done with the strings there are a number of hacks built
into the fs and audit subsystems around getname() and putname(). To
say these hacks are "ugly" would be kind.
This patch removes the filename hackery in favor of a more
conventional reference count based approach. The diffstat below tells
most of the story; lots of audit/fs specific code is replaced with a
traditional reference count based approach that is easily understood,
even by those not familiar with the audit and/or fs subsystems.
CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Enable recording of filenames in getname_kernel() and remove the
kludgy workaround in __audit_inode() now that we have proper filename
logging for kernel users.
CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a) make it accept ERR_PTR() as filename (and return its PTR_ERR() in that case)
b) make it putname() the sucker in the end otherwise
simplifies life for callers...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
There are several areas in the kernel that create temporary filename
objects using the following pattern:
int func(const char *name)
{
struct filename *file = { .name = name };
...
return 0;
}
... which for the most part works okay, but it causes havoc within the
audit subsystem as the filename object does not persist beyond the
lifetime of the function. This patch converts all of these temporary
filename objects into proper filename objects using getname_kernel()
and putname() which ensure that the filename object persists until the
audit subsystem is finished with it.
Also, a special thanks to Al Viro, Guenter Roeck, and Sabrina Dubroca
for helping resolve a difficult kernel panic on boot related to a
use-after-free problem in kern_path_create(); the thread can be seen
at the link below:
* https://lkml.org/lkml/2015/1/20/710
This patch includes code that was either based on, or directly written
by Al in the above thread.
CC: viro@zeniv.linux.org.uk
CC: linux@roeck-us.net
CC: sd@queasysnail.net
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
In preparation for expanded use in the kernel, make getname_kernel()
more useful by allowing it to handle any legal filename length.
Thanks to Guenter Roeck for his suggestion to substitute memcpy() for
strlcpy().
CC: linux@roeck-us.net
CC: viro@zeniv.linux.org.uk
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
xfs_compat_attrmulti_by_handle() calls memdup_user() which returns a
negative error code. The error code is negated by the caller and thus
incorrectly converted to a positive error code.
Remove the error negation such that the negative error is passed
correctly back up to userspace.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
When the superblock is modified in a transaction, the commonly
modified fields are not actually copied to the superblock buffer to
avoid the buffer lock becoming a serialisation point. However, there
are some other operations that modify the superblock fields within
the transaction that don't directly log to the superblock but rely
on the changes to be applied during the transaction commit (to
minimise the buffer lock hold time).
When we do this, we fail to mark the buffer log item as being a
superblock buffer and that can lead to the buffer not being marked
with the corect type in the log and hence causing recovery issues.
Fix it by setting the type correctly, similar to xfs_mod_sb()...
cc: <stable@vger.kernel.org> # 3.10 to current
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Conversion from local to extent format does not set the buffer type
correctly on the new extent buffer when a symlink data is moved out
of line.
Fix the symlink code and leave a comment in the generic bmap code
reminding us that the format-specific data copy needs to set the
destination buffer type appropriately.
cc: <stable@vger.kernel.org> # 3.10 to current
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
This leads to log recovery throwing errors like:
XFS (md0): Mounting V5 Filesystem
XFS (md0): Starting recovery (logdev: internal)
XFS (md0): Unknown buffer type 0!
XFS (md0): _xfs_buf_ioapply: no ops on block 0xaea8802/0x1
ffff8800ffc53800: 58 41 47 49 .....
Which is the AGI buffer magic number.
Ensure that we set the type appropriately in both unlink list
addition and removal.
cc: <stable@vger.kernel.org> # 3.10 to current
Tested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>