nfs41_validate_delegation_stateid is broken if we supply a stateid with
a non-zero sequence id. Instead of trying to match the sequence id,
the function assumes that we always want to error. While this is
true for a delegation callback, it is not true in general.
Also fix a typo in nfs4_callback_recall.
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If we know that the delegation stateid is bad or revoked, we need to
remove that delegation as soon as possible, and then mark all the
stateids that relied on that delegation for recovery. We cannot use
the delegation as part of the recovery process.
Also note that NFSv4.1 uses a different error code (NFS4ERR_DELEG_REVOKED)
to indicate that the delegation was revoked.
Finally, ensure that setlk() and setattr() can both recover safely from
a revoked delegation.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
The check for 'fh == NULL' needs to come _before_ we dereference
fh.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Conflicts:
drivers/net/vmxnet3/vmxnet3_drv.c
Small vmxnet3 conflict with header size bug fix in 'net'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge the emailed seties of 19 patches from Andrew Morton
* akpm:
rapidio/tsi721: fix queue wrapping bug in inbound doorbell handler
memcg: fix mapcount check in move charge code for anonymous page
mm: thp: fix BUG on mm->nr_ptes
alpha: fix 32/64-bit bug in futex support
memcg: fix GPF when cgroup removal races with last exit
debugobjects: Fix selftest for static warnings
floppy/scsi: fix setting of BIO flags
memcg: fix deadlock by inverting lrucare nesting
drivers/rtc/rtc-r9701.c: fix crash in r9701_remove()
c2port: class_create() returns an ERR_PTR
pps: class_create() returns an ERR_PTR, not NULL
hung_task: fix the broken rcu_lock_break() logic
vfork: kill PF_STARTING
coredump_wait: don't call complete_vfork_done()
vfork: make it killable
vfork: introduce complete_vfork_done()
aio: wake up waiters when freeing unused kiocbs
kprobes: return proper error code from register_kprobe()
kmsg_dump: don't run on non-error paths by default
Now that CLONE_VFORK is killable, coredump_wait() no longer needs
complete_vfork_done(). zap_threads() should find and kill all tasks with
the same ->mm, this includes our parent if ->vfork_done is set.
mm_release() becomes the only caller, unexport complete_vfork_done().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
No functional changes.
Move the clear-and-complete-vfork_done code into the new trivial helper,
complete_vfork_done().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bart Van Assche reported a hung fio process when either hot-removing
storage or when interrupting the fio process itself. The (pruned) call
trace for the latter looks like so:
fio D 0000000000000001 0 6849 6848 0x00000004
ffff880092541b88 0000000000000046 ffff880000000000 ffff88012fa11dc0
ffff88012404be70 ffff880092541fd8 ffff880092541fd8 ffff880092541fd8
ffff880128b894d0 ffff88012404be70 ffff880092541b88 000000018106f24d
Call Trace:
schedule+0x3f/0x60
io_schedule+0x8f/0xd0
wait_for_all_aios+0xc0/0x100
exit_aio+0x55/0xc0
mmput+0x2d/0x110
exit_mm+0x10d/0x130
do_exit+0x671/0x860
do_group_exit+0x44/0xb0
get_signal_to_deliver+0x218/0x5a0
do_signal+0x65/0x700
do_notify_resume+0x65/0x80
int_signal+0x12/0x17
The problem lies with the allocation batching code. It will
opportunistically allocate kiocbs, and then trim back the list of iocbs
when there is not enough room in the completion ring to hold all of the
events.
In the case above, what happens is that the pruning back of events ends
up freeing up the last active request and the context is marked as dead,
so it is thus responsible for waking up waiters. Unfortunately, the
code does not check for this condition, so we end up with a hung task.
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org> [3.2.x only]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The attempt to display the implementation ID needs to be conditional on
whether or not CONFIG_NFS_V4_1 is defined
Reported-by: Bryan Schumaker <Bryan.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
If we convert and unwritten extent past the current i_size log the size update
as part of the extent manipulation transactions instead of doing an unlogged
metadata update later.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Replace xfs_ioend_new_eof with a new inline xfs_new_eof helper that
doesn't require and ioend, and is available also outside of xfs_aops.c.
Also make the code a bit more clear by using a normal if statement
instead of a slightly misleading MIN().
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
The new concurrency managed workqueues are cheap enough that we can create
per-filesystem instead of global workqueues. This allows us to remove the
trylock or defer scheme on the ilock, which is not helpful once we have
outstanding log reservations until finishing a size update.
Also allow the default concurrency on this workqueues so that I/O completions
blocking on the ilock for one inode do not block process for another inode.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
This patch adds a call to gfs2_rindex_update from function gfs2_blk2rgrpd
and removes calls to it that are made redundant by it. The problem is
that a gfs2_grow can add rgrps to the rindex, then put those rgrps into
use, thus rendering the rindex we read in at mount time incomplete.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Over time, we've slowly eliminated the use of sd_rindex_mutex.
Up to this point, it was only used in two places: function
gfs2_ri_total (which totals the file system size by reading
and parsing the rindex file) and function gfs2_rindex_update
which updates the rgrps in memory. Both of these functions have
the rindex glock to protect them, so the rindex is unnecessary.
Since gfs2_grow writes to the rindex via the meta_fs, the mutex
is in the wrong order according to the normal rules. This patch
eliminates the mutex entirely to avoid the problem.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Since all that include/linux/if_ppp.h does is #include <linux/ppp-ioctl.h>,
this replaces the occurrences of #include <linux/if_ppp.h> with
#include <linux/ppp-ioctl.h>.
It also corrects an error in Documentation/networking/l2tp.txt, where
it referenced include/linux/if_ppp.h as the source of some definitions
that are actually now defined in include/linux/if_pppol2tp.h.
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's only used inside fs/dcache.c, and we're going to play games with it
for the word-at-a-time patches. This time we really don't even want to
export it, because it really is an internal function to fs/dcache.c, and
has been since it was introduced.
Having it in that extremely hot header file (it's included in pretty
much everything, thanks to <linux/fs.h>) is a disaster for testing
different versions, and is utterly pointless.
We really should have some kind of header file diet thing, where we
figure out which parts of header files are really better off private and
only result in more expensive compiles.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Conflicts:
fs/nfs/nfs4proc.c
Back-merge of the upstream kernel in order to fix a conflict with the
slotid type conversion and implementation id patches...
The reada code from scrub was casting down a u64 to
an unsigned long so it could insert it into a radix tree.
What it really wanted to do was cast down the result of a shift, instead
of casting down the u64. The bug resulted in trying to insert our
reada struct into the wrong place, which caused soft lockups and other
problems.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
- We might unlock head->mutex while it was not locked
- We might leave the function without unlocking delayed_refs->lock
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
Commit 5707c87f "vfs: uninline full_name_hash()" broke the modular
build, because it needs exporting now that it isn't inlined any more.
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The code in link_path_walk() that finds out the length and the hash of
the next path component is some of the hottest code in the kernel. And
I have a version of it that does things at the full width of the CPU
wordsize at a time, but that means that we *really* want to split it up
into a separate helper function.
So this re-organizes the code a bit and splits the hashing part into a
helper function called "hash_name()". It returns the length of the
pathname component, while at the same time computing and writing the
hash to the appropriate location.
The code generation is slightly changed by this patch, but generally for
the better - and the added abstraction actually makes the code easier to
read too. And the new interface is well suited for replacing just the
"hash_name()" function with alternative implementations.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
.. and also use it in lookup_one_len() rather than open-coding it.
There aren't any performance-critical users, so inlining it is silly.
But it wouldn't matter if it wasn't for the fact that the word-at-a-time
dentry name patches want to conditionally replace the function, and
uninlining it sets the stage for that.
So again, this is a preparatory patch that doesn't change any semantics,
and only prepares for a much cleaner and testable word-at-a-time dentry
name accessor patch.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
These don't change any semantics, but they clean up the code a bit and
mark some arguments appropriately 'const'.
They came up as I was doing the word-at-a-time dcache name accessor
code, and cleaning this up now allows me to send out a smaller relevant
interesting patch for the experimental stuff.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Get rid of
encode_compound: tag=
when XDR debugging is enabled. The current Linux client never sets
compound tags.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The fh_expire_type file attribute is a filesystem wide attribute that
consists of flags that indicate what characteristics file handles
on this FSID have.
Our client doesn't support volatile file handles. It should find
out early (say, at mount time) whether the server is going to play
shenanighans with file handles during a migration.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The Linux NFS client must distinguish between referral events (which
it currently supports) and migration events (which it does not yet
support).
In both types of events, an fs_locations array is returned. But upper
layers, not the XDR layer, should make the distinction between a
referral and a migration. There really isn't a way for an XDR decoder
function to distinguish the two, in general.
Slightly adjust the FATTR flags returned by decode_fs_locations()
to set NFS_ATTR_FATTR_V4_LOCATIONS only if a non-empty locations
array was returned from the server. Then have logic in nfs4proc.c
distinguish whether the locations array is for a referral or
something else.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Clean up: pass just the clientid4 to encode_renew(). This enables it
to be used by callers who might not have an full nfs_client.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
For debugging, introduce a simplistic function to print NFS file
handles on the system console. The main function is hooked into the
dprintk debugging facility, but you can directly call the helper,
_nfs_display_fhandle(), if you want to print a handle unconditionally.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
For NFSv4 mounts, the clientaddr= mount option has always been
required. Now we have rpc_localaddr() in the kernel, which was
modeled after the same logic in the mount.nfs command that constructs
the clientaddr= mount option. If user space doesn't provide a
clientaddr= mount option, the kernel can now construct its own.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
When the cl_xprt field is updated, the cl_server field will also have
to change. Since the contents of cl_server follow the remote endpoint
of cl_xprt, just move that field to the rpc_xprt.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: simplify check_gss_callback_principal(), whitespace changes ]
[ cel: forward ported to 3.4 ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
A migration event will replace the rpc_xprt used by an rpc_clnt. To
ensure this can be done safely, all references to cl_xprt must now use
a form of rcu_dereference().
Special care is taken with rpc_peeraddr2str(), which returns a pointer
to memory whose lifetime is the same as the rpc_xprt.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ cel: fix lockdep splats and layering violations ]
[ cel: forward ported to 3.4 ]
[ cel: remove rpc_max_reqs(), add rpc_net_ns() ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
CLOSE is new with NFSv4. Sometimes it's important to know the timing
of this operation compared to things like lease renewal.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
I noticed recently that decode_attr_fs_locations() is not generating
very pretty debugging output. The pathname components each appear on
a separate line of output, though that does not appear to be the
intended display behavior. The preferred way to generate continued
lines of output on the console is to use pr_cont().
Note that incoming pathname4 components contain a string that is not
necessarily NUL-terminated. I did actually see some trailing garbage
on the console. In addition to correcting the line continuation
problem, add a string precision format specifier to ensure that each
component string is displayed properly, and that vsnprintf() does
not Oops.
Someone pointed out that allowing incoming network data to possibly
generate a console line of unbounded length may not be such a good
idea. Since this output will rarely be enabled, and there is a hard
upper bound (NFS4_PATHNAME_MAXCOMPONENTS) in our implementation, this
is probably not a major concern.
It might be useful to additionally sanity-check the length of each
incoming component, however. RFC 3530bis15 does not suggest a maximum
number of UTF-8 characters per component for either the pathname4 or
component4 types. However, we could invent one that is appropriate
for our implementation.
Another possibility is to scrap all of this and print these pathnames
in upper layers after a reasonable amount of sanity checking in the
XDR layer. This would give us an opportunity to allocate a full
buffer so that the whole pathname would be output via a single
dprintk.
Introduced by commit 7aaa0b3b: "NFSv4: convert fs-locations-components
to conform to RFC3530," (June 9, 2006).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Eliminate a number of implicit type casts in comparisons, and these
compiler warnings:
fs/nfs/dir.c: In function ‘nfs_readdir_clear_array’:
fs/nfs/dir.c:264:16: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
fs/nfs/dir.c: In function ‘nfs_readdir_search_for_cookie’:
fs/nfs/dir.c:352:16: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
fs/nfs/dir.c: In function ‘nfs_do_filldir’:
fs/nfs/dir.c:769:38: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
fs/nfs/dir.c:780:9: warning: comparison between signed and unsigned
integer expressions [-Wsign-compare]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The regset common infrastructure assumed that regsets would always
have .get and .set methods, but not necessarily .active methods.
Unfortunately people have since written regsets without .set methods.
Rather than putting in stub functions everywhere, handle regsets with
null .get or .set methods explicitly.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@hack.frob.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The 'minorversion' mount option is now deprecated, so we need to display
the minor version number in the 'vers=' format.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow the user to mount an NFSv4.0 or NFSv4.1 partition using a
standard syntax of '-overs=4.0', or '-overs=4.1' rather than the
more cumbersome '-overs=4,minorversion=1'.
See also the earlier patch by Dros Adamson, which added the
Linux-specific syntax '-ov4.0', '-ov4.1'.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Pass nice as a value to proc_sched_autogroup_set_nice().
No side effect is expected, and the variable err will be overwritten with
the return value.
Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4F45FBB7.5090607@ct.jp.nec.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Since 2.6.39 (1196f8b), when a driver returns -ENOMEDIUM for open(),
__blkdev_get() calls rescan_partitions() to remove
in-kernel partition structures and raise KOBJ_CHANGE uevent.
However it ends up calling driver's revalidate_disk without open
and could cause oops.
In the case of SCSI:
process A process B
----------------------------------------------
sys_open
__blkdev_get
sd_open
returns -ENOMEDIUM
scsi_remove_device
<scsi_device torn down>
rescan_partitions
sd_revalidate_disk
<oops>
Oopses are reported here:
http://marc.info/?l=linux-scsi&m=132388619710052
This patch separates the partition invalidation from rescan_partitions()
and use it for -ENOMEDIUM case.
Reported-by: Huajun Li <huajun.li.lee@gmail.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
server_scope would never be freed if nfs4_check_cl_exchange_flags() returned
non-zero
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Send the nfs implementation id in EXCHANGE_ID requests unless the module
parameter nfs.send_implementation_id is 0.
This adds a CONFIG variable for the nii_domain that defaults to "kernel.org".
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch removes the old hashmap-based caching and instead uses a
"request key actor" to place an upcall to the legacy idmapper rather
than going through /sbin/request-key. This will only be used as a
fallback if /etc/request-key.conf isn't configured to use nfsidmap.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFS4CLNT_LAYOUTRECALL bit is a long-term impediment to scalability. It
basically stops all other recalls by a given server once any layout recall
is requested.
If the recall is for a different file, then we don't care.
If the recall applies to the same file, then we're in one of two situations:
Either we are in the case of a replay of an existing request, in which case
the session is supposed to deal with matters, or we are dealing with a
completely different request, in which case we should just try to process
it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The NFS4CLNT_LAYOUTRECALL tests in pnfs_layout_process and
pnfs_update_layout are redundant.
In the case of a bulk layout recall, we're always testing for
the NFS_LAYOUT_BULK_RECALL flay anyway.
In the case of a file or segment recall, the call to
pnfs_set_layout_stateid() updates the layout_header 'barrier'
sequence id, which triggers the test in pnfs_layoutgets_blocked()
and is less race-prone than NFS4CLNT_LAYOUTRECALL anyway.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
This patch fixes an error path in function gfs2_rindex_update
that leaves the rindex mutex held.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Now that XFS takes quota reservations into account there is no need to flush
anything before reporting quotas - in addition to beeing fully transactional
all quota information is also 100% coherent with the rest of the filesystem
now.
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>