Commit graph

409637 commits

Author SHA1 Message Date
Mike Snitzer
41d35d25e9 MAINTAINERS: add reference to device-mapper's linux-dm.git tree
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:23 -05:00
Mikulas Patocka
5442851edb dm: fix Kconfig menu indentation
The option DM_LOG_USERSPACE is sub-option of DM_MIRROR, so place it
right after DM_MIRROR.  Doing so fixes various other Device mapper
targets/features to be properly nested under "Device mapper support".

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:22 -05:00
Mikulas Patocka
2c140a246d dm: allow remove to be deferred
This patch allows the removal of an open device to be deferred until
it is closed.  (Previously such a removal attempt would fail.)

The deferred remove functionality is enabled by setting the flag
DM_DEFERRED_REMOVE in the ioctl structure on DM_DEV_REMOVE or
DM_REMOVE_ALL ioctl.

On return from DM_DEV_REMOVE, the flag DM_DEFERRED_REMOVE indicates if
the device was removed immediately or flagged to be removed on close -
if the flag is clear, the device was removed.

On return from DM_DEV_STATUS and other ioctls, the flag
DM_DEFERRED_REMOVE is set if the device is scheduled to be removed on
closure.

A device that is scheduled to be deleted can be revived using the
message "@cancel_deferred_remove". This message clears the
DMF_DEFERRED_REMOVE flag so that the device won't be deleted on close.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:22 -05:00
Mike Snitzer
7833b08e18 dm table: print error on preresume failure
If preresume fails it is worth logging an error given that a device is
left suspended due to the failure.

This change was motivated by local preresume error logging that was
added to the cache target ("preresume failed").  Elevating this
target-agnostic context for the where the target-specific error occurred
relative to the DM core's callouts makes sense.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
2013-11-09 18:20:21 -05:00
Milan Broz
ed04d98169 dm crypt: add TCW IV mode for old CBC TCRYPT containers
dm-crypt can already activate TCRYPT (TrueCrypt compatible) containers
in LRW or XTS block encryption mode.

TCRYPT containers prior to version 4.1 use CBC mode with some additional
tweaks, this patch adds support for these containers.

This new mode is implemented using special IV generator named TCW
(TrueCrypt IV with whitening).  TCW IV only supports containers that are
encrypted with one cipher (Tested with AES, Twofish, Serpent, CAST5 and
TripleDES).

While this mode is legacy and is known to be vulnerable to some
watermarking attacks (e.g. revealing of hidden disk existence) it can
still be useful to activate old containers without using 3rd party
software or for independent forensic analysis of such containers.

(Both the userspace and kernel code is an independent implementation
based on the format documentation and it completely avoids use of
original source code.)

The TCW IV generator uses two additional keys: Kw (whitening seed, size
is always 16 bytes - TCW_WHITENING_SIZE) and Kiv (IV seed, size is
always the IV size of the selected cipher).  These keys are concatenated
at the end of the main encryption key provided in mapping table.

While whitening is completely independent from IV, it is implemented
inside IV generator for simplification.

The whitening value is always 16 bytes long and is calculated per sector
from provided Kw as initial seed, xored with sector number and mixed
with CRC32 algorithm.  Resulting value is xored with ciphertext sector
content.

IV is calculated from the provided Kiv as initial IV seed and xored with
sector number.

Detailed calculation can be found in the Truecrypt documentation for
version < 4.1 and will also be described on dm-crypt site, see:
http://code.google.com/p/cryptsetup/wiki/DMCrypt

The experimental support for activation of these containers is already
present in git devel brach of cryptsetup.

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:20 -05:00
Milan Broz
da31a0787a dm crypt: properly handle extra key string in initialization
Some encryption modes use extra keys (e.g. loopAES has IV seed) which
are not used in block cipher initialization but are part of key string
in table constructor.

This patch adds an additional field which describes the length of the
extra key(s) and substracts it before real key encryption setting.

The key_size always includes the size, in bytes, of the key provided
in mapping table.

The key_parts describes how many parts (usually keys) are contained in
the whole key buffer.  And key_extra_size contains size in bytes of
additional keys part (this number of bytes must be subtracted because it
is processed by the IV generator).

| K1 | K2 | .... | K64 |      Kiv       |
|----------- key_size ----------------- |
|                      |-key_extra_size-|
|     [64 keys]        |  [1 key]       | => key_parts = 65

Example where key string contains main key K, whitening key
Kw and IV seed Kiv:

|     K       |   Kiv   |       Kw      |
|--------------- key_size --------------|
|             |-----key_extra_size------|
|  [1 key]    | [1 key] |     [1 key]   | => key_parts = 3

Because key_extra_size is calculated during IV mode setting, key
initialization is moved after this step.

For now, this change has no effect to supported modes (thanks to ilog2
rounding) but it is required by the following patch.

Also, fix a sparse warning in crypt_iv_lmk_one().

Signed-off-by: Milan Broz <gmazyland@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:20 -05:00
Heinz Mauelshagen
2c2263c93f dm cache: log error message if dm_kcopyd_copy() fails
A migration failure should be logged (albeit limited).

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:19 -05:00
Heinz Mauelshagen
80f659f3f5 dm cache: use cell_defer() boolean argument consistently
Fix a few cell_defer() calls that weren't passing a bool.

Signed-off-by: Heinz Mauelshagen <heinzm@redhat.com>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:19 -05:00
Mikulas Patocka
4cb3e1db21 dm cache: return -EINVAL if the user specifies unknown cache policy
Return -EINVAL when the specified cache policy is unknown rather than
returning -ENOMEM.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:18 -05:00
Joe Thornber
dd8b0c2096 dm cache metadata: return bool from __superblock_all_zeroes
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:17 -05:00
Joe Thornber
0184b44e32 dm cache policy mq: a few small fixes
Rename takeout_queue to concat_queue.

Fix a harmless bug in mq policies pop() function.  Currently pop()
always succeeds, with up coming changes this wont be the case.

Fix typo in comment above pre_cache_to_cache prototype.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:17 -05:00
Joe Thornber
3351937e4a dm cache policy: remove return from void policy_remove_mapping
No need to return from a void function.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:20:16 -05:00
Joe Thornber
238f8363b6 dm cache: improve efficiency of quiescing flag management
Make the quiescing flag an atomic_t and stop protecting it with a spin
lock.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 18:19:59 -05:00
Joe Thornber
66cb1910df dm cache: fix a race condition between queuing new migrations and quiescing for a shutdown
The code that was trying to do this was inadequate.  The postsuspend
method (in ioctl context), needs to wait for the worker thread to
acknowledge the request to quiesce.  Otherwise the migration count may
drop to zero temporarily before the worker thread realises we're
quiescing.  In this case the target will be taken down, but the worker
thread may have issued a new migration, which will cause an oops when
it completes.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 3.9+
2013-11-09 17:55:50 -05:00
Joe Thornber
f8e5f01a32 dm cache: io destined for the cache device can now serve as tick bios
Previously only origin bios could trigger ticks, which meant if all
the io was destined for the cache no ticks were generated.  If no ticks
are generated then multiple hits, and movements in general, are
attributed to the same tick.

Only a stop gap fix, we need a better solution.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 17:55:49 -05:00
Joe Thornber
99ba2ae4cd dm cache policy mq: protect residency method with existing mutex
It is safe to use a mutex in mq_residency() at this point since it is
only called from ioctl context.  But future-proof mq_residency() by
using might_sleep() to catch new contexts that cannot sleep.

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
2013-11-09 17:54:34 -05:00
Sebastian Hesselbarth
4e3faf8863 MAINTAINERS: mv643xx_eth: take over maintainership from Lennart
Lennart says: "I haven't been able to spend time on mv643xx_eth for a
while now, so if you want to take over maintainership, I'd be fine with
that."

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Lennert Buytenhek <buytenh@wantstofly.org>
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-09 14:56:10 -05:00
Yang Yingliang
a33c4a2663 net_sched: tbf: support of 64bit rates
With psched_ratecfg_precompute(), tbf can deal with 64bit rates.
Add two new attributes so that tc can use them to break the 32bit
limit.

Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Suggested-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-11-09 14:53:37 -05:00
Oleg Nesterov
2ded0980a6 uprobes: Fix the memory out of bound overwrite in copy_insn()
1. copy_insn() doesn't look very nice, all calculations are
   confusing and it is not immediately clear why do we read
   the 2nd page first.

2. The usage of inode->i_size is wrong on 32-bit machines.

3. "Instruction at end of binary" logic is simply wrong, it
   doesn't handle the case when uprobe->offset > inode->i_size.

   In this case "bytes" overflows, and __copy_insn() writes to
   the memory outside of uprobe->arch.insn.

   Yes, uprobe_register() checks i_size_read(), but this file
   can be truncated after that. All i_size checks are racy, we
   do this only to catch the obvious mistakes.

Change copy_insn() to call __copy_insn() in a loop, simplify
and fix the bytes/nbytes calculations.

Note: we do not care if we read extra bytes after inode->i_size
if we got the valid page. This is fine because the task gets the
same page after page-fault, and arch_uprobe_analyze_insn() can't
know how many bytes were actually read anyway.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-11-09 17:05:43 +01:00
Oleg Nesterov
70d7f98722 uprobes: Fix the wrong usage of current->utask in uprobe_copy_process()
Commit aa59c53fd4 "uprobes: Change uprobe_copy_process() to dup
xol_area" has a stupid typo, we need to setup t->utask->vaddr but
the code wrongly uses current->utask.

Even with this bug dup_xol_work() works "in practice", but only
because get_unmapped_area(NULL, TASK_SIZE - PAGE_SIZE) likely
returns the same address every time.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
2013-11-09 17:05:41 +01:00
J. Bruce Fields
27ac0ffeac locks: break delegations on any attribute modification
NFSv4 uses leases to guarantee that clients can cache metadata as well
as data.

Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:44 -05:00
J. Bruce Fields
146a8595c6 locks: break delegations on link
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:43 -05:00
J. Bruce Fields
8e6d782cab locks: break delegations on rename
Cc: David Howells <dhowells@redhat.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:43 -05:00
J. Bruce Fields
5a14696c17 locks: helper functions for delegation breaking
We'll need the same logic for rename and link.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:42 -05:00
J. Bruce Fields
b21996e36c locks: break delegations on unlink
We need to break delegations on any operation that changes the set of
links pointing to an inode.  Start with unlink.

Such operations also hold the i_mutex on a parent directory.  Breaking a
delegation may require waiting for a timeout (by default 90 seconds) in
the case of a unresponsive NFS client.  To avoid blocking all directory
operations, we therefore drop locks before waiting for the delegation.
The logic then looks like:

	acquire locks
	...
	test for delegation; if found:
		take reference on inode
		release locks
		wait for delegation break
		drop reference on inode
		retry

It is possible this could never terminate.  (Even if we take precautions
to prevent another delegation being acquired on the same inode, we could
get a different inode on each retry.)  But this seems very unlikely.

The initial test for a delegation happens after the lock on the target
inode is acquired, but the directory inode may have been acquired
further up the call stack.  We therefore add a "struct inode **"
argument to any intervening functions, which we use to pass the inode
back up to the caller in the case it needs a delegation synchronously
broken.

Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:42 -05:00
J. Bruce Fields
9accbb977a namei: minor vfs_unlink cleanup
We'll be using dentry->d_inode in one more place.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:41 -05:00
J. Bruce Fields
df4e8d2c1d locks: implement delegations
Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
type.

Note nfsd is the only delegation user and is only using read
delegations.  Warn on any attempt to set a write delegation for now.
We'll come back to that case later.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:41 -05:00
J. Bruce Fields
617588d518 locks: introduce new FL_DELEG lock flag
For now FL_DELEG is just a synonym for FL_LEASE.  So this patch doesn't
change behavior.

Next we'll modify break_lease to treat FL_DELEG leases differently, to
account for the fact that NFSv4 delegations should be broken in more
situations than Windows oplocks.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:41 -05:00
J. Bruce Fields
6cedba8962 vfs: take i_mutex on renamed file
A read delegation is used by NFSv4 as a guarantee that a client can
perform local read opens without informing the server.

The open operation takes the last component of the pathname as an
argument, thus is also a lookup operation, and giving the client the
above guarantee means informing the client before we allow anything that
would change the set of names pointing to the inode.

Therefore, we need to break delegations on rename, link, and unlink.

We also need to prevent new delegations from being acquired while one of
these operations is in progress.

We could add some completely new locking for that purpose, but it's
simpler to use the i_mutex, since that's already taken by all the
operations we care about.

The single exception is rename.  So, modify rename to take the i_mutex
on the file that is being renamed.

Also fix up lockdep and Documentation/filesystems/directory-locking to
reflect the change.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:40 -05:00
J. Bruce Fields
40bd22c9f8 vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
I_MUTEX_QUOTA is now just being used whenever we want to lock two
non-directories.  So the name isn't right.  I_MUTEX_NONDIR2 isn't
especially elegant but it's the best I could think of.

Also fix some outdated documentation.

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:40 -05:00
J. Bruce Fields
275555163e vfs: don't use PARENT/CHILD lock classes for non-directories
Reserve I_MUTEX_PARENT and I_MUTEX_CHILD for locking of actual
directories.

(Also I_MUTEX_QUOTA isn't really a meaningful name for this locking
class any more; fixed in a later patch.)

Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:39 -05:00
J. Bruce Fields
375e289ea8 vfs: pull ext4's double-i_mutex-locking into common code
We want to do this elsewhere as well.

Also catch any attempts to use it for directories (where this ordering
would conflict with ancestor-first directory ordering in lock_rename).

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:39 -05:00
J. Bruce Fields
f27c9298fd exportfs: fix quadratic behavior in filehandle lookup
Suppose we're given the filehandle for a directory whose closest
ancestor in the dcache is its Nth ancestor.

The main loop in reconnect_path searches for an IS_ROOT ancestor of
target_dir, reconnects that ancestor to its parent, then recommences the
search for an IS_ROOT ancestor from target_dir.

This behavior is quadratic in N.  And there's really no need to restart
the search from target_dir each time: once a directory has been looked
up, it won't become IS_ROOT again.  So instead of starting from
target_dir each time, we can continue where we left off.

This simplifies the code and improves performance on very deep directory
heirachies.  (I can't think of any reason anyone should need heirarchies
a hundred or more deep, but the performance improvement may be valuable
if only to limit damage in case of abuse.)

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:38 -05:00
J. Bruce Fields
efbf201f7a exportfs: better variable name
Replace another unhelpful acronym.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:38 -05:00
J. Bruce Fields
bbf7a8a356 exportfs: move most of reconnect_path to helper function
Also replace 3 easily-confused three-letter acronyms by more helpful
variable names.

Just cleanup, no change in functionality, with one exception: the
dentry_connected() check in the "out_reconnected" case will now only
check the ancestors of the current dentry instead of checking all the
way from target_dir.  Since we've already verified connectivity up to
this dentry, that should be sufficient.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:37 -05:00
J. Bruce Fields
e4b70ebeeb exportfs: eliminate unused "noprogress" counter
Note this counter is now being set to 0 on every pass through the loop,
so it no longer serves any useful purpose.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:37 -05:00
J. Bruce Fields
a056cc8934 exportfs: stop retrying once we race with rename/remove
There are two places here where we could race with a rename or remove:

	- We could find the parent, but then be removed or renamed away
	  from that parent directory before finding our name in that
	  directory.
	- We could find the parent, and find our name in that parent,
	  but then be renamed or removed before we look ourselves up by
	  that name in that parent.

In both cases the concurrent rename or remove will take care of
reconnecting the directory that we're currently examining.  Our target
directory should then also be connected.  Check this and clear
DISCONNECTED in these cases instead of looping around again.

Note: we *do* need to check that this actually happened if we want to be
robust in the face of corrupted filesystems: a corrupted filesystem
could just return a completely wrong parent, and we want to fail with an
error in that case before starting to clear DISCONNECTED on
non-DISCONNECTED filesystems.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:36 -05:00
J. Bruce Fields
0dbc018a49 exportfs: clear DISCONNECTED on all parents sooner
Once we've found any connected parent, we know all our parents are
connected--that's true even if there's a concurrent rename.  May as well
clear them all at once and be done with it.

Reviewed-by: Cristoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:36 -05:00
J. Bruce Fields
78cee9a8e4 exportfs: more detailed comment for path_reconnect
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:35 -05:00
Christoph Hellwig
854ff5caab exportfs: BUG_ON in crazy corner case
This would indicate a nasty bug in the dcache and has never triggered in
the past 10 years as far as I know.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:35 -05:00
J. Bruce Fields
13a2c3be03 dcache: fix outdated DCACHE_NEED_LOOKUP comment
The DCACHE_NEED_LOOKUP case referred to here was removed with
39e3c9553f "vfs: remove
DCACHE_NEED_LOOKUP".

There are only four real_lookup() callers and all of them pass in an
unhashed dentry just returned from d_alloc.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:34 -05:00
J. Bruce Fields
f80de2cde1 dcache: don't clear DCACHE_DISCONNECTED too early
DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is
connected all the way up to the root of the filesystem.  It *shouldn't*
be cleared as soon as the dentry is connected to a parent.  That will
cause bugs at least on exportable filesystems.

Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:34 -05:00
J. Bruce Fields
e1a24bb0aa dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries
I can't for the life of me see any reason why anyone should care whether
a dentry that is never hooked into the dentry cache would need
DCACHE_DISCONNECTED set.

This originates from 4b936885ab "fs:
improve scalability of pseudo filesystems", which probably just made the
false assumption the DCACHE_DISCONNECTED was meant to be set on anything
not connected to a parent somehow.

So this is just confusing.  Ideally the only uses of DCACHE_DISCONNECTED
would be in the filehandle-lookup code, which needs it to ensure
dentries are connected into the dentry tree before use.

I left d_alloc_pseudo there even though it's now equivalent to
__d_alloc(), just on the theory the name is better documentation of its
intended use outside dcache.c.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:33 -05:00
J. Bruce Fields
7632e465fe dcache: use IS_ROOT to decide where dentry is hashed
Every hashed dentry is either hashed in the dentry_hashtable, or a
superblock's s_anon list.

__d_drop() assumes it can determine which is the case by checking
DCACHE_DISCONNECTED; this is not true.

It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
only hashed on dentry_hashtable, but is fully connected to its parents
back to the root.

But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
attempts to connect a directory (found by filehandle lookup) back to
root by ascending to parents and performing lookups one at a time.  It
does not clear DCACHE_DISCONNECTED until it's done, and that is not at
all an atomic process.

In particular, it is possible for DCACHE_DISCONNECTED to be set on a
dentry which is hashed on the dentry_hashtable.

Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
*does* work:

Dentries are hashed only by:

	- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.

	- __d_rehash, called by _d_rehash: hashes to the dentry's
	  parent, and all callers of _d_rehash appear to have d_parent
	  set to a "real" parent.
	- __d_rehash, called by __d_move: rehashes the moved dentry to
	  hash chain determined by target, and assigns target's d_parent
	  to its d_parent, before dropping the dentry's d_lock.

Therefore I believe it's safe for a holder of a dentry's d_lock to
assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
true.

I believe the incorrect assumption about DCACHE_DISCONNECTED was
originally introduced by ceb5bdc2d2 "fs: dcache per-bucket dcache hash
locking".

Also add a comment while we're here.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:33 -05:00
Al Viro
b19f133674 ocfs2: get rid of impossible checks
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:32 -05:00
Al Viro
fbad2bd132 qnx4: i_sb is never NULL
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:32 -05:00
J. Bruce Fields
950ee9566a exportfs: fix 32-bit nfsd handling of 64-bit inode numbers
Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
32-bit NFS server exporting a very large XFS filesystem, when the
server's cache is cold (so the inodes in question are not in cache).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:32 -05:00
J. Bruce Fields
b7a6ec52dd vfs: split out vfs_getattr_nosec
The filehandle lookup code wants this version of getattr.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:31 -05:00
Al Viro
5a3cd99285 iget/iget5: don't bother with ->i_lock until we find a match
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:31 -05:00
David Howells
b18825a7c8 VFS: Put a small type field into struct dentry::d_flags
Put a type field into struct dentry::d_flags to indicate if the dentry is one
of the following types that relate particularly to pathwalk:

	Miss (negative dentry)
	Directory
	"Automount" directory (defective - no i_op->lookup())
	Symlink
	Other (regular, socket, fifo, device)

The type field is set to one of the first five types on a dentry by calls to
__d_instantiate() and d_obtain_alias() from information in the inode (if one is
given).

The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
dentry as a negative dentry.

Accessors provided are:

	d_set_type(dentry, type)
	d_is_directory(dentry)
	d_is_autodir(dentry)
	d_is_symlink(dentry)
	d_is_file(dentry)
	d_is_negative(dentry)
	d_is_positive(dentry)

A bunch of checks in pathname resolution switched to those.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-11-09 00:16:30 -05:00