Even when we are emulating power cuts, otherwise it is difficult to investigate
failures during emulated power cuts testing.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Fix kernel-doc warnings in fs/namei.c:
Warning(fs/namei.c:360): No description found for parameter 'inode'
Warning(fs/namei.c:672): No description found for parameter 'nd'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
As soon as we'd installed the file into descriptor table, it can
get closed by another thread. Freeing ep in process...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
When debugging is enabled, we use a temporary on-stack buffer for formatting
the key strings like "(11368871, direntry, 0xcd0750)". The buffer size is
32 bytes and sometimes it is not enough to fit the key string - e.g., when
inode numbers are high. This is not fatal, but the key strings are incomplete
and UBIFS complains like this:
UBIFS assert failed in dbg_snprintf_key at 137 (pid 1)
This is a regression caused by "515315a UBIFS: fix key printing".
Fix the issue by increasing the buffer to 48 bytes.
Reported-by: Michael Hench <michaelhench@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Tested-by: Michael Hench <michaelhench@gmail.com>
Cc: stable@vger.kernel.org [v3.3+]
If "l->stripe_unit" is zero the the mod on the next line will cause a
divide by zero bug. This comes from the copy_from_user() in
ceph_ioctl_set_layout_policy(). Passing 0 is valid, though (it means
"do not change") so avoid the % check in that case.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
If the MDS gives us a dentry and we weren't prepared to handle it,
WARN_ON_ONCE instead of crashing.
Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Commit "d51f17e UBIFS: simplify reply code a bit" introduces a bug with the
following symptoms:
UBIFS error (pid 1): replay_log_leb: first CS node at LEB 3:0 has wrong commit number 0 expected 1
The issue is that we start replaying the log from UBIFS_LOG_LNUM instead
of c->lhead_lnum. This patch fixes that.
Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
This patch fixes a regression introduced by
"4994297 UBIFS: make ubifs_lpt_init clean-up in case of failure" which
I've hit while running the 'integck -p' test. When remount the file-system
from R/O mode to R/W mode and 'lpt_init_wr()' fails, we free _all_ LPT
resources by calling 'ubifs_lpt_free(c, 0)', even those needed for R/O
mode. This leads to subsequent crashes, e.g., if we try to unmount
the file-system.
Cc: stable@vger.kernel.org [v3.5+]
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Commit d5497fc693 "nfsd4: move rq_flavor
into svc_cred" forgot to remove cl_flavor from the client, leaving two
places (cl_flavor and cl_cred.cr_flavor) for the flavor to be stored.
After that patch, the latter was the one that was updated, but the
former was the one that the callback used.
Symptoms were a long delay on utime(). This is because the utime()
generated a setattr which recalled a delegation, but the cb_recall was
ignored by the client because it had the wrong security flavor.
Cc: stable@vger.kernel.org
Tested-by: Jamie Heilman <jamie@audible.transient.net>
Reported-by: Jamie Heilman <jamie@audible.transient.net>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
flush[_delayed]_work_sync() are now spurious. Mark them deprecated
and convert all users to flush[_delayed]_work().
If you're cc'd and wondering what's going on: Now all workqueues are
non-reentrant and the regular flushes guarantee that the work item is
not pending or running on any CPU on return, so there's no reason to
use the sync flushes at all and they're going away.
This patch doesn't make any functional difference.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Ian Campbell <ian.campbell@citrix.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Mattia Dongili <malattia@linux.it>
Cc: Kent Yoder <key@linux.vnet.ibm.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Karsten Keil <isdn@linux-pingi.de>
Cc: Bryan Wu <bryan.wu@canonical.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: linux-wireless@vger.kernel.org
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Sangbeom Kim <sbkim73@samsung.com>
Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Eric Van Hensbergen <ericvh@gmail.com>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Petr Vandrovec <petr@vandrovec.name>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Avi Kivity <avi@redhat.com>
compat_sys_{read,write}v() need the same "pass a copy of file->f_pos" thing
as sys_{read,write}{,v}().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The debugfs directory includes the cluster fsid and our unique global_id.
We need to delay the initialization of the debug entry until we have
learned both the fsid and our global_id from the monitor or else the
second client can't create its debugfs entry and will fail (and multiple
client instances aren't properly reflected in debugfs).
Reported by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
If the rpc call to NFS3PROC_FSINFO fails, then we need to report that
error so that the mount fails. Otherwise we can end up with a
superblock with completely unusable values for block sizes, maxfilesize,
etc.
Reported-by: Yuanming Chen <hikvision_linux@163.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Any pointer that was allocated through nfs_alloc_client() needs to be
freed via a call to nfs_free_client().
Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Commit d2c127197d caused a regression
in cifs_do_create error handling. Fix this by closing a file handle
in the case of a get_inode_info(_unix) error. Also remove unnecessary
checks for newinode being NULL.
Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org>
Signed-off-by: Steve French <smfrench@gmail.com>
While trying to debug a SMB signature related issue with Windows Servers
figured out it might be easier to debug if we print the error code from
cifs_verify_signature(). Also, fix indendation while at it.
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
that can cause warning messages. Pavel had initially
suggested a smaller patch around drop_nlink, after
a similar problem was discovered NFS. Protecting
additional places where nlink is touched was
suggested by Jeff Layton and is included in this.
Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Pull vfs fixes from Miklos Szeredi.
This mainly fixes some confusion about whether the open 'mode' variable
passed around should contain the full file type (S_IFREG etc)
information or just the permission mode. In particular, the lack of
proper file type information had confused fuse.
* 'vfs-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
vfs: fix propagation of atomic_open create error on negative dentry
fuse: check create mode in atomic open
vfs: pass right create mode to may_o_create()
vfs: atomic_open(): fix create mode usage
vfs: canonicalize create mode in build_open_flags()
the ones which cause problems for ext4 on RAID --- a performance
problem when mounting very large filesystems, and a kernel OOPS when
doing an rm -rf on large directory hierarchies on fast devices.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQIcBAABCAAGBQJQLlVSAAoJENNvdpvBGATwhOYQAKx22YF9+SHnnVv2GtCQrsE6
N3acE/FoDYiO1/LRa5M6NDO3ZIL4Vqi6409LYFQky/SQL8ziM5CBeLOBD9qPTE1L
AGrzWn4vzZcjEf90ZEBN99fS5Uj1A9Axz2denxy7Hfb2HcSXfcuuVQXpdS42cFo8
gwtlX4jxmPkbjRlAdeqYVBNuWpTZ0a+UyLc4A6v3aLcbzPSJvPYmI7mKCksiSySU
yt0atjMDb56blAQJ2TITdAZN6rQShNzyok2pPfxaLusl5g0Gtjq8sSEPof1PQ1zq
gDFc+kpZvUyPdwQzV3IL8+TodFFZ0x/2OhqoYCTKajROLHGjQFsdkb8EJbnNeDff
EDxIjeVJR0kzSuSNWu+n5028lmEd9Sk7Ykr37cHeUxG8/0SADUqSDQYNhvbOPQsj
iq5dwF79tKjuMqjJrABuWA1ZNgHBISXgyBmHLXgEk3LrgucT9UIg8Zlkhq480SYO
JXhmkO2Ka4UwkVUShoWgRtEzRgUxhINBShs6g67zwm6slS4s2CWHnqhUn6EQe6+r
DY/hoUA8KbdG3Cf5iJBFM2kUO68CDIXeJjocA7JvlouRgQSxmkOceIuk7DusAitM
nHJKAtSNgC/z7yMoNi7YN0S5YYcCebmO1MEPzYSpPH07YwLJVNmh9Fk6BIfb7vi3
vJSQMBrgGrbaXrnhBA7z
=Zulv
-----END PGP SIGNATURE-----
Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 bug fixes from Ted Ts'o:
"The following are all bug fixes and regressions. The most notable are
the ones which cause problems for ext4 on RAID --- a performance
problem when mounting very large filesystems, and a kernel OOPS when
doing an rm -rf on large directory hierarchies on fast devices."
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: fix kernel BUG on large-scale rm -rf commands
ext4: fix long mount times on very big file systems
ext4: don't call ext4_error while block group is locked
ext4: avoid kmemcheck complaint from reading uninitialized memory
ext4: make sure the journal sb is written in ext4_clear_journal_err()
In some cases when an autofs indirect mount is contained in a file
system that is marked as shared (such as when systemd does the
equivalent of "mount --make-rshared /" early in the boot), mounts
stop expiring.
When this happens the first expiry check on a mountpoint dentry in
autofs_expire_indirect() sees a mountpoint dentry with a higher
than minimal reference count. Consequently the dentry is condidered
busy and the actual expiry check is never done.
This particular check was originally meant as an optimisation to
detect a path walk in progress but with the addition of rcu-walk
it can be ineffective anyway.
Removing the test allows automounts to expire again since the
actual expire check doesn't rely on the dentry reference count.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 968dee7722: "ext4: fix hole punch failure when depth is greater
than 0" introduced a regression in v3.5.1/v3.6-rc1 which caused kernel
crashes when users ran run "rm -rf" on large directory hierarchy on
ext4 filesystems on RAID devices:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
Process rm (pid: 18229, threadinfo ffff8801276bc000, task ffff880123631710)
Call Trace:
[<ffffffff81236483>] ? __ext4_handle_dirty_metadata+0x83/0x110
[<ffffffff812353d3>] ext4_ext_truncate+0x193/0x1d0
[<ffffffff8120a8cf>] ? ext4_mark_inode_dirty+0x7f/0x1f0
[<ffffffff81207e05>] ext4_truncate+0xf5/0x100
[<ffffffff8120cd51>] ext4_evict_inode+0x461/0x490
[<ffffffff811a1312>] evict+0xa2/0x1a0
[<ffffffff811a1513>] iput+0x103/0x1f0
[<ffffffff81196d84>] do_unlinkat+0x154/0x1c0
[<ffffffff8118cc3a>] ? sys_newfstatat+0x2a/0x40
[<ffffffff81197b0b>] sys_unlinkat+0x1b/0x50
[<ffffffff816135e9>] system_call_fastpath+0x16/0x1b
Code: 8b 4d 20 0f b7 41 02 48 8d 04 40 48 8d 04 81 49 89 45 18 0f b7 49 02 48 83 c1 01 49 89 4d 00 e9 ae f8 ff ff 0f 1f 00 49 8b 45 28 <48> 8b 40 28 49 89 45 20 e9 85 f8 ff ff 0f 1f 80 00 00 00
RIP [<ffffffff81233164>] ext4_ext_remove_space+0xa34/0xdf0
This could be reproduced as follows:
The problem in commit 968dee7722 was that caused the variable 'i' to
be left uninitialized if the truncate required more space than was
available in the journal. This resulted in the function
ext4_ext_truncate_extend_restart() returning -EAGAIN, which caused
ext4_ext_remove_space() to restart the truncate operation after
starting a new jbd2 handle.
Reported-by: Maciej Żenczykowski <maze@google.com>
Reported-by: Marti Raudsepp <marti@juffo.org>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
Commit 8aeb00ff85a: "ext4: fix overhead calculation used by
ext4_statfs()" introduced a O(n**2) calculation which makes very large
file systems take forever to mount. Fix this with an optimization for
non-bigalloc file systems. (For bigalloc file systems the overhead
needs to be set in the the superblock.)
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
While in ext4_validate_block_bitmap(), if an block allocation bitmap
is found to be invalid, we call ext4_error() while the block group is
still locked. This causes ext4_commit_super() to call a function
which might sleep while in an atomic context.
There's no need to keep the block group locked at this point, so hoist
the ext4_error() call up to ext4_validate_block_bitmap() and release
the block group spinlock before calling ext4_error().
The reported stack trace can be found at:
http://article.gmane.org/gmane.comp.file-systems.ext4/33731
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
If range.start or range.minlen is bigger than filesystem size, return
invalid value error. This fixes possible overflow in BTOBB macro when
passed value was nearly ULLONG_MAX.
Signed-off-by: Tomas Racek <tracek@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
Also update some commens in the area to make the code easier to read.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
This allows the normal error-paths to handle the error, rather than
making a special call to complete_request_key() just for this instance.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Tested-by: William Dauchy <wdauchy@gmail.com>
Cc: stable@vger.kernel.org [>= 3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
idmap_pipe_downcall already clears this field if the upcall succeeds,
but if it fails (rpc.idmapd isn't running) the field will still be set
on the next call triggering a BUG_ON(). This patch tries to handle all
possible ways that the upcall could fail and clear the idmap key data
for each one.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Tested-by: William Dauchy <wdauchy@gmail.com>
Cc: stable@vger.kernel.org [>= 3.4]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Instead of using the private field xdr->p from struct xdr_stream,
use the public xdr_stream_pos().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Currently, we do not take into account the size of the 16 byte
struct nfs4_cached_acl header, when deciding whether or not we should
cache the acl data. Consequently, we will end up allocating an
8k buffer in order to fit a maximum size 4k acl.
This patch adjusts the calculation so that we limit the cache size
to 4k for the acl header+data.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Resetting the cursor xdr->p to a previous value is not a safe
practice: if the xdr_stream has crossed out of the initial iovec,
then a bunch of other fields would need to be reset too.
Fix this issue by using xdr_enter_page() so that the buffer gets
page aligned at the bitmap _before_ we decode it.
Also fix the confusion of the ACL length with the page buffer length
by not adding the base offset to the ACL length...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
This allows distros to remove the line from their modprobe
configuration.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Some systems have a modprobe.d/nfs.conf file that sets an nfs4 alias
pointing to nfs.ko, rather than nfs4.ko. This can prevent the v4 module
from loading on mount, since the kernel sees that something named "nfs4"
has already been loaded. To work around this, I've renamed the modules
to "nfsv2.ko" "nfsv3.ko" and "nfsv4.ko".
I also had to move the nfs4_fs_type back to nfs.ko to ensure that `mount
-t nfs4` still works.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Following a report of a crash during an automount expire I found that
the locking in fs/autofs4/expire.c:get_next_positive_subdir() was wrong.
Not only is the locking wrong but the function is more complex than it
needs to be.
The function is meant to calculate (and dget) the next entry in the list
of directories contained in the root of an autofs mount point (an autofs
indirect mount to be precise). The main problem was that the d_lock of
the owner of the list was not being taken when walking the list, which
lead to list corruption under load. The only other lock that needs to
be taken is against the next dentry candidate so it can be checked for
usability.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It's only used locally, no need to pollute global namespace.
Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I noticed that "struct xfs_mount_args" was still declared in
"fs/xfs/xfs_mount.h". That struct doesn't even exist any more (and
is obviously not referenced elsewhere in that header file). While
in there, delete four other unneeded struct declarations in that
file.
Doing so highlights that "fs/xfs/xfs_trace.h" was relying indirectly
on "xfs_mount.h" to be #included in order to declare "struct
xfs_bmbt_irec", so add that declaration to resolve that issue.
Signed-off-by: Alex Elder <elder@inktank.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
If ->atomic_open() returns -ENOENT, we take care to return the create
error (e.g., EACCES), if any. Do the same when ->atomic_open() returns 1
and provides a negative dentry.
This fixes a regression where an unprivileged open O_CREAT fails with
ENOENT instead of EACCES, introduced with the new atomic_open code. It
is tested by the open/08.t test in the pjd posix test suite, and was
observed on top of fuse (backed by ceph-fuse).
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
In case we detect a problem and bail out, we fail to set "ret" to a
nonzero value, and udf_load_logicalvol will mistakenly report success.
Signed-off-by: Nikola Pajkovsky <npajkovs@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
This sequence:
results in an IO error when unmounting the RO filesystem. The bug was
introduced by:
commit 9754e39c7b
Author: Jan Kara <jack@suse.cz>
Date: Sat Apr 7 12:33:03 2012 +0200
jbd: Split updating of journal superblock and marking journal empty
which lost some of the magic in journal_update_superblock() which
used to test for a journal with no outstanding transactions.
This is a port of a jbd2 fix by Eric Sandeen.
CC: <stable@vger.kernel.org> # 3.4.x
Signed-off-by: Jan Kara <jack@suse.cz>
Verify that the VFS is passing us a complete create mode with the S_IFREG to
atomic open.
Reported-by: Steve <steveamigauk@yahoo.co.uk>
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Pass the umask-ed create mode to may_o_create() instead of the original one.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Don't mask S_ISREG off the create mode before passing to ->atomic_open(). Other
methods (->create, ->mknod) also get the complete file mode and filesystems
expect it.
Reported-by: Steve <steveamigauk@yahoo.co.uk>
Reported-by: Richard W.M. Jones <rjones@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
Userspace can pass weird create mode in open(2) that we canonicalize to
"(mode & S_IALLUGO) | S_IFREG" in vfs_create().
The problem is that we use the uncanonicalized mode before calling vfs_create()
with unforseen consequences.
So do the canonicalization early in build_open_flags().
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Tested-by: Richard W.M. Jones <rjones@redhat.com>
CC: stable@vger.kernel.org
struct file already has a user namespace associated with it
in file->f_cred->user_ns, unfortunately because struct
seq_file has no struct file backpointer associated with
it, it is difficult to get at the user namespace in seq_file
context. Therefore add a helper function seq_user_ns to return
the associated user namespace and a user_ns field to struct
seq_file to be used in implementing seq_user_ns.
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
The BKL push-down for reiserfs made lock recursion a special case that needs
to be handled explicitly. One of the cases that was unhandled is dropping
the quota during inode eviction. Both reiserfs_evict_inode and
reiserfs_write_dquot take the write lock, but when the journal lock is
taken it only drops one the references. The locking rules are that the journal
lock be acquired before the write lock so leaving the reference open leads
to a ABBA deadlock.
This patch pushes the unlock up before clear_inode and avoids the recursive
locking.
Another ABBA situation can occur when the write lock is dropped while reading
the bitmap buffer while in the quota code. When the lock is reacquired, it
will deadlock against dquot->dq_lock and dqopt->dqio_mutex in the dquot_acquire
path. It's safe to retain the lock across the read and should be cached under
write load.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Jan Kara <jack@suse.cz>