Commit graph

29510 commits

Author SHA1 Message Date
Joe Perches
471b1f9871 cifs: Add CONFIG_CIFS_DEBUG and rename use of CIFS_DEBUG
This can reduce the size of the module by ~120KB which
could be useful for embedded systems.

$ size fs/cifs/built-in.o*
   text	   data	    bss	    dec	    hex	filename
 388567	  34459	 100440	 523466	  7fcca	fs/cifs/built-in.o.new
 495970	  34599	 117904	 648473	  9e519	fs/cifs/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2012-12-05 14:58:36 -06:00
Joe Perches
bde9819731 cifs: Make CIFS_DEBUG possible to undefine
Make the compilation work again when CIFS_DEBUG is not #define'd.

Add format and argument verification for the various macros when
CIFS_DEBUG is not #define'd.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2012-12-05 14:58:09 -06:00
Steve French
52c0f4ad8e SMB3 mounts fail with access denied to some servers
We were checking incorrectly if signatures were required to be sent,
so were always sending signatures after the initial session establishment.
For SMB3 mounts (vers=3.0) this was a problem because we were putting
SMB2 signatures in SMB3 requests which would cause access denied
on mount (the tree connection would fail).

This might also be worth considering for stable (for 3.7), as the
error message on mount (access denied) is confusing to users and
there is no workaround if the server is configured to only
support smb3.0. I am ok either way.

CC: stable <stable@kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2012-12-05 13:27:31 -06:00
Joe Perches
176c9b3939 cifs: Remove unused cEVENT macro
It uses an undefined KERN_EVENT and is itself unused.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:31 -06:00
Jeff Layton
6ee9542a87 cifs: always zero out smb_vol before parsing options
Currently, the code relies on the callers to do that and they all do,
but this will ensure that it's always done.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:31 -06:00
Jeff Layton
9fa114f74f cifs: remove unneeded address argument from cifs_find_tcp_session and match_server
Now that the smb_vol contains the destination sockaddr, there's no need
to pass it in separately.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:30 -06:00
Steve French
1cc9bd6861 make convert_delimiter use strchr instead of open-coding it
Take advantage of accelerated strchr() on arches that support it.

Also, no caller ever passes in a NULL pointer. Get rid of the unneeded
NULL pointer check.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:30 -06:00
Jeff Layton
b979aaa177 cifs: get rid of smb_vol->UNCip and smb_vol->port
Passing this around as a string is contorted and painful. Instead, just
convert these to a sockaddr as soon as possible, since that's how we're
going to work with it later anyway.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:30 -06:00
Jeff Layton
ccb5c001b3 cifs: ensure we revalidate the inode after readdir if cifsacl is enabled
Otherwise, "ls -l" will simply show the ownership of the files as
the default mnt_uid/gid. This may make "ls -l" performance on large
directories super-suck in some cases, but that's the cost of cifsacl.

One possibility to make it suck less would be to somehow proactively
dispatch the ACL requests asynchronously from readdir codepath, but
that's non-trivial to implement.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:30 -06:00
Jesper Nilsson
3c15b4cf55 cifs: Add handling of blank password option
The option to have a blank "pass=" already exists, and with
a password specified both "pass=%s" and "password=%s" are supported.
Also, both blank "user=" and "username=" are supported, making
"password=" the odd man out.

Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:30 -06:00
Steve French
dd446b16ed Add SMB2.02 dialect support
This patch enables optional for original SMB2 (SMB2.02) dialect
by specifying vers=2.0 on mount.

Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:29 -06:00
Pavel Shilovsky
21cb2d90c7 CIFS: Fix lock consistensy bug in cifs_setlk
If we netogiate mandatory locking style, have a read lock and try
to set a write lock we end up with a write lock in vfs cache and
no lock in cifs lock cache - that's wrong. Fix it by returning
from cifs_setlk immediately if a error occurs during setting a lock.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:29 -06:00
Pavel Shilovsky
f152fd5fff CIFS: Implement cifs_relock_file
that reacquires byte-range locks when a file is reopened.

Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:29 -06:00
Pavel Shilovsky
b8db928b76 CIFS: Separate pushing mandatory locks and lock_sem handling
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:29 -06:00
Pavel Shilovsky
9ec3c88287 CIFS: Separate pushing posix locks and lock_sem handling
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:29 -06:00
Steve French
6d3ea7e497 CIFS: Make use of common cifs_build_path_to_root for CIFS and SMB2
because the is no difference here. This also adds support of prefixpath
mount option for SMB2.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:27:28 -06:00
Jeff Layton
e5e69abd05 cifs: make error on lack of a unc= option more explicit
Error out with a clear error message if there is no unc= option. The
existing code doesn't handle this in a clear fashion, and the check for
a UNCip option with no UNC string is just plain wrong.

Later, we'll fix the code to not require a unc= option, but for now we
need this to at least clarify why people are getting errors about DFS
parsing. With this change we can also get rid of some later NULL pointer
checks since we know the UNC and UNCip will never be NULL there.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:13:12 -06:00
Jeff Layton
d3d1fce11d cifs: don't override the uid/gid in getattr when cifsacl is enabled
If we're using cifsacl, then we don't want to override the uid/gid with
the current uid/gid, since that would prevent you from being able to
upcall for this info.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:13:12 -06:00
Jeff Layton
b1a6dc21d1 cifs: remove uneeded __KERNEL__ block from cifsacl.h
...and make those symbols static in cifsacl.c. Nothing outside
of that file refers to them.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:13:11 -06:00
Jeff Layton
ee13b2ba74 cifs: fix the format specifiers in sid_to_str
The format specifiers are for signed values, but these are unsigned.
Given that '-' is a delimiter between fields, I don't think you'd get
what you'd expect if you got a value here that would overflow the sign
bit.

The version and authority fields are 8 bit values so use a "hh" length
modifier there. The subauths are 32 bit values, so there's no need to
use a "l" length modifier there.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:13:11 -06:00
Jeff Layton
30c9d6cca5 cifs: redefine NUM_SUBAUTH constant from 5 to 15
According to several places on the Internet and the samba winbind code,
this is hard limited to 15 in windows, not 5. This does balloon out
the allocation of each by 40 bytes, but I don't see any alternative.

Also, rename it to SID_MAX_SUB_AUTHORITIES to match the alleged name
of this constant in the windows header files

Finally, rename SIDLEN to SID_STRING_MAX, fix the value to reflect
the change to SID_MAX_SUB_AUTHORITIES and document how it was
determined.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:13:11 -06:00
Jeff Layton
36f87ee70f cifs: make cifs_copy_sid handle a source sid with variable size subauth arrays
...and lift the restriction in id_to_sid upcall that the size must be
at least as big as a full cifs_sid.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:13:11 -06:00
Jeff Layton
436bb435fc cifs: make compare_sids static
..nothing outside of cifsacl.c calls it. Also fix the incorrect
comment on the function. It returns 0 when they match.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:13:11 -06:00
Jeff Layton
852e22950d cifs: use the NUM_AUTHS and NUM_SUBAUTHS constants in cifsacl code
...instead of hardcoding in '5' and '6' all over the place.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:13:10 -06:00
Jeff Layton
fc03d8a5a1 cifs: move num_subauth check inside of CONFIG_CIFS_DEBUG2 check in parse_sid()
Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:13:10 -06:00
Jeff Layton
c78cd83805 cifs: clean up id_mode_to_cifs_acl
Add a label we can goto on error, and get rid of some excess indentation.
Also move to kernel-style comments.

Reviewed-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:12:16 -06:00
Jeff Layton
60654ce047 cifs: fix types on module parameters
Most of these are unsigned ints, so we should be passing "uint" to
module_param. Also, get rid of the extra "(bool)" in the description
of enable_oplocks.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-12-05 13:07:14 -06:00
Steve French
81bcd8b795 default authentication needs to be at least ntlmv2 security for cifs mounts
We had planned to upgrade to ntlmv2 security a few releases ago,
and have been warning users in dmesg on mount about the impending
upgrade, but had to make a change (to use nltmssp with ntlmv2) due
to testing issues with some non-Windows, non-Samba servers.

The approach in this patch is simpler than earlier patches,
and changes the default authentication mechanism to ntlmv2
password hashes (encapsulated in ntlmssp) from ntlm (ntlm is
too weak for current use and ntlmv2 has been broadly
supported for many, many years).

Signed-off-by: Steve French <smfrench@gmail.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
2012-12-05 13:07:13 -06:00
Dan Carpenter
27d7c2a006 vfs: clear to the end of the buffer on partial buffer reads
READ is zero so the "rw & READ" test is always false.  The intended test
was "((rw & RW_MASK) == READ)".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-05 10:32:59 -08:00
Tao Ma
879b38257b ext4: export inline xattr functions
The inline data feature will need some inline xattr functions, so
export them from fs/ext4/xattr.c so that inline.c can use them.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-05 10:28:46 -05:00
Linus Torvalds
57302e0ddf vfs: avoid "attempt to access beyond end of device" warnings
The block device access simplification that avoided accessing the (racy)
block size information (commit bbec0270bd: "blkdev_max_block: make
private to fs/buffer.c") no longer checks the maximum block size in the
block mapping path.

That was _almost_ as simple as just removing the code entirely, because
the readers and writers all check the size of the device anyway, so
under normal circumstances it "just worked".

However, the block size may be such that the end of the device may
straddle one single buffer_head.  At which point we may still want to
access the end of the device, but the buffer we use to access it
partially extends past the end.

The 'bd_set_size()' function intentionally sets the block size to avoid
this, but mounting the device - or setting the block size by hand to
some other value - can modify that block size.

So instead, teach 'submit_bh()' about the special case of the buffer
head straddling the end of the device, and turning such an access into a
smaller IO access, avoiding the problem.

This, btw, also means that unlike before, we can now access the whole
device regardless of device block size setting.  So now, even if the
device size is only 512-byte aligned, we can read and write even the
last sector even when having a much bigger block size for accessing the
rest of the device.

So with this, we could now get rid of the 'bd_set_size()' block size
code entirely - resulting in faster IO for the common case - but that
would be a separate patch.

Reported-and-tested-by: Romain Francoise <romain@orebokech.com>
Reporeted-and-tested-by: Meelis Roos <mroos@linux.ee>
Reported-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-04 08:25:11 -08:00
Linus Torvalds
d3594ea2b3 Merge branch 'block-dev'
Merge 'block-dev' branch.

I was going to just mark everything here for stable and leave it to the
3.8 merge window, but having decided on doing another -rc, I migth as
well merge it now.

This removes the bd_block_size_semaphore semaphore that was added in
this release to fix a race condition between block size changes and
block IO, and replaces it with atomicity guaratees in fs/buffer.c
instead, along with simplifying fs/block-dev.c.

This removes more lines than it adds, makes the code generally simpler,
and avoids the latency/rt issues that the block size semaphore
introduced for mount.

I'm not happy with the timing, but it wouldn't be much better doing this
during the merge window and then having some delayed back-port of it
into stable.

* block-dev:
  blkdev_max_block: make private to fs/buffer.c
  direct-io: don't read inode->i_blkbits multiple times
  blockdev: remove bd_block_size_semaphore again
  fs/buffer.c: make block-size be per-page and protected by the page lock
2012-12-03 10:53:25 -08:00
Dave Chinner
f9668a09e3 xfs: fix sparse reported log CRC endian issue
Not a bug as such, just warning noise from the xlog_cksum()
returning a __be32 type when it should be returning a __le32 type.

On Wed, Nov 28, 2012 at 08:30:59AM -0500, Christoph Hellwig wrote:
> But why are we storing the crc field little endian while all other on
> disk formats are big endian? (And yes I realize it might as well have
> been me who did that back in the idea, but I still have no idea why)

Because the CRC always returns the calcuation LE format, even on BE
systems. So rather than always having to byte swap it everywhere and
have all the force casts and anootations for sparse, it seems simpler to
just make it a __le32 everywhere....

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-12-03 12:10:59 -06:00
Tao Ma
152a7b0a80 ext4: move extra inode read to a new function
Currently, in ext4_iget we do a simple check to see whether
there does exist some information starting from the end
of i_extra_size. With inline data added, this procedure
is more complicated. So move it to a new function named
ext4_iget_extra_inode.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-12-02 11:13:24 -05:00
Linus Torvalds
331fee3cd3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fixes from Al Viro:
 "A bunch of fixes; the last one is this cycle regression, the rest are
  -stable fodder."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  fix off-by-one in argument passed by iterate_fd() to callbacks
  lookup_one_len: don't accept . and ..
  cifs: get rid of blind d_drop() in readdir
  nfs_lookup_revalidate(): fix a leak
  don't do blind d_drop() in nfs_prime_dcache()
2012-12-01 13:29:55 -08:00
Linus Torvalds
086486e46e Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6
Pull CIFS fixes from Steve French:
 "Two low risk, small fixes, that fix cifs regressions introduced in
  3.7."

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
  CIFS: Fix wrong buffer pointer usage in smb_set_file_info
  cifs: fix writeback race with file that is growing
2012-11-30 16:57:18 -08:00
Al Viro
a77cfcb429 fix off-by-one in argument passed by iterate_fd() to callbacks
Noticed by Pavel Roskin; the thing in his patch I disagree with
was compensating for that shite in callbacks instead of fixing
it once in the iterator itself.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 23:01:30 -05:00
Al Viro
21d8a15ac3 lookup_one_len: don't accept . and ..
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 22:17:21 -05:00
Al Viro
0903a0c849 cifs: get rid of blind d_drop() in readdir
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 22:11:06 -05:00
Al Viro
c44600c9d1 nfs_lookup_revalidate(): fix a leak
We are leaking fattr and fhandle if we decide that dentry is not to
be invalidated, after all (e.g. happens to be a mountpoint).  Just
free both before that...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 22:04:36 -05:00
Al Viro
696199f8cc don't do blind d_drop() in nfs_prime_dcache()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 22:00:51 -05:00
Theodore Ts'o
aeb1e5d69a ext4: fix possible use after free with metadata csum
Commit fa77dcfafe introduces block bitmap checksum calculation into
ext4_new_inode() in the case that block group was uninitialized.
However we brelse() the bitmap buffer before we attempt to checksum it
so we have no guarantee that the buffer is still there.

Fix this by releasing the buffer after the possible checksum
computation.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Darrick J. Wong <darrick.wong@oracle.com>
Cc: stable@vger.kernel.org
2012-11-29 21:21:22 -05:00
Theodore Ts'o
69c499d152 ext4: restructure ext4_ext_direct_IO()
Remove a level of indentation by moving the DIO read and extending
write case to the beginning of the file.  This results in no actual
programmatic changes to the file, but makes it easier to
read/understand.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-29 21:13:48 -05:00
Linus Torvalds
bbec0270bd blkdev_max_block: make private to fs/buffer.c
We really don't want to look at the block size for the raw block device
accesses in fs/block-dev.c, because it may be changing from under us.
So get rid of the max_block logic entirely, since the caller should
already have done it anyway.

That leaves the only user of this function in fs/buffer.c, so move the
whole function there and make it static.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-29 17:48:12 -08:00
Linus Torvalds
ab73857e35 direct-io: don't read inode->i_blkbits multiple times
Since directio can work on a raw block device, and the block size of the
device can change under it, we need to do the same thing that
fs/buffer.c now does: read the block size a single time, using
ACCESS_ONCE().

Reading it multiple times can get different results, which will then
confuse the code because it actually encodes the i_blksize in
relationship to the underlying logical blocksize.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-29 12:38:44 -08:00
Dave Chinner
b870553cde xfs: fix stray dquot unlock when reclaiming dquots
When we fail to get a dquot lock during reclaim, we jump to an error
handler that unlocks the dquot. This is wrong as we didn't lock the
dquot, and unlocking it means who-ever is holding the lock has had
it silently taken away, and hence it results in a lock imbalance.

Found by inspection while modifying the code for the numa-lru
patchset. This fixes a random hang I've been seeing on xfstest 232
for the past several months.

cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-29 14:24:03 -06:00
Dave Chinner
437a255aa2 xfs: fix direct IO nested transaction deadlock.
The direct IO path can do a nested transaction reservation when
writing past the EOF. The first transaction is the append
transaction for setting the filesize at IO completion, but we can
also need a transaction for allocation of blocks. If the log is low
on space due to reservations and small log, the append transaction
can be granted after wating for space as the only active transaction
in the system. This then attempts a reservation for an allocation,
which there isn't space in the log for, and the reservation sleeps.
The result is that there is nothing left in the system to wake up
all the processes waiting for log space to come free.

The stack trace that shows this deadlock is relatively innocuous:

 xlog_grant_head_wait
 xlog_grant_head_check
 xfs_log_reserve
 xfs_trans_reserve
 xfs_iomap_write_direct
 __xfs_get_blocks
 xfs_get_blocks_direct
 do_blockdev_direct_IO
 __blockdev_direct_IO
 xfs_vm_direct_IO
 generic_file_direct_write
 xfs_file_dio_aio_writ
 xfs_file_aio_write
 do_sync_write
 vfs_write

This was discovered on a filesystem with a log of only 10MB, and a
log stripe unit of 256k whih increased the base reservations by
512k. Hence a allocation transaction requires 1.2MB of log space to
be available instead of only 260k, and so greatly increased the
chance that there wouldn't be enough log space available for the
nested transaction to succeed. The key to reproducing it is this
mkfs command:

mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b $SCRATCH_DEV

The test case was a 1000 fsstress processes running with random
freeze and unfreezes every few seconds. Thanks to Eryu Guan
(eguan@redhat.com) for writing the test that found this on a system
with a somewhat unique default configuration....

cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Andrew Dahl <adahl@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-29 14:22:56 -06:00
Dave Chinner
ef9d873344 xfs: byte range granularity for XFS_IOC_ZERO_RANGE
XFS_IOC_ZERO_RANGE simply does not work properly for non page cache
aligned ranges. Neither test 242 or 290 exercise this correctly, so
the behaviour is completely busted even though the tests pass.

Fix it to support full byte range granularity as was originally
intended for this ioctl.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-29 14:21:46 -06:00
Linus Torvalds
1e8b33328a blockdev: remove bd_block_size_semaphore again
This reverts the block-device direct access code to the previous
unlocked code, now that fs/buffer.c no longer needs external locking.

With this, fs/block_dev.c is back to the original version, apart from a
whitespace cleanup that I didn't want to revert.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-29 10:52:19 -08:00
Linus Torvalds
45bce8f3e3 fs/buffer.c: make block-size be per-page and protected by the page lock
This makes the buffer size handling be a per-page thing, which allows us
to not have to worry about locking too much when changing the buffer
size.  If a page doesn't have buffers, we still need to read the block
size from the inode, but we can do that with ACCESS_ONCE(), so that even
if the size is changing, we get a consistent value.

This doesn't convert all functions - many of the buffer functions are
used purely by filesystems, which in turn results in the buffer size
being fixed at mount-time.  So they don't have the same consistency
issues that the raw device access can have.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-29 10:47:20 -08:00