Commit graph

24436 commits

Author SHA1 Message Date
Alex Elder
9508534c5f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Resolved conflicts:
  fs/xfs/xfs_trans_priv.h:
    - deleted struct xfs_ail field xa_flags
    - kept field xa_log_flush in struct xfs_ail
  fs/xfs/xfs_trans_ail.c:
    - in xfsaild_push(), in XFS_ITEM_PUSHBUF case, replaced
      "flush_log = 1" with "ailp->xa_log_flush++"

Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-17 15:42:02 -05:00
Shirish Pargaonkar
a5ff376966 cifs: Call id to SID mapping functions to change owner/group (try #4 repost)
Now build security descriptor to change either owner or group at the
server.  Initially security descriptor was built to change only
(D)ACL, that functionality has been extended.

When either an Owner or a Group of a file object at the server is changed,
rest of security descriptor remains same (DACL etc.).

To set security descriptor, it is necessary to open that file
with permission bits of either WRITE_DAC if DACL is being modified or
WRITE_OWNER (Take Ownership) if Owner or Group is being changed.

It is the server that decides whether a set security descriptor with
either owner or group change succeeds or not.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-17 09:11:11 -05:00
Dan Carpenter
01cd4afadb nfsd4: typo logical vs bitwise negate
This should be a bitwise negate here.  It silences a Sparse warning:
fs/nfsd/nfs4xdr.c:693:16: warning: dubious: x & !y

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-17 08:35:09 -04:00
Boaz Harrosh
4b46c9f5cf ore/exofs: Change ore_check_io API
Current ore_check_io API receives a residual
pointer, to report partial IO. But it is actually
not used, because in a multiple devices IO there
is never a linearity in the IO failure.

On the other hand if every failing device is reported
through a received callback measures can be taken to
handle only failed devices. One at a time.

This will also be needed by the objects-layout-driver
for it's error reporting facility.

Exofs is not currently using the new information and
keeps the old behaviour of failing the complete IO in
case of an error. (No partial completion)

TODO: Use an ore_check_io callback to set_page_error only
the failing pages. And re-dirty write pages.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:54:42 +02:00
Boaz Harrosh
5a51c0c7e9 ore/exofs: Define new ore_verify_layout
All users of the ore will need to check if current code
supports the given layout. For example RAID5/6 is not
currently supported.

So move all the checks from exofs/super.c to a new
ore_verify_layout() to be used by ore users.

Note that any new layout should be passed through the
ore_verify_layout() because the ore engine will prepare
and verify some internal members of ore_layout, and
assumes it's called.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:54:41 +02:00
Boaz Harrosh
3bd9856857 ore: Support for partial component table
Users like the objlayout-driver would like to only pass
a partial device table that covers the IO in question.
For example exofs divides the file into raid-group-sized
chunks and only serves group_width number of devices at
a time.

The partiality is communicated by setting
ore_componets->first_dev and the array covers all logical
devices from oc->first_dev upto (oc->first_dev + oc->numdevs)

The ore_comp_dev() API receives a logical device index
and returns the actual present device in the table.
An out-of-range dev_index will BUG.

Logical device index is the theoretical device index as if
all the devices of a file are present. .i.e:
	total_devs = group_width * mirror_p1 * group_count
	0 <= dev_index < total_devs

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:54:41 +02:00
Boaz Harrosh
bbf9a31bba ore: Support for short read/writes
Memory conditions and max_bio constraints might cause us to
not comply to the full length of the requested IO. Instead of
failing the complete IO we can issue a shorter read/write and
report how much was actually executed in the ios->length
member.

All users must check ios->length at IO_done or upon return of
ore_read/write and re-issue the reminder of the bytes. Because
other wise there is no error returned like before.

This is part of the effort to support the pnfs-obj layout driver.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:54:40 +02:00
Boaz Harrosh
154a9300cd exofs: Support for short read/writes
If at read/write_done the actual IO was shorter then requested,
reported in returned ios->length. It is not an error. The reminder
of the pages should just be unlocked but not marked uptodate or
end_page_writeback. They will be re issued later by the VFS.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:54:39 +02:00
Boaz Harrosh
6851a5e5c1 ore: Remove check for ios->kern_buff in _prepare_for_striping to later
Move the check and preparation of the ios->kern_buff case to
later inside _write_mirror().

Since read was never used with ios->kern_buff its support is removed
instead of fixed.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:53:55 +02:00
Boaz Harrosh
9826075404 ore: cleanup: Embed an ore_striping_info inside ore_io_state
Now that each ore_io_state covers only a single raid group.
A single striping_info math is needed. Embed one inside
ore_io_state to cache the calculation results and eliminate
an extra call.

Also the outer _prepare_for_striping is removed since it does nothing.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:53:54 +02:00
Boaz Harrosh
b916c5cd4d ore: Only IO one group at a time (API change)
Usually a single IO is confined to one group of devices
(group_width) and at the boundary of a raid group it can
spill into a second group. Current code would allocate a
full device_table size array at each io_state so it can
comply to requests that span two groups. Needless to say
that is very wasteful, specially when device_table count
can get very large (hundreds even thousands), while a
group_width is usually 8 or 10.

* Change ore API to trim on IO that spans two raid groups.
  The user passes offset+length to ore_get_rw_state, the
  ore might trim on that length if spanning a group boundary.
  The user must check ios->length or ios->nrpages to see
  how much IO will be preformed. It is the responsibility
  of the user to re-issue the reminder of the IO.

* Modify exofs To copy spilled pages on to the next IO.
  This means one last kick is needed after all coalescing
  of pages is done.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-14 18:52:50 +02:00
Linus Torvalds
480082968a Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
  xfs: revert to using a kthread for AIL pushing
  xfs: force the log if we encounter pinned buffers in .iop_pushbuf
  xfs: do not update xa_last_pushed_lsn for locked items
2011-10-14 17:06:39 +12:00
Pavel Shilovsky
d59dad2be0 CIFS: Move byte range lock list from fd to inode
that let us do local lock checks before requesting to the server.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-13 19:52:47 -05:00
Jeff Layton
fe11e4ccb8 cifs: clean up check_rfc1002_header
Rename it for better clarity as to what it does and have the caller pass
in just the single type byte. Turn the if statement into a switch and
optimize it by placing the most common message type at the top. Move the
header length check back into cifs_demultiplex_thread in preparation
for adding a new receive phase and normalize the cFYI messages.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-13 18:44:40 -05:00
Pavel Shilovsky
03776f4516 CIFS: Simplify byte range locking code
Split cifs_lock into several functions and let CIFSSMBLock get pid
as an argument.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-13 17:16:28 -05:00
Pavel Shilovsky
94443f4340 CIFS: Fix incorrect max RFC1002 write size value
..the length field has only 17 bits.

Cc: <stable@kernel.org>
Acked-by: Jeff Layton <jlayton@samba.org>
Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-13 15:20:14 -05:00
Linus Torvalds
b2f9452bd5 Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux
* 'btrfs-3.0' of git://github.com/chrismason/linux:
  Btrfs: make sure not to defrag extents past i_size
  Btrfs: fix recursive auto-defrag
2011-10-13 18:20:40 +12:00
Jeff Layton
a52c1eb7ae cifs: simplify read_from_socket
Move the iovec handling entirely into read_from_socket. That simplifies
the code and gets rid of the special handling for header reads. With
this we can also get rid of the "goto incomplete_rcv" label in the main
demultiplex thread function since we can now treat header and non-header
receives the same way.

Also, make it return an int (since we'll never receive enough to worry
about the sign bit anyway), and simply make it return the amount of bytes
read or a negative error code.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-13 00:05:59 -05:00
Shirish Pargaonkar
21fed0d5b7 cifs: Add data structures and functions for uid/gid to SID mapping (try #4)
Add data structures and functions necessary to map a uid and gid to SID.
These functions are very similar to the ones used to map a SID to uid and gid.
This time, instead of storing sid to id mapping sorted on a sid value,
id to sid is stored, sorted on an id.
A cifs upcall sends an id (uid or gid) and expects a SID structure
in return, if mapping was done successfully.

A failed id to sid mapping to EINVAL.

This patchset aims to enable chown and chgrp commands when
cifsacl mount option is specified, especially to Windows SMB servers.
Currently we can't do that.  So now along with chmod command,
chown and chgrp work.

Winbind is used to map id to a SID.  chown and chgrp use an upcall
to provide an id to winbind and upcall returns with corrosponding
SID if any exists. That SID is used to build security descriptor.
The DACL part of a security descriptor is not changed by either
chown or chgrp functionality.

cifs client maintains a separate caches for uid to SID and
gid to SID mapping. This is similar to the one used earlier
to map SID to id (as part of ID mapping code).

I tested it by mounting shares from a Windows (2003) server by
authenticating as two users, one at a time, as Administrator and
as a ordinary user.
And then attempting to change owner of a file on the share.

Depending on the permissions/privileges at the server for that file,
chown request fails to either open a file (to change the ownership)
or to set security descriptor.
So it all depends on privileges on the file at the server and what
user you are authenticated as at the server, cifs client is just a
conduit.

I compared the security descriptor during chown command to that
what smbcacls sends when it is used with -M OWNNER: option
and they are similar.

This patchset aim to enable chown and chgrp commands when
cifsacl mount option is specified, especially to Windows SMB servers.
Currently we can't do that.  So now along with chmod command,
chown and chgrp work.

I tested it by mounting shares from a Windows (2003) server by
authenticating as two users, one at a time, as Administrator and
as a ordinary user.
And then attempting to change owner of a file on the share.

Depending on the permissions/privileges at the server for that file,
chown request fails to either open a file (to change the ownership)
or to set security descriptor.
So it all depends on privileges on the file at the server and what
user you are authenticated as at the server, cifs client is just a
conduit.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:45:39 -05:00
Steve French
20c3a200c4 Typo in cifs readme in name of module parm directory
Suresh had a typo in his recent patch adding information on
the new oplock_endabled parm. Should be documented as in
directory /sys/module/cifs/parameters not /proc/module/cifs/parameters

Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:42:26 -05:00
Shirish Pargaonkar
d026168692 cifs: clean up unused encryption code
Remove unsed  #if 0 encryption code.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:42:21 -05:00
Shirish Pargaonkar
3d3ea8e64e cifs: Add mount options for backup intent (try #6)
Add mount options backupuid and backugid.

It allows an authenticated user to access files with the intent to back them
up including their ACLs, who may not have access permission but has
"Backup files and directories user right" on them (by virtue of being part
of the built-in group Backup Operators.

When mount options backupuid is specified, cifs client restricts the
use of backup intents to the user whose effective user id is specified
along with the mount option.

When mount options backupgid is specified, cifs client restricts the
use of backup intents to the users whose effective user id belongs to the
group id specified along with the mount option.

If an authenticated user is not part of the built-in group Backup Operators
at the server, access to such files is denied, even if allowed by the client.

Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:42:17 -05:00
Suresh Jayaraman
8bc4392a1e cifs: warn about deprecation of /proc/fs/cifs/OplockEnabled interface
The plan is to deprecate this interface by kernel version 3.4.

Changes since v1
   - add a '\n' to the printk.

Reported-by: Alexander Swen <alex@swen.nu>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:42:13 -05:00
Suresh Jayaraman
c9c4708fdf cifs: update README about the kernel module parameters
Reported-by: Alexander Swen <alex@swen.nu>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:42:09 -05:00
Steve French
e75047344e add new module parameter 'enable_oplocks'
Thus spake Jeff Layton:

"Making that a module parm would allow you to set that parameter at boot
time without needing to add special startup scripts. IMO, all of the
procfile "switches" under /proc/fs/cifs should be module parms
instead."

This patch doesn't alter the default behavior (Oplocks are enabled by
default).

To disable oplocks when loading the module, use

   modprobe cifs enable_oplocks=0

(any of '0' or 'n' or 'N' conventions can be used).

To disable oplocks at runtime using the new interface, use

   echo 0 > /sys/module/cifs/parameters/enable_oplocks

The older /proc/fs/cifs/OplockEnabled interface will be deprecated
after two releases. A subsequent patch will add an warning message
about this deprecation.

Changes since v2:
   - make enable_oplocks a 'bool'

Changes since v1:
   - eliminate the use of extra variable by renaming the old one to
     enable_oplocks and make it an 'int' type.

Reported-by: Alexander Swen <alex@swen.nu>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:42:05 -05:00
Jeff Layton
ba749e6d52 cifs: check for unresponsive server every time we call kernel_recvmsg
If the server stops sending data while in the middle of sending a
response then we still want to reconnect it if it doesn't come back.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:41:59 -05:00
Jeff Layton
e831e6cf3a cifs: make smb_msg local to read_from_socket
If msg_controllen is 0, then the socket layer should never touch these
fields. Thus, there's no need to continually reset them. Also, there's
no need to keep this field on the stack for the demultiplex thread, just
make it a local variable in read_from_socket.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:41:54 -05:00
Jeff Layton
e2218eab20 cifs: trivial: remove obsolete comment
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:41:47 -05:00
Jeff Layton
826a95e4a3 cifs: consolidate signature generating code
We have two versions of signature generating code. A vectorized and
non-vectorized version. Eliminate a large chunk of cut-and-paste
code by turning the non-vectorized version into a wrapper around the
vectorized one.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:41:41 -05:00
Jeff Layton
376b43f41c cifs: clean up checkSMB
The variable names in this function are so ambiguous that it's very
difficult to know what it's doing. Rename them to make it a bit more
clear.

Also, remove a redundant length check. cifsd checks to make sure that
the rfclen isn't larger than the maximum frame size when it does the
receive.

Finally, change checkSMB to return a real error code (-EIO) when
it finds an error. That will help simplify some coming changes in the
callers.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:41:36 -05:00
Jeff Layton
c974befa40 cifs: untangle server->maxBuf and CIFSMaxBufSize
server->maxBuf is the maximum SMB size (including header) that the
server can handle. CIFSMaxBufSize is the maximum amount of data (sans
header) that the client can handle. Currently maxBuf is being capped at
CIFSMaxBufSize + the max headers size, and the two values are used
somewhat interchangeably in the code.

This makes little sense as these two values are not related at all.
Separate them and make sure the code uses the right values in the right
places.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:41:32 -05:00
Paul Bolle
f3a6a60e4c cifs: Fix typo 'CIFS_NFSD_EXPORT'
It should be 'CONFIG_CIFS_NFSD_EXPORT'. No-one noticed because that
symbol depends on BROKEN.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:41:27 -05:00
Jeff Layton
4a29a0bd1d cifs: get rid of unused xid in cifs_get_root
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:41:22 -05:00
Jeff Layton
b4dacbc282 cifs: use memcpy for magic string in cifs signature generation BSRSPYL
...it's more efficient since we know the length.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:41:18 -05:00
Jeff Layton
ac423446d8 cifs: switch CIFSSMBQAllEAs to use memcmp
...as that's more efficient when we know that the lengths are equal.

Reported-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2011-10-12 23:41:14 -05:00
Mi Jinlong
5703728ac1 nfs: fix bug about IPv6 address scope checking
The result from ipv6_addr_scope() is a set of flags, not a single value,
so we can't just compare the result with  IPV6_ADDR_SCOPE_LINKLOCAL.

This patch fixs the problem, and checks for unequal addresses before
scope_id.

Signed-off-by: Mi Jinlong <mijinlong@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2011-10-12 10:30:29 -04:00
Christoph Hellwig
5a93a064d2 xfs: do not flush data workqueues in xfs_flush_buftarg
When we call xfs_flush_buftarg (generally from sync or umount) it already
is too late to flush the data workqueues, as I/O completion is signalled
for them and we are thus already done with the data we would flush here.

There are places where flushing them might be useful, but the current
sync interface doesn't give us that opportunity.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 22:34:31 -05:00
Christoph Hellwig
a9add83e5a xfs: remove XFS_bflush
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:11 -05:00
Christoph Hellwig
02b102df15 xfs: remove xfs_buf_target_name
The calling convention that returns a pointer to a static buffer is
fairly nasty, so just opencode it in the only caller that is left.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:11 -05:00
Christoph Hellwig
b38505b09b xfs: use xfs_ioerror_alert in xfs_buf_iodone_callbacks
Use xfs_ioerror_alert instead of opencoding a very similar error
message.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:10 -05:00
Christoph Hellwig
901796afca xfs: clean up xfs_ioerror_alert
Instead of passing the block number and mount structure explicitly
get them off the bp and fix make the argument order more natural.

Also move it to xfs_buf.c and stop printing the device name given
that we already get the fs name as part of xfs_alert, and we know
what device is operates on because of the caller that gets printed,
finally rename it to xfs_buf_ioerror_alert and pass __func__ as
argument where it makes sense.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:10 -05:00
Christoph Hellwig
4347b9d7ad xfs: clean up buffer allocation
Change _xfs_buf_initialize to allocate the buffer directly and rename it to
xfs_buf_alloc now that is the only buffer allocation routine.  Also remove
the xfs_buf_deallocate wrapper around the kmem_zone_free calls for buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:10 -05:00
Christoph Hellwig
af5c4bee49 xfs: remove buffers from the delwri list in xfs_buf_stale
For each call to xfs_buf_stale we call xfs_buf_delwri_dequeue either
directly before or after it, or are guaranteed by the surrounding
conditionals that we are never called on delwri buffers.  Simply
this situation by moving the call to xfs_buf_delwri_dequeue into
xfs_buf_stale.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:10 -05:00
Christoph Hellwig
c867cb6164 xfs: remove XFS_BUF_STALE and XFS_BUF_SUPER_STALE
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:10 -05:00
Christoph Hellwig
38f2323244 xfs: remove XFS_BUF_SET_VTYPE and XFS_BUF_SET_VTYPE_REF
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:09 -05:00
Christoph Hellwig
5fde0326dd xfs: remove XFS_BUF_FINISH_IOWAIT
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:09 -05:00
Christoph Hellwig
b17b833443 xfs: remove xfs_get_buftarg_list
The code is unused and under a config option that doesn't exist, remove it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:09 -05:00
Christoph Hellwig
87c7bec7fc xfs: fix buffer flushing during unmount
The code to flush buffers in the umount code is a bit iffy: we first
flush all delwri buffers out, but then might be able to queue up a
new one when logging the sb counts.  On a normal shutdown that one
would get flushed out when doing the synchronous superblock write in
xfs_unmountfs_writesb, but we skip that one if the filesystem has
been shut down.

Fix this by moving the delwri list flushing until just before unmounting
the log, and while we're at it also remove the superflous delwri list
and buffer lru flusing for the rt and log device that can never have
cached or delwri buffers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Tested-by: Amit Sahrawat <amit.sahrawat83@gmail.com>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:09 -05:00
Christoph Hellwig
1da2f2dbf2 xfs: optimize fsync on directories
Directories are only updated transactionally, which means fsync only
needs to flush the log the inode is currently dirty, but not bother
with checking for dirty data, non-transactional updates, and most
importanly doesn't have to flush disk caches except as part of a
transaction commit.

While the first two optimizations can't easily be measured, the
latter actually makes a difference when doing lots of fsync that do
not actually have to commit the inode, e.g. because an earlier fsync
already pushed the log far enough.

The new xfs_dir_fsync is identical to xfs_nfs_commit_metadata except
for the prototype, but I'm not sure creating a common helper for the
two is worth it given how simple the functions are.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:09 -05:00
Dave Chinner
670ce93fef xfs: reduce the number of log forces from tail pushing
The AIL push code will issue a log force on ever single push loop
that it exits and has encountered pinned items. It doesn't rescan
these pinned items until it revisits the AIL from the start. Hence
we only need to force the log once per walk from the start of the
AIL to the target LSN.

This results in numbers like this:

	xs_push_ail_flush.....         1456
	xs_log_force.........          1485

For an 8-way 50M inode create workload - almost all the log forces
are coming from the AIL pushing code.

Reduce the number of log forces by only forcing the log if the
previous walk found pinned buffers. This reduces the numbers to:

	xs_push_ail_flush.....          665
	xs_log_force.........           682

For the same test.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Alex Elder <aelder@sgi.com>
2011-10-11 21:15:09 -05:00