Commit graph

28319 commits

Author SHA1 Message Date
Sage Weil
b7a9e5dd40 libceph: set peer name on con_open, not init
The peer name may change on each open attempt, even when the connection is
reused.

Signed-off-by: Sage Weil <sage@inktank.com>
2012-07-05 21:14:35 -07:00
Linus Torvalds
5eecb9cc90 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs updates from Chris Mason:
 "I held off on my rc5 pull because I hit an oops during log recovery
  after a crash.  I wanted to make sure it wasn't a regression because
  we have some logging fixes in here.

  It turns out that a commit during the merge window just made it much
  more likely to trigger directory logging instead of full commits,
  which exposed an old bug.

  The new backref walking code got some additional fixes.  This should
  be the final set of them.

  Josef fixed up a corner where our O_DIRECT writes and buffered reads
  could expose old file contents (not stale, just not the most recent).
  He and Liu Bo fixed crashes during tree log recover as well.

  Ilya fixed errors while we resume disk balancing operations on
  readonly mounts."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: run delayed directory updates during log replay
  Btrfs: hold a ref on the inode during writepages
  Btrfs: fix tree log remove space corner case
  Btrfs: fix wrong check during log recovery
  Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
  Btrfs: resume balance on rw (re)mounts properly
  Btrfs: restore restriper state on all mounts
  Btrfs: fix dio write vs buffered read race
  Btrfs: don't count I/O statistic read errors for missing devices
  Btrfs: resolve tree mod log locking issue in btrfs_next_leaf
  Btrfs: fix tree mod log rewind of ADD operations
  Btrfs: leave critical region in btrfs_find_all_roots as soon as possible
  Btrfs: always put insert_ptr modifications into the tree mod log
  Btrfs: fix tree mod log for root replacements at leaf level
  Btrfs: support root level changes in __resolve_indirect_ref
  Btrfs: avoid waiting for delayed refs when we must not
2012-07-05 13:06:25 -07:00
Greg Kroah-Hartman
6fbfd0592e Merge v3.5-rc5 into driver-core-next
This picks up the big printk fixes, and resolves a merge issue with:
	drivers/extcon/extcon_gpio.c

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-07-05 08:25:34 -07:00
Jan Kara
a4564ead76 ocfs2: Fix bogus error message from ocfs2_global_read_info
'status' variable in ocfs2_global_read_info() is always != 0 when leaving the
function because it happens to contain number of read bytes. Thus we always log
error message although everything is OK. Since all error cases properly call
mlog_errno() before jumping to out_err, there's no reason to call mlog_errno()
on exit at all. This is a fallout of c1e8d35e (conversion of mlog_exit()
calls).

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:17 -07:00
Jeff Liu
65622e647b ocfs2: for SEEK_DATA/SEEK_HOLE, return internal error unchanged if ocfs2_get_clusters_nocache() or ocfs2_inode_lock() call failed.
Hello,

Since ENXIO only means "offset beyond EOF" for SEEK_DATA/SEEK_HOLE,
Hence we should return the internal error unchanged if ocfs2_inode_lock() or
ocfs2_get_clusters_nocache() call failed rather than ENXIO.
Otherwise, it will confuse the user applications when they trying to understand the root cause.

Thanks Dave for pointing this out.

Thanks,
-Jeff

Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:16 -07:00
Srinivas Eeda
a75e9ccabd ocfs2: use spinlock irqsave for downconvert lock.patch
When ocfs2dc thread holds dc_task_lock spinlock and receives soft IRQ it
deadlock itself trying to get same spinlock in ocfs2_wake_downconvert_thread.
Below is the stack snippet.

The patch disables interrupts when acquiring dc_task_lock spinlock.

	ocfs2_wake_downconvert_thread
	ocfs2_rw_unlock
	ocfs2_dio_end_io
	dio_complete
	.....
	bio_endio
	req_bio_endio
	....
	scsi_io_completion
	blk_done_softirq
	__do_softirq
	do_softirq
	irq_exit
	do_IRQ
	ocfs2_downconvert_thread
	[kthread]

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:15 -07:00
roel
16865b7c42 ocfs2: Misplaced parens in unlikley
Fix misplaced parentheses

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:27:13 -07:00
Junxiao Bi
3e5d3c35a6 ocfs2: clear unaligned io flag when dio fails
The unaligned io flag is set in the kiocb when an unaligned
dio is issued, it should be cleared even when the dio fails,
or it may affect the following io which are using the same
kiocb.

Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joel Becker <jlbec@evilplan.org>
2012-07-03 23:26:50 -07:00
Tyler Hicks
60d65f1f07 eCryptfs: Fix lockdep warning in miscdev operations
Don't grab the daemon mutex while holding the message context mutex.
Addresses this lockdep warning:

 ecryptfsd/2141 is trying to acquire lock:
  (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}, at: [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]

 but task is already holding lock:
  (&(*daemon)->mux){+.+...}, at: [<ffffffffa029c2ec>] ecryptfs_miscdev_read+0x21c/0x470 [ecryptfs]

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(*daemon)->mux){+.+...}:
        [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
        [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
        [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
        [<ffffffffa029c5d7>] ecryptfs_send_miscdev+0x97/0x120 [ecryptfs]
        [<ffffffffa029b744>] ecryptfs_send_message+0x134/0x1e0 [ecryptfs]
        [<ffffffffa029a24e>] ecryptfs_generate_key_packet_set+0x2fe/0xa80 [ecryptfs]
        [<ffffffffa02960f8>] ecryptfs_write_metadata+0x108/0x250 [ecryptfs]
        [<ffffffffa0290f80>] ecryptfs_create+0x130/0x250 [ecryptfs]
        [<ffffffff811963a4>] vfs_create+0xb4/0x120
        [<ffffffff81197865>] do_last+0x8c5/0xa10
        [<ffffffff811998f9>] path_openat+0xd9/0x460
        [<ffffffff81199da2>] do_filp_open+0x42/0xa0
        [<ffffffff81187998>] do_sys_open+0xf8/0x1d0
        [<ffffffff81187a91>] sys_open+0x21/0x30
        [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b

 -> #0 (&ecryptfs_msg_ctx_arr[i].mux){+.+.+.}:
        [<ffffffff810a3418>] __lock_acquire+0x1bf8/0x1c50
        [<ffffffff810a3b8d>] lock_acquire+0x9d/0x220
        [<ffffffff8151c6da>] __mutex_lock_common+0x5a/0x4b0
        [<ffffffff8151cc64>] mutex_lock_nested+0x44/0x50
        [<ffffffffa029c213>] ecryptfs_miscdev_read+0x143/0x470 [ecryptfs]
        [<ffffffff811887d3>] vfs_read+0xb3/0x180
        [<ffffffff811888ed>] sys_read+0x4d/0x90
        [<ffffffff81527d69>] system_call_fastpath+0x16/0x1b

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
2012-07-03 16:34:10 -07:00
Tyler Hicks
9fe79d7600 eCryptfs: Properly check for O_RDONLY flag before doing privileged open
If the first attempt at opening the lower file read/write fails,
eCryptfs will retry using a privileged kthread. However, the privileged
retry should not happen if the lower file's inode is read-only because a
read/write open will still be unsuccessful.

The check for determining if the open should be retried was intended to
be based on the access mode of the lower file's open flags being
O_RDONLY, but the check was incorrectly performed. This would cause the
open to be retried by the privileged kthread, resulting in a second
failed open of the lower file. This patch corrects the check to
determine if the open request should be handled by the privileged
kthread.

Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-07-03 16:34:09 -07:00
Linus Torvalds
a3da2c6913 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block bits from Jens Axboe:
 "As vacation is coming up, thought I'd better get rid of my pending
  changes in my for-linus branch for this iteration.  It contains:

   - Two patches for mtip32xx.  Killing a non-compliant sysfs interface
     and moving it to debugfs, where it belongs.

   - A few patches from Asias.  Two legit bug fixes, and one killing an
     interface that is no longer in use.

   - A patch from Jan, making the annoying partition ioctl warning a bit
     less annoying, by restricting it to !CAP_SYS_RAWIO only.

   - Three bug fixes for drbd from Lars Ellenberg.

   - A fix for an old regression for umem, it hasn't really worked since
     the plugging scheme was changed in 3.0.

   - A few fixes from Tejun.

   - A splice fix from Eric Dumazet, fixing an issue with pipe
     resizing."

* 'for-linus' of git://git.kernel.dk/linux-block:
  scsi: Silence unnecessary warnings about ioctl to partition
  block: Drop dead function blk_abort_queue()
  block: Mitigate lock unbalance caused by lock switching
  block: Avoid missed wakeup in request waitqueue
  umem: fix up unplugging
  splice: fix racy pipe->buffers uses
  drbd: fix null pointer dereference with on-congestion policy when diskless
  drbd: fix list corruption by failing but already aborted reads
  drbd: fix access of unallocated pages and kernel panic
  xen/blkfront: Add WARN to deal with misbehaving backends.
  blkcg: drop local variable @q from blkg_destroy()
  mtip32xx: Create debugfs entries for troubleshooting
  mtip32xx: Remove 'registers' and 'flags' from sysfs
  blkcg: fix blkg_alloc() failure path
  block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED
  block: fix return value on cfq_init() failure
  mtip32xx: Remove version.h header file inclusion
  xen/blkback: Copy id field when doing BLKIF_DISCARD.
2012-07-03 15:45:10 -07:00
Jeff Layton
ec01d738a1 cifs: when server doesn't set CAP_LARGE_READ_X, cap default rsize at MaxBufferSize
When the server doesn't advertise CAP_LARGE_READ_X, then MS-CIFS states
that you must cap the size of the read at the client's MaxBufferSize.
Unfortunately, testing with many older servers shows that they often
can't service a read larger than their own MaxBufferSize.

Since we can't assume what the server will do in this situation, we must
be conservative here for the default. When the server can't do large
reads, then assume that it can't satisfy any read larger than its
MaxBufferSize either.

Luckily almost all modern servers can do large reads, so this won't
affect them. This is really just for older win9x and OS/2 era servers.
Also, note that this patch just governs the default rsize. The admin can
always override this if he so chooses.

Cc: <stable@vger.kernel.org> # 3.2
Reported-by: David H. Durgee <dhdurgee@acm.org>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steven French <sfrench@w500smf.(none)>
2012-07-03 12:54:42 -05:00
Chris Mason
b6305567e7 Btrfs: run delayed directory updates during log replay
While we are resolving directory modifications in the
tree log, we are triggering delayed metadata updates to
the filesystem btrees.

This commit forces the delayed updates to run so the
replay code can find any modifications done.  It stops
us from crashing because the directory deleltion replay
expects items to be removed immediately from the tree.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
2012-07-02 15:39:19 -04:00
Josef Bacik
7fd1a3f73f Btrfs: hold a ref on the inode during writepages
We can race with unlink and not actually be able to do our igrab in
btrfs_add_ordered_extent.  This will result in all sorts of problems.
Instead of doing the complicated work to try and handle returning an error
properly from btrfs_add_ordered_extent, just hold a ref to the inode during
writepages.  If we cannot grab a ref we know we're freeing this inode anyway
and can just drop the dirty pages on the floor, because screw them we're
going to invalidate them anyway.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:18 -04:00
Josef Bacik
bdb7d303b3 Btrfs: fix tree log remove space corner case
The tree log stuff can have allocated space that we end up having split
across a bitmap and a real extent.  The free space code does not deal with
this, it assumes that if it finds an extent or bitmap entry that the entire
range must fall within the entry it finds.  This isn't necessarily the case,
so rework the remove function so it can handle this case properly.  This
fixed two panics the user hit, first in the case where the space was
initially in a bitmap and then in an extent entry, and then the reverse
case.  Thanks,

Reported-and-tested-by: Shaun Reich <sreich@kde.org>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:18 -04:00
Liu Bo
6bf02314d9 Btrfs: fix wrong check during log recovery
When we're evicting an inode during log recovery, we need to ensure that the inode
is not in orphan state any more, which means inode's run_time flags has _no_
BTRFS_INODE_HAS_ORPHAN_ITEM.  Thus, the BUG_ON was triggered because of a wrong
check for the flags.

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-02 15:39:17 -04:00
Alexander Block
d3a94048c9 Btrfs: use _IOR for BTRFS_IOC_SUBVOL_GETFLAGS
We used the wrong ioctl macro for the getflags ioctl before.
As we don't have the set/getflags ioctls in the user space ioctl.h
at the moment, it's safe to fix it now.

Reviewed-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
2012-07-02 15:39:17 -04:00
Ilya Dryomov
2b6ba629b5 Btrfs: resume balance on rw (re)mounts properly
This introduces btrfs_resume_balance_async(), which, given that
restriper state was recovered earlier by btrfs_recover_balance(),
resumes balance in btrfs-balance kthread.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-07-02 15:39:17 -04:00
Ilya Dryomov
68310a5e42 Btrfs: restore restriper state on all mounts
Fix a bug that triggered asserts in btrfs_balance() in both normal and
resume modes -- restriper state was not properly restored on read-only
mounts.  This factors out resuming code from btrfs_restore_balance(),
which is now also called earlier in the mount sequence to avoid the
problem of some early writes getting the old profile.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2012-07-02 15:39:16 -04:00
Josef Bacik
c3473e8300 Btrfs: fix dio write vs buffered read race
Miao pointed out there's a problem with mixing dio writes and buffered
reads.  If the read happens between us invalidating the page range and
actually locking the extent we can bring in pages into page cache.  Then
once the write finishes if somebody tries to read again it will just find
uptodate pages and we'll read stale data.  So we need to lock the extent and
check for uptodate bits in the range.  If there are uptodate bits we need to
unlock and invalidate again.  This will keep this race from happening since
we will hold the extent locked until we create the ordered extent, and then
teh read side always waits for ordered extents.  There was also a race in
how we updated i_size, previously we were relying on the generic DIO stuff
to adjust the i_size after the DIO had completed, but this happens outside
of the extent lock which means reads could come in and not see the updated
i_size.  So instead move this work into where we create the extents, and
then this way the update ordered i_size stuff works properly in the endio
handlers.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-07-02 15:36:23 -04:00
Stefan Behrens
597a60fade Btrfs: don't count I/O statistic read errors for missing devices
It is normal behaviour of the low level btrfs function btrfs_map_bio()
to complete a bio with -EIO if the device is missing, instead of just
preventing the bio creation in an earlier step.
This used to cause I/O statistic read error increments and annoying
printk_ratelimited messages. This commit fixes the issue.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Reported-by: Carey Underwood <cwillu@cwillu.com>
2012-07-02 15:36:23 -04:00
Dave Chinner
9b73bd7b61 xfs: factor buffer reading from xfs_dir2_leaf_getdents
The buffer reading code in xfs_dir2_leaf_getdents is complex and difficult to
follow due to the readahead and all the context is carries. it is also badly
indented and so difficult to read. Factor it out into a separate function to
make it easier to understand and optimise in future patches.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-01 14:50:08 -05:00
Dave Chinner
1d9025e561 xfs: remove struct xfs_dabuf and infrastructure
The struct xfs_dabuf now only tracks a single xfs_buf and all the
information it holds can be gained directly from the xfs_buf. Hence
we can remove the struct dabuf and pass the xfs_buf around
everywhere.

Kill the struct dabuf and the associated infrastructure.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-01 14:50:07 -05:00
Dave Chinner
3605431fb9 xfs: use discontiguous xfs_buf support in dabuf wrappers
First step in converting the directory code to use native
discontiguous buffers and replacing the dabuf construct.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-01 14:50:07 -05:00
Dave Chinner
372cc85ec6 xfs: support discontiguous buffers in the xfs_buf_log_item
discontigous buffer in separate buffer format structures. This means log
recovery will recover all the changes on a per segment basis without
requiring any knowledge of the fact that it was logged from a
compound buffer.

To do this, we need to be able to determine what buffer segment any
given offset into the compound buffer sits over. This enables us to
translate the dirty bitmap in the number of separate buffer format
structures required.

We also need to be able to determine the number of bitmap elements
that a given buffer segment has, as this determines the size of the
buffer format structure. Hence we need to be able to determine the
both the start offset into the buffer and the length of a given
segment to be able to calculate this.

With this information, we can preallocate, build and format the
correct log vector array for each segment in a compound buffer to
appear exactly the same as individually logged buffers in the log.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-01 14:50:06 -05:00
Dave Chinner
de2a4f5919 xfs: add discontiguous buffer support to transactions
Now that the buffer cache supports discontiguous buffers, add
support to the transaction buffer interface for getting and reading
buffers.

Note that this patch does not convert the buffer item logging to
support discontiguous buffers. That will be done as a separate
commit.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-01 14:50:06 -05:00
Dave Chinner
6dde27077e xfs: add discontiguous buffer map interface
With the internal interfaces supporting discontiguous buffer maps,
add external lookup, read and get interfaces so they can start to be
used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-01 14:50:05 -05:00
Dave Chinner
3e85c868a6 xfs: convert internal buffer functions to pass maps
While the external interface currently uses separate blockno/length
variables, we need to move internal interfaces to passing and
parsing vector maps. This will then allow us to add external
interfaces to support discontiguous buffer maps as the internal code
will already support them.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-01 14:50:05 -05:00
Dave Chinner
cbb7baab28 xfs: separate buffer indexing from block map
To support discontiguous buffers in the buffer cache, we need to
separate the cache index variables from the I/O map. While this is
currently a 1:1 mapping, discontiguous buffer support will break
this relationship.

However, for caching purposes, we can still treat them the same as a
contiguous buffer - the block number of the first block and the
length of the buffer - as that is still a unique representation.
Also, the only way we will ever access the discontiguous regions of
buffers is via bulding the complete buffer in the first place, so
using the initial block number and entire buffer length is a sane
way to index the buffers.

Add a block mapping vector construct to the xfs_buf and use it in
the places where we are doing IO instead of the current
b_bn/b_length variables.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-01 14:50:04 -05:00
Dave Chinner
77c1a08fc9 xfs: struct xfs_buf_log_format isn't variable sized.
The struct xfs_buf_log_format wants to think the dirty bitmap is
variable sized.  In fact, it is variable size on disk simply due to
the way we map it from the in-memory structure, but we still just
use a fixed size memory allocation for the in-memory structure.

Hence it makes no sense to set the function up as a variable sized
structure when we already know it's maximum size, and we always
allocate it as such. Simplify the structure by making the dirty
bitmap a fixed sized array and just using the size of the structure
for the allocation size.

This will make it much simpler to allocate and manipulate an array
of format structures for discontiguous buffer support.

The previous struct xfs_buf_log_item size according to
/proc/slabinfo was 224 bytes. pahole doesn't give the same size
because of the variable size definition. With this modification,
pahole reports the same as /proc/slabinfo:

	/* size: 224, cachelines: 4, members: 6 */

Because the xfs_buf_log_item size is now determined by the maximum
supported block size we introduce a dependency on xfs_alloc_btree.h.
Avoid this dependency by moving the idefines for the maximum block
sizes supported to xfs_types.h with all the other max/min type
defines to avoid any new dependencies.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-07-01 14:50:04 -05:00
Theodore Ts'o
f6fb99cadc ext4: pass a char * to ext4_count_free() instead of a buffer_head ptr
Make it possible for ext4_count_free to operate on buffers and not
just data in buffer_heads.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@kernel.org
2012-06-30 19:14:57 -04:00
Zheng Liu
f4e95b3316 ext4: honor O_(D)SYNC semantic in ext4_fallocate()
Ext4 must make sure the transaction to be commited to the disk when
user opens a file with O_(D)SYNC flag and do a fallocate(2) call.

This problem had been reported by Christoph Hellwig in this thread:
http://www.spinics.net/lists/linux-btrfs/msg13621.html

Reported-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-06-30 19:12:57 -04:00
Aditya Kali
1c8457cadc ext4: avoid uneeded calls to ext4_mb_load_buddy() while reading mb_groups
Currently ext4_mb_load_buddy is called for every group, irrespective
of whether the group info is already in memory, while reading
/proc/fs/ext4/<partition>/mb_groups proc file.  For the purpose of
mb_groups proc file, it is unnecessary to load the file group info
from disk if it was loaded in past.  These calls to ext4_mb_load_buddy
make reading the mb_groups proc file expensive.

Also, the locks around ext4_get_group_info are not required.

This patch modifies the code to call ext4_mb_load_buddy only if the
group info had never been loaded into memory in past. It also removes
the mb group locking around ext4_get_group_info call.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-06-30 19:10:57 -04:00
Bryan Schumaker
a8d8f02cf0 NFS: Create custom NFS v4 write_inode() function
This gives pnfs a chance to do a layout commit inside the v4 code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:47 -04:00
Bryan Schumaker
57208fa7e5 NFS: Create an write_pageio_init() function
pNFS needs to select a write function based on the layout driver
currently in use, so I let each NFS version decide how to best handle
initializing writes.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:46 -04:00
Bryan Schumaker
1abb50886a NFS: Create an read_pageio_init() function
pNFS needs to select a read function based on the layout driver
currently in use, so I let each NFS version decide how to best handle
initializing reads.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:46 -04:00
Bryan Schumaker
6663ee7f81 NFS: Create an alloc_client rpc_op
This gives NFS v4 a way to set up callbacks and sessions without v2 or
v3 having to do them as well.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:46 -04:00
Bryan Schumaker
cdb7ecedec NFS: Create a free_client rpc_op
NFS v4 needs a way to shut down callbacks and sessions, but v2 and v3
don't.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:45 -04:00
Bryan Schumaker
57ec14c55d NFS: Create a return_delegation rpc op
Delegations are a v4 feature, so push return_delegation out of the
generic client by creating a new rpc_op and renaming the old function to
be in the nfs v4 "namespace"

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:45 -04:00
Bryan Schumaker
011e2a7fd5 NFS: Create a have_delegation rpc_op
Delegations are a v4 feature, so push them out of the generic code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:44 -04:00
Bryan Schumaker
a5c58892b4 NFS: Create a v4-specific fsync function
v2 and v3 don't need to worry about doing a pnfs layoutcommit.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:44 -04:00
Bryan Schumaker
eeebf91675 NFS: Use nfs4_destroy_server() to clean up NFS v4
I can use this function to return delegations and unset the pnfs layout
driver rather than continuing to do these things in the generic client.
With this change, we no longer need an nfs4_kill_super().

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:44 -04:00
Bryan Schumaker
e38eb6506f NFS: set_pnfs_layoutdriver() from nfs4_proc_fsinfo()
The generic client doesn't need to know about pnfs layout drivers, so
this should be done in the v4 code.

Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:46:43 -04:00
Andy Adamson
6e5b587d2f NFSv4.1 handle OPEN O_CREATE mdsthreshold
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-29 11:33:43 -04:00
Trond Myklebust
e3074507d9 NFS: Simplify NFSv4.1 Kconfig
Convert the pNFS file layout to use the same system as the
object and block layout.
Remove unnecessary dependencies on NFS_FS

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:51 -04:00
Andy Adamson
05bf14adca NFSv4.1: Use session max response size for GETDEVICEINFO gdia_maxcount
We prepare for the largest possible GETDEVICEINFO response, which
can not be greater than the negotiated session maximum response size.

Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:50 -04:00
Trond Myklebust
2f2c63bc22 NFS: Cleanup - only store the write verifier in struct nfs_page
The 'committed' field is not needed once we have put the struct nfs_page
on the right list.

Also correct the type of the verifier: it is not an array of __be32, but
simply an 8 byte long opaque array.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:50 -04:00
Trond Myklebust
98d9452448 NFSv4: Decode getdevicelist should use nfs4_verifier
The verifier returned by the GETDEVICELIST operation is not a write
verifier, but a nfs4_verifier.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:50 -04:00
Trond Myklebust
140150dbb1 SUNRPC: Remove unused function xdr_encode_pages
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:49 -04:00
Trond Myklebust
b42353ff8d NFSv4.1: Clean up nfs4_reclaim_lease
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-06-28 17:20:49 -04:00