Commit graph

6149 commits

Author SHA1 Message Date
Trond Myklebust
c03b402461 NFS: Convert struct nfs_page to use krefs
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:26 -04:00
Trond Myklebust
a50f7951a3 NFS: Fix an Oops in the nfs_access_cache_shrinker()
The nfs_access_cache_shrinker may race with nfs_access_zap_cache().

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust
e2f032e9ef NFS: nfs3_proc_create() should use nfs_post_op_update_inode()
Also get rid of a redundant call to nfs_setattr_update_inode(). The call to
nfs3_proc_setattr() already takes care of that.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Jeff Layton
aa53ed541a NFS4: on a O_EXCL OPEN make sure SETATTR sets the fields holding the verifier
The Linux NFS4 client simply skips over the bitmask in an O_EXCL open
call and so it doesn't bother to reset any fields that may be holding
the verifier. This patch has us save the first two words of the bitmask
(which is all the current client has #defines for). The client then
later checks this bitmask and turns on the appropriate flags in the
sattr->ia_verify field for the following SETATTR call.

This patch only currently checks to see if the server used the atime
and mtime slots for the verifier (which is what the Linux server uses
for this). I'm not sure of what other fields the server could
reasonably use, but adding checks for others should be trivial.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust
fc6ae3cf48 NFS: Re-enable forced umounts
They disappeared some time around 2.6.18.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:25 -04:00
Jeff Layton
83d93f2229 NFS: Use GFP_HIGHUSER for page allocation in nfs_symlink()
nfs_symlink() allocates a GFP_KERNEL page for the pagecache. Most
pagecache pages are allocated using GFP_HIGHUSER, and there's no reason
not to do that in nfs_symlink() as well.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
2007-07-10 23:40:25 -04:00
Trond Myklebust
a0356862bc NFS: Fix nfs_reval_fsid()
We don't need to revalidate the fsid on the root directory. It suffices to
revalidate it on the current directory.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust
b39e625b6e NFSv4: Clean up nfs4_call_async()
Use rpc_run_task() instead of doing it ourselves.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust
4a35bd41af NFSv4: Ensure that nfs4_do_close() doesn't race with umount
nfs4_do_close() does not currently have any way to ensure that the user
won't attempt to unmount the partition while the asynchronous RPC call
is completing. This again may cause Oopses in nfs_update_inode().

Add a vfsmount argument to nfs4_close_state to ensure that the partition
remains mounted while we're closing the file.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust
ad389da79f NFSv4: Ensure asynchronous open() calls always pin the mountpoint
A number of race conditions may currently ensue if the user presses ^C
and then unmounts the partition while an asynchronous open() is in
progress.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust
539cd03a57 NFSv4: Cleanup: pass the nfs_open_context to open recovery code
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:24 -04:00
Trond Myklebust
88be9f990f NFS: Replace vfsmount and dentry in nfs_open_context with struct path
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Trond Myklebust
de05a0cc2a NFS: Minor read optimisation...
Since PG_uptodate may now end up getting set during the call to
nfs_wb_page(), we can avoid putting a read request on the wire in those
situations.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Trond Myklebust
44dd151d5c NFS: Don't mark a written page as uptodate until it is on disk
The write may fail, so we should not mark the page as uptodate until we are
certain that the data has been accepted and written to disk by the server.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Trond Myklebust
d9df8d6b38 NFS: Don't fail an O_DIRECT read/write if get_user_pages() returns pages
There is no need to fail the entire O_DIRECT read/write just because
get_user_pages() returned fewer pages than we requested.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Chuck Lever
070ea60214 NFS: Clean ups in fs/nfs/direct.c
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-07-10 23:40:23 -04:00
Pavel Emelianov
bcf67e1625 Make common helpers for seq_files that work with list_heads
Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing a common helpers.

This makes code about 300 lines smaller:

The first version of this patch made the helper functions static inline
in the seq_file.h header. This patch moves them to the fs/seq_file.c as
Andrew proposed. The vmlinux .text section sizes are as follows:

2.6.22-rc1-mm1:              0x001794d5
with the previous version:   0x00179505
with this patch:             0x00179135

The config file used was make allnoconfig with the "y" inclusion of all
the possible options to make the files modified by the patch compile plus
drivers I have on the test node.

This patch:

Many places in kernel use seq_file API to iterate over a regular list_head.
The code for such iteration is identical in all the places, so it's worth
introducing a common helpers.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-10 17:51:13 -07:00
Eric Sandeen
54c57dc3b6 [PATCH] ocfs2: zero_user_page conversion
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:10 -07:00
Mark Fasheh
b25801038d ocfs2: Support xfs style space reservation ioctls
We re-use the RESVSP/UNRESVSP ioctls from xfs which allow the user to
allocate and deallocate regions to a file without zeroing data or changing
i_size.

Though renamed, the structure passed in from user is identical to struct
xfs_flock64. The three fields that are actually used right now are l_whence,
l_start and l_len.

This should get ocfs2 immediate compatibility with userspace software using
the pre-existing xfs ioctls.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:09 -07:00
Mark Fasheh
063c4561f5 ocfs2: support for removing file regions
Provide an internal interface for the removal of arbitrary file regions.

ocfs2_remove_inode_range() takes a byte range within a file and will remove
existing extents within that range. Partial clusters will be zeroed so that
any read from within the region will return zeros.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:08 -07:00
Mark Fasheh
35edec1d52 ocfs2: update truncate handling of partial clusters
The partial cluster zeroing code used during truncate usually assumes that
the rightmost byte in the range to be zeroed lies on a cluster boundary.
This makes sense for truncate, but punching holes might require zeroing on
non-aligned rightmost boundaries.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:07 -07:00
Mark Fasheh
d0c7d7082e ocfs2: btree support for removal of arbirtrary extents
Add code to the btree paths to support the removal of arbitrary regions
within an existing extent. With proper higher level support this can be used
to "punch holes" in a file. Truncate (a special case of hole punching) could
also be converted to use these methods.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:05 -07:00
Mark Fasheh
2ae99a6037 ocfs2: Support creation of unwritten extents
This can now be trivially supported with re-use of our existing extend code.

ocfs2_allocate_unwritten_extents() takes a start offset and a byte length
and iterates over the inode, adding extents (marked as unwritten) until len
is reached. Existing extents are skipped over.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:04 -07:00
Mark Fasheh
b27b7cbcf1 ocfs2: support writing of unwritten extents
Update the write code to detect when the user is asking to write to an
unwritten extent. Like writing to a hole, we must zero the region between
the write and the cluster boundaries. Most of the existing cluster zeroing
logic can be re-used with some additional checks for the unwritten flag on
extent records.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:03 -07:00
Mark Fasheh
0d172baa55 ocfs2: small cleanup of ocfs2_write_begin_nolock()
We can easily seperate out the write descriptor setup and manipulation
into helper functions.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:01 -07:00
Mark Fasheh
328d5752e1 ocfs2: btree changes for unwritten extents
Writes to a region marked as unwritten might result in a record split or
merge. We can support splits by making minor changes to the existing insert
code. Merges require left rotations which mostly re-use right rotation
support functions.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:32:00 -07:00
Mark Fasheh
c3afcbb344 ocfs2: abstract btree growing calls
The top level calls and logic for growing a tree can easily be abstracted
out of ocfs2_insert_extent() into a seperate function - ocfs2_grow_tree().

This allows future code to easily grow btrees when needed.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:58 -07:00
Mark Fasheh
1f6697d072 ocfs2: use all extent block suballocators
Now that we have a method to deallocate blocks from them, each node should
allocate extent blocks from their local suballocator file.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:56 -07:00
Mark Fasheh
59a5e416d1 ocfs2: plug truncate into cached dealloc routines
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:55 -07:00
Mark Fasheh
2b604351bc ocfs2: simplify deallocation locking
Deallocation of suballocator blocks, most notably extent blocks, might
involve multiple suballocator inodes.

The locking for this can get extremely complicated, especially when the
suballocator inodes to delete from aren't known until deep within an
unrelated codepath.

Implement a simple scheme for recording the blocks to be unlinked so that
the actual deallocation can be done in a context which won't deadlock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:54 -07:00
Mark Fasheh
bce997682f ocfs2: harden buffer check during mapping of page blocks
We don't want to submit buffer_new blocks for read i/o. This actually won't
happen right now because those requests during an allocating write are all nicely
aligned. It's probably a good idea to provide an explicit check though.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:52 -07:00
Mark Fasheh
7307de8051 ocfs2: shared writeable mmap
Implement cluster consistent shared writeable mappings using the
->page_mkwrite() callback.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:51 -07:00
Mark Fasheh
607d44aa3f ocfs2: factor out write aops into nolock variants
ocfs2_mkwrite() will want this so that it can add some mmap specific checks
before asking for a write.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:49 -07:00
Mark Fasheh
3a307ffc27 ocfs2: rework ocfs2_buffered_write_cluster()
Use some ideas from the new-aops patch series and turn
ocfs2_buffered_write_cluster() into a 2 stage operation with the caller
copying data in between. The code now understands multiple cluster writes as
a result of having to deal with a full page write for greater than 4k pages.

This sets us up to easily call into the write path during ->page_mkwrite().

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:31:46 -07:00
Mark Fasheh
2e89b2e48e ocfs2: take ip_alloc_sem during entire truncate
Use of the alloc sem during truncate was too narrow - we want to protect
the i_size change and page truncation against mmap now.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:57 -07:00
Sunil Mushran
baf4661a82 ocfs2: Add "preferred slot" mount option
ocfs2 will attempt to assign the node the slot# provided in the mount
option. Failure to assign the preferred slot is not an error. This small
feature can be useful for automated testing.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:54 -07:00
Shani Moideen
5fb0f7f010 [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c
Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in
fs/ocfs2/dlm/dlmrecovery.c

Signed-off-by: Shani Moideen <shani.moideen@wipro.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:52 -07:00
Christoph Hellwig
800deef3f6 [PATCH] ocfs2: use list_for_each_entry where benefical
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:49 -07:00
Joel Becker
e6df3a663a ocfs2: Wake up a starting region if it gets killed in the background.
Tell o2cb_region_dev_write() to wake up if rmdir(2) happens on the
heartbeat region while it is starting up.  Then o2hb_region_dev_write()
can check to see if it is alive and act accordingly.  This prevents a hang
(not being woken) and a crash (if it's woken by a signal).

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:46 -07:00
Joel Becker
16c6a4f24d ocfs2: live heartbeat depends on the local node configuration
Removing the local node configuration out from underneath a running
heartbeat is "bad".  Provide an API in the ocfs2 nodemanager to request
a configfs dependancy on the local node, then use it in heartbeat.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:43 -07:00
Joel Becker
14829422be ocfs2: Depend on configfs heartbeat items.
ocfs2 mounts require a heartbeat region.  Use the new configfs_depend_item()
facility to actually depend on them so they can't go away from under us.

First, teach cluster/nodemanager.c to depend an item on the o2cb subsystem.
Then teach o2hb_register_callbacks to take a UUID and depend on the
appropriate region.  Finally, teach all users of o2hb to pass a UUID or
NULL if they don't require a pin.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:19:40 -07:00
Joel Becker
631d1febab configfs: config item dependancies.
Sometimes other drivers depend on particular configfs items.  For
example, ocfs2 mounts depend on a heartbeat region item.  If that
region item is removed with rmdir(2), the ocfs2 mount must BUG or go
readonly.  Not happy.

This provides two additional API calls: configfs_depend_item() and
configfs_undepend_item().  A client driver can call
configfs_depend_item() on an existing item to tell configfs that it is
depended on.  configfs will then return -EBUSY from rmdir(2) for that
item.  When the item is no longer depended on, the client driver calls
configfs_undepend_item() on it.

These API cannot be called underneath any configfs callbacks, as
they will conflict.  They can block and allocate.  A client driver
probably shouldn't calling them of its own gumption.  Rather it should
be providing an API that external subsystems call.

How does this work?  Imagine the ocfs2 mount process.  When it mounts,
it asks for a heart region item.  This is done via a call into the
heartbeat code.  Inside the heartbeat code, the region item is looked
up.  Here, the heartbeat code calls configfs_depend_item().  If it
succeeds, then heartbeat knows the region is safe to give to ocfs2.
If it fails, it was being torn down anyway, and heartbeat can gracefully
pass up an error.

[ Fixed some bad whitespace in configfs.txt. --Mark ]

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:18:59 -07:00
Joel Becker
299894cc90 configfs: accessing item hierarchy during rmdir(2)
Add a notification callback, ops->disconnect_notify(). It has the same
prototype as ->drop_item(), but it will be called just before the item
linkage is broken. This way, configfs users who want to do work while
the object is still in the heirarchy have a chance.

Client drivers will still need to config_item_put() in their
->drop_item(), if they implement it.  They need do nothing in
->disconnect_notify().  They don't have to provide it if they don't
care.  But someone who wants to be notified before ci_parent is set to
NULL can now be notified.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:11:01 -07:00
Johannes Berg
6d748924b7 [PATCH] configsfs buffer: use mutex
Seems copied from sysfs, but I don't see a reason here nor there to use
a semaphore instead of a mutex. Convert.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:10:58 -07:00
Joel Becker
e6bd07aee7 configfs: Convert subsystem semaphore to mutex
Convert the su_sem member of struct configfs_subsystem to a struct
mutex, as that's what it is. Also convert all the users and update
Documentation/configfs.txt and Documentation/configfs_example.c
accordingly.

[ Conflict in fs/dlm/config.c with commit
  3168b0780d manually resolved. --Mark ]

Inspired-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:10:56 -07:00
Satyam Sharma
3fe6c5ce11 [PATCH] configfs+dlm: Rename config_group_find_obj and state semantics clearly
Configfs being based upon sysfs code, config_group_find_obj() is probably
so named because of the similar kset_find_obj() in sysfs. However,
"kobject"s in sysfs become "config_item"s in configfs, so let's call it
config_group_find_item() instead, for sake of uniformity, and make
corresponding change in the users of this function.

BTW a crucial difference between kset_find_obj and config_group_find_item
is in locking expectations. kset_find_obj does its locking by itself, but
config_group_find_item expects the *caller* to do the locking. The reason
for this: kset's have their own locks, config_group's don't but instead
rely on the subsystem mutex. And, subsystem needn't necessarily be around
when config_group_find_item() is called.

So let's state these locking semantics explicitly, and rectify the comment,
otherwise bugs could continue to occur in future, as they did in the past
(refer commit d82b8191e238 in gfs2-2.6-fixes.git).

[ I also took the opportunity to fix some bad whitespace and
double-empty lines. --Joel ]

[ Conflict in fs/dlm/config.c with commit
  3168b0780d manually resolved. --Mark ]

Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 17:02:31 -07:00
Satyam Sharma
9b1d9aa4e9 [PATCH] configfs+dlm: Separate out __CONFIGFS_ATTR into configfs.h
fs/dlm/config.c contains a useful generic macro called __CONFIGFS_ATTR
that is similar to sysfs' __ATTR macro that makes defining attributes
easy for any user of configfs. Separate it out into configfs.h so that
other users (forthcoming in dynamic netconsole patchset) can use it too.

Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Cc: David Teigland <teigland@redhat.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 16:52:27 -07:00
Satyam Sharma
4c62b53454 configfs: misc cleanups
1. item.c:config_item_cleanup() is a private function (only called by
config_item_release() in same file). However, it is spuriously
exported in include/linux/configfs.h, so remove that export and make
it static in item.c. Also, it is no longer exported / interface
function, so no need to give comment for this function (the comment
was stating obvious thing, anyway).

2. Kernel-doc comment format does not allow empty line between end of
comment and start of function (declaration line). There were several
such spurious empty lines in item.c, so fix them.

  fs/configfs/item.c       |   15 +++------------
  include/linux/configfs.h |    1 -
  2 files changed, 3 insertions(+), 13 deletions(-)

Signed-off-by: Satyam Sharma <ssatyam@cse.iitk.ac.in>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 16:52:25 -07:00
Joel Becker
b23cdde4c6 configfs: consistent attribute size
The attribute store/show code currently limits attributes at PAGE_SIZE.
This code comes from sysfs, where it still works that way.

However, PAGE_SIZE is not constant.  A 16k attribute string works on
ia64 but not on x86.  Really a subsystem shouldn't allow different
attribute sizes based on platform.

As such, limit all simple attributes to 4k.  This works on all
platforms, and is consistent with all current code.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
2007-07-10 16:52:22 -07:00
Linus Torvalds
9f9d763216 Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6
* 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6:
  [S390] vmlogrdr function annotation.
  [S390] s390: rename CPU_IDLE to S390_CPU_IDLE
  [S390] cio: Remove prototype for non-existing function cmf_reset().
  [S390] zcrypt: fix request timeout handling
  [S390] system call optimization.
  [S390] dasd: Avoid compile warnings on !CONFIG_DASD_PROFILE
  [S390] Remove volatile from atomic_t
  [S390] Program check in diag 210 under 31 bit
  [S390] Bogomips calculation for 64 bit.
  [S390] smp: Merge smp_count_cpus() and smp_get_save_areas().
  [S390] zcore: Fix __user annotation.
  [S390] fixed cdl-format detection.
  [S390] sclp: Test facility list before executing a service call.
  [S390] sclp: introduce some new interfaces.
  [S390] Fixed comment typo.
  [S390] vmcp cleanup
2007-07-10 14:46:09 -07:00