Commit graph

17620 commits

Author SHA1 Message Date
Boaz Harrosh
d9c740d225 exofs: Define on-disk per-inode optional layout attribute
* Layouts describe the way a file is spread on multiple devices.
  The layout information is stored in the objects attribute introduced
  in this patch.

* There can be multiple generating function for the layout.
  Currently defined:
    - No attribute present - use below moving-window on global
      device table, all devices.
      (This is the only one currently used in exofs)
    - an obj_id generated moving window - the obj_id is a randomizing
      factor in the otherwise global map layout.
    - An explicit layout stored, including a data_map and a device
      index list.
    - More might be defined in future ...

* There are two attributes defined of the same structure:
  A-data-files-layout - This layout is used by data-files. If present
                        at a directory, all files of that directory will
                        be created with this layout.
  A-meta-data-layout - This layout is used by a directory and other
                       meta-data information. Also inherited at creation
                       of subdirectories.

* At creation time inodes are created with the layout specified above.
  A usermode utility may change the creation layout on a give directory
  or file. Which in the case of directories, will also apply to newly
  created files/subdirectories, children of that directory.
  In the simple unaltered case of a newly created exofs, no layout
  attributes are present, and all layouts adhere to the layout specified
  at the device-table.

* In case of a future file system loaded in an old exofs-driver.
  At iget(), the generating_function is inspected and if not supported
  will return an IO error to the application and the inode will not
  be loaded. So not to damage any data.
  Note: After this patch we do not yet support any type of layout
        only the RAID0 patch that enables striping at the super-block
        level will add support for RAID0 layouts above. This way we
        are past and future compatible and fully bisectable.

* Access to the device table is done by an accessor since
  it will change according to above information.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:28 -08:00
Boaz Harrosh
46f4d973f6 exofs: unindent exofs_sbi_read
The original idea was that a mirror read can be sub-divided
to multiple devices. But this has very little gain and only
at very large IOes so it's not going to be implemented soon.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:27 -08:00
Boaz Harrosh
45d3abcb1a exofs: Move layout related members to a layout structure
* Abstract away those members in exofs_sb_info that are related/needed
  by a layout into a new exofs_layout structure. Embed it in exofs_sb_info.

* At exofs_io_state receive/keep a pointer to an exofs_layout. No need for
  an exofs_sb_info pointer, all we need is at exofs_layout.

* Change any usage of above exofs_sb_info members to their new name.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:27 -08:00
Boaz Harrosh
22ddc55638 exofs: Recover in the case of read-passed-end-of-file
In check_io, implement the case of reading passed end of
file, by clearing the pages and recover with no error. In
a raid arrangement this can become a legitimate situation
in case of holes in the file.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:26 -08:00
Boaz Harrosh
518f167a37 exofs: Micro-optimize exofs_i_info
optimize the exofs_i_info struct usage by moving the embedded
vfs_inode to be first. A compiler might optimize away an "add"
operation with constant zero. (Which it cannot with other constants)

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:25 -08:00
Boaz Harrosh
34ce4e7c23 exofs: debug print even less
* Last debug trimming left in some stupid print, remove them.
  Fixup some other prints
* Shift printing from inode.c to ios.c
* Add couple of prints when memory allocation fails.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2010-02-28 03:35:25 -08:00
Wengang Wang
5051f76883 ocfs2: send SIGXFSZ if new filesize exceeds limit -v2
This patch makes ocfs2 send SIGXFSZ if new file size exceeds the rlimit.
Processes may get SIGXFSZ on one node (in the cluster) while others will
not on another if file size limits are different on the two nodes.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-27 20:08:51 -08:00
Sunil Mushran
6fcef3f04a ocfs2/userdlm: Add tracing in userdlm
Make use of the newly added BASTS masklog to trace ASTs and BASTs in userdlm.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-27 19:57:07 -08:00
Sunil Mushran
9b915181af ocfs2: Use a separate masklog for AST and BASTs
This patch adds a new masklog and uses it allow tracing ASTs and BASTs
in the dlmglue layer. This has been found to be very useful in debugging
cluster locking issues.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-27 19:57:06 -08:00
Christian Kujau
4912002fff Remove EXPERIMENTAL from NFS_FSCACHE
There's currently an open Ubuntu bug[0], with the intent to compile NFS_FSCACHE
(and possibly AFS_FSCACHE, 9P_FSCACHE) into the standard Ubuntu kernel.
However, since *_FSCACHE still depends on EXPERIMENTAL, this won't happen.

As Arjan van de Ven pointed out[1], the EXPERIMENTAL flag doesn't mean that
much any more, I propose the following patch to fs/nfs/Kconfig.  I'd do the
same for fs/9p/Kconfig and fs/afs/Kconfig, but as I did not test 9p or AFS, I
feel it would not be appropriate for me to remove the flag.

[0] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/440522/comments/5
[1] http://lkml.org/lkml/2010/1/23/145

Signed-off-by: Christian Kujau <lists@nerdbynature.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-26 17:22:35 -08:00
Linus Torvalds
4cbd55188f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
  dlm: use bastmode in debugfs output
  dlm: Send lockspace name with uevents
  dlm: send reply before bast
  dlm: fix ordering of bast and cast
2010-02-26 17:19:30 -08:00
Linus Torvalds
b305956abc Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs
* 'for-linus' of git://oss.sgi.com/xfs/xfs: (52 commits)
  fs/xfs: Correct NULL test
  xfs: optimize log flushing in xfs_fsync
  xfs: only clear the suid bit once in xfs_write
  xfs: kill xfs_bawrite
  xfs: log changed inodes instead of writing them synchronously
  xfs: remove invalid barrier optimization from xfs_fsync
  xfs: kill the unused XFS_QMOPT_* flush flags V2
  xfs: Use delay write promotion for dquot flushing
  xfs: Sort delayed write buffers before dispatch
  xfs: Don't issue buffer IO direct from AIL push V2
  xfs: Use delayed write for inodes rather than async V2
  xfs: Make inode reclaim states explicit
  xfs: more reserved blocks fixups
  xfs: turn off sign warnings
  xfs: don't hold onto reserved blocks on remount,ro
  xfs: quota limit statvfs available blocks
  xfs: replace KM_LARGE with explicit vmalloc use
  xfs: cleanup up xfs_log_force calling conventions
  xfs: kill XLOG_VEC_SET_TYPE
  xfs: remove duplicate buffer flags
  ...
2010-02-26 17:18:52 -08:00
Linus Torvalds
f24407d2bd Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/xfs-vipt
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/xfs-vipt:
  xfs: fix xfs to work with Virtually Indexed architectures
  sh: add mm API for DMA to vmalloc/vmap areas
  arm: add mm API for DMA to vmalloc/vmap areas
  parisc: add mm API for DMA to vmalloc/vmap areas
  mm: add coherence API for DMA to vmalloc/vmap areas
2010-02-26 17:05:10 -08:00
Srinivas Eeda
bc9838c4d4 dlm: allow dlm do recovery during shutdown
If a node down event happens while dlm shutdown in progress, dlm recovery
should be done before dlm is shutdown.  We can't migrate unrecovered locks,
obviously.  But dlm_reco_thread only does recovery if the dlm_state is
in DLM_CTXT_JOINED.

dlm_reco_thread should do recovery if dlm_state is in DLM_CTXT_JOINED or
DLM_CTXT_IN_SHUTDOWN.

Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:19 -08:00
Tao Ma
cbaee472f2 ocfs2: Only bug out in direct io write for reflinked extent.
In ocfs2_direct_IO_get_blocks, we only need to bug out
in case of we are going to write a recounted extent rec.

What a silly bug introduced by me!

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Cc: stable@kernel.org
2010-02-26 15:41:19 -08:00
Coly Li
66b116c9d8 ocfs2: fix warning in ocfs2_file_aio_write()
This patch fixes a compiling warning in ocfs2_file_aio_write().

Signed-off-by: Coly Li <coly.li@suse.de>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:18 -08:00
Joel Becker
cbe0e331fd ocfs2_dlmfs: Enable the use of user cluster stacks.
Unlike ocfs2, dlmfs has no permanent storage.  It can't store off a
cluster stack it is supposed to be using.  So it can't specify the stack
name in ocfs2_cluster_connect().

Instead, we create ocfs2_cluster_connect_agnostic(), which simply uses
the stack that is currently enabled.  This is find for dlmfs, which will
rely on the stack initialization.

We add the "stackglue" capability to dlmfs's capability list.  This lets
userspace know dlmfs can be used with all cluster stacks.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:18 -08:00
Joel Becker
0016eedc41 ocfs2_dlmfs: Use the stackglue.
Rather than directly using o2dlm, dlmfs can now use the stackglue.  This
allows it to use userspace cluster stacks and fs/dlm.  This commit
forces o2cb for now.  A latter commit will bump the protocol version and
allow non-o2cb stacks.

This is one big sed, really.  LKM_xxMODE becomes DLM_LOCK_xx.  LKM_flag
becomes DLM_LKF_flag.

We also learn to check that the LVB is valid before reading it.  Any DLM
can lose the contents of the LVB during a complicated recovery.  userdlm
should be checking this.  Now it does.  dlmfs will return 0 from read(2)
if the LVB was invalid.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:18 -08:00
Joel Becker
e8fce482f3 ocfs2_dlmfs: Don't honor truncate. The size of a dlmfs file is LVB_LEN
We want folks using dlmfs to be able to use the LVB in places other than
just write(2)/read(2).  By ignoring truncate requests, we allow 'echo
"contents" > /dlm/space/lockname' to work.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:18 -08:00
Joel Becker
553b5eb91a ocfs2: Pass the locking protocol into ocfs2_cluster_connect().
Inside the stackglue, the locking protocol structure is hanging off of
the ocfs2_cluster_connection.  This takes it one further; the locking
protocol is passed into ocfs2_cluster_connect().  Now different cluster
connections can have different locking protocols with distinct asts.
Note that all locking protocols have to keep their maximum protocol
version in lock-step.

With the protocol structure set in ocfs2_cluster_connect(), there is no
need for the stackglue to have a static pointer to a specific protocol
structure.  We can change initialization to only pass in the maximum
protocol version.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:17 -08:00
Joel Becker
e603cfb074 ocfs2: Remove the ast pointers from ocfs2_stack_plugins
With the full ocfs2_locking_protocol hanging off of the
ocfs2_cluster_connection, ast wrappers can get the ast/bast pointers
there.  They don't need to get them from their plugin structure.

The user plugin still needs the maximum locking protocol version,
though.  This changes the plugin structure so that it only holds the max
version, not the entire ocfs2_locking_protocol pointer.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:16 -08:00
Joel Becker
110946c8fb ocfs2: Hang the locking proto on the cluster conn and use it in asts.
With the ocfs2_cluster_connection hanging off of the ocfs2_dlm_lksb, we
have access to it in the ast and bast wrapper functions.  Attach the
ocfs2_locking_protocol to the conn.

Now, instead of refering to a static variable for ast/bast pointers, the
wrappers can look at the connection.  This means different connections
can have different ast/bast pointers, and it reduces the need for the
static pointer.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:16 -08:00
Joel Becker
c0e4133851 ocfs2: Attach the connection to the lksb
We're going to want it in the ast functions, so we convert union
ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:14 -08:00
Joel Becker
a796d2862a ocfs2: Pass lksbs back from stackglue ast/bast functions.
The stackglue ast and bast functions tried to maintain the fiction that
their arguments were void pointers.  In reality, stack_user.c had to
know that the argument was an ocfs2_lock_res in order to get the status
off of the lksb.  That's ugly.

This changes stackglue to always pass the lksb as the argument to ast
and bast functions.  The caller can always use container_of() to get the
ocfs2_lock_res or user_dlm_lock_res.  The net effect to the caller is
zero.  They still get back the lockres in their ast.  stackglue gets
cleaner, and now can use the lksb itself.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:14 -08:00
Joel Becker
34a9dd7e29 ocfs2_dlmfs: Move to its own directory
We're going to remove the tie between ocfs2_dlmfs and o2dlm.
ocfs2_dlmfs doesn't belong in the fs/ocfs2/dlm directory anymore.  Here
we move it to fs/ocfs2/dlmfs.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:14 -08:00
Joel Becker
65b6f34034 ocfs2_dlmfs: Use poll() to signify BASTs.
o2dlm's userspace filesystem is an easy way to use the DLM from
userspace.  It is intentionally simple. For example, it does not allow
for asynchronous behavior or lock conversion.  This is intentional to
keep the interface simple.

Because there is no asynchronous notification, there is no way for a
process holding a lock to know another node needs the lock.  This is the
number one complaint of ocfs2_dlmfs users.  Turns out, we can solve this
very easily.  We add poll() support to ocfs2_dlmfs.  When a BAST is
received, the lock's file descriptor will receive POLLIN.

This is trivial to implement.  Userdlm already has an appropriate
waitqueue, and the lock knows when it is blocked.

We add the "bast" capability to tell userspace this is available.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
Acked-by: Mark Fasheh <mfasheh@suse.com>

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:14 -08:00
Joel Becker
14a437c2b6 ocfs2_dlmfs: Add capabilities parameter.
Over time, dlmfs has added some features that were not part of the
initial ABI.  Unfortunately, some of these features are not detectable
via standard usage.  For example, Linux's default poll always returns
POLLIN, so there is no way for a caller of poll(2) to know when dlmfs
added poll support.  Instead, we provide this list of new capabilities.

Capabilities is a read-only attribute.  We do it as a module parameter
so we can discover it whether dlmfs is built in, loaded, or even not
loaded (via modinfo).

The ABI features are local to this machine's dlmfs mount.  This is
distinct from the locking protocol, which is concerned with inter-node
interaction.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:13 -08:00
Joel Becker
399ff3a748 ocfs2: Handle errors while setting external xattr values.
ocfs2 can store extended attribute values as large as a single file.  It
does this using a standard ocfs2 btree for the large value.  However,
the previous code did not handle all error cases cleanly.

There are multiple problems to have.

1) We have trouble allocating space for a new xattr.  This leaves us
   with an empty xattr.
2) We overwrote an existing local xattr with a value root, and now we
   have an error allocating the storage.  This leaves us an empty xattr.
   where there used to be a value.  The value is lost.
3) We have trouble truncating a reused value.  This leaves us with the
   original entry pointing to the truncated original value.  The value
   is lost.
4) We have trouble extending the storage on a reused value.  This leaves
   us with the original value safely in place, but with more storage
   allocated when needed.

This doesn't consider storing local xattrs (values that don't require a
btree).  Those only fail when the journal fails.

Case (1) is easy.  We just remove the xattr we added.  We leak the
storage because we can't safely remove it, but otherwise everything is
happy.  We'll print a warning about the leak.

Case (4) is easy.  We still have the original value in place.  We can
just leave the extra storage attached to this xattr.  We return the
error, but the old value is untouched.  We print a warning about the
storage.

Case (2) and (3) are hard because we've lost the original values.  In
the old code, we ended up with values that could be partially read.
That's not good.  Instead, we just wipe the xattr entry and leak the
storage.  It stinks that the original value is lost, but now there isn't
a partial value to be read.  We'll print a big fat warning.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:13 -08:00
Joel Becker
139ffacebf ocfs2: Set inline xattr entries with ocfs2_xa_set()
ocfs2_xattr_ibody_set() is the only remaining user of
ocfs2_xattr_set_entry().  ocfs2_xattr_set_entry() actually does two
things: it calls ocfs2_xa_set(), and it initializes the inline xattrs.
Initializing the inline space really belongs in its own call.

We lift the initialization to ocfs2_xattr_ibody_init(), called from
ocfs2_xattr_ibody_set() only when necessary.  Now
ocfs2_xattr_ibody_set() can call ocfs2_xa_set() directly.
ocfs2_xattr_set_entry() goes away.

Another nice fact is that ocfs2_init_dinode_xa_loc() can trust
i_xattr_inline_size.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:13 -08:00
Joel Becker
d3981544d7 ocfs2: Set xattr block entries with ocfs2_xa_set()
ocfs2_xattr_block_set() calls into ocfs2_xattr_set_entry() with just the
HAS_XATTR flag.  Most of the machinery of ocfs2_xattr_set_entry() is
skipped.  All that really happens other than the call to ocfs2_xa_set()
is making sure the HAS_XATTR flag is set on the inode.

But HAS_XATTR should be set when we also set di->i_xattr_loc.  And
that's done in ocfs2_create_xattr_block().  So let's move it there, and
then ocfs2_xattr_block_set() can just call ocfs2_xa_set().

While we're there, ocfs2_create_xattr_block() can take the set_ctxt for
a smaller argument list.  It also learns to set HAS_XATTR_FL, because it
knows for sure.  ocfs2_create_empty_xatttr_block() in the reflink path
fakes a set_ctxt to call ocfs2_create_xattr_block().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:13 -08:00
Joel Becker
c5d95df5f7 ocfs2: Let ocfs2_xa_prepare_entry() do space checks.
ocfs2_xattr_set_in_bucket() doesn't need to do its own hacky space
checking.  Let's let ocfs2_xa_prepare_entry() (via ocfs2_xa_set()) do
the more accurate work.  Whenever it doesn't have space,
ocfs2_xattr_set_in_bucket() can try to get more space.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:12 -08:00
Joel Becker
bca5e9bd1e ocfs2: Gell into ocfs2_xa_set()
ocfs2_xa_set() wraps the ocfs2_xa_prepare_entry()/ocfs2_xa_store_value()
logic.  Both callers can now use the same routine.  ocfs2_xa_remove()
moves directly into ocfs2_xa_set().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:11 -08:00
Joel Becker
73857ee0b5 ocfs2: Allocation in ocfs2_xa_prepare_entry(), values in ocfs2_xa_store_value()
ocfs2_xa_prepare_entry() gets all the logic to add, remove, or modify
external value trees.  Now, when it exits, the entry is ready to receive
a value of any size.

ocfs2_xa_remove() is added to handle the complete removal of an entry.
It truncates the external value tree before calling
ocfs2_xa_remove_entry().

ocfs2_xa_store_inline_value() becomes ocfs2_xa_store_value().  It can
store any value.

ocfs2_xattr_set_entry() loses all the allocation logic and just uses
these functions.  ocfs2_xattr_set_value_outside() disappears.

ocfs2_xattr_set_in_bucket() uses these functions and makes
ocfs2_xattr_set_entry_in_bucket() obsolete.  That goes away, as does
ocfs2_xattr_bucket_set_value_outside() and
ocfs2_xattr_bucket_value_truncate().

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:11 -08:00
Joel Becker
cf2bc80940 ocfs2: Teach ocfs2_xa_loc how to do its own journal work
We're going to want to make sure our buffers get accessed and dirtied
correctly.  So have the xa_loc do the work.  This includes storing the
inode on ocfs2_xa_loc.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:11 -08:00
Joel Becker
3fc12afa0c ocfs2: Provide ocfs2_xa_fill_value_buf() for external value processing
We use the ocfs2_xattr_value_buf structure to manage external values.
It lets the value tree code do its work regardless of the containing
storage.  ocfs2_xa_fill_value_buf() initializes a value buf from an
ocfs2_xa_loc entry.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:11 -08:00
Joel Becker
9dc474005d ocfs2: Handle value tree roots in ocfs2_xa_set_inline_value()
Previously the xattr code would send in a fake value, containing a tree
root, to the function that installed name+value pairs.  Instead, we pass
the real value to ocfs2_xa_set_inline_value(), and it notices that the
value cannot fit.  Thus, it installs a tree root.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:10 -08:00
Joel Becker
69a3e539d0 ocfs2: Set the xattr name+value pair in one place
We create two new functions on ocfs2_xa_loc, ocfs2_xa_prepare_entry()
and ocfs2_xa_store_inline_value().

ocfs2_xa_prepare_entry() makes sure that the xl_entry field of
ocfs2_xa_loc is ready to receive an xattr.  The entry will point to an
appropriately sized name+value region in storage.  If an existing entry
can be reused, it will be.  If no entry already exists, it will be
allocated.  If there isn't space to allocate it, -ENOSPC will be
returned.

ocfs2_xa_store_inline_value() stores the data that goes into the 'value'
part of the name+value pair.  For values that don't fit directly, this
stores the value tree root.

A number of operations are added to ocfs2_xa_loc_operations to support
these functions.  This reflects the disparate behaviors of xattr blocks
and buckets.

With these functions, the overlapping ocfs2_xattr_set_entry_local() and
ocfs2_xattr_set_entry_normal() can be replaced with a single call
scheme.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:10 -08:00
Joel Becker
199799a360 ocfs2: Wrap calculation of name+value pair size.
An ocfs2 xattr entry stores the text name and value as a pair in the
storage area.  Obviously names and values can be variable-sized.  If a
value is too large for the entry storage, a tree root is stored instead.
The name+value pair is also padded.

Because of this, there are a million places in the code that do:

	if (needs_external_tree(value_size)
		namevalue_size = pad(name_size) + tree_root_size;
	else
		namevalue_size = pad(name_size) + pad(value_size);

Let's create some convenience functions to make the code more readable.
There are three forms.  The first takes the raw sizes.  The second takes
an ocfs2_xattr_info structure.  The third takes an existing
ocfs2_xattr_entry.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:10 -08:00
Joel Becker
18853b95d1 ocfs2: Add a name_len field to ocfs2_xattr_info.
Rather than calculating strlen all over the place, let's store the
name length directly on ocfs2_xattr_info.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:09 -08:00
Joel Becker
6b240ff63c ocfs2: Prefix the member fields of struct ocfs2_xattr_info.
struct ocfs2_xattr_info is a useful structure describing an xattr
you'd like to set.  Let's put prefixes on the member fields so it's
easier to read and use.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:09 -08:00
Joel Becker
bde1e5400a ocfs2: Remove xattrs via ocfs2_xa_loc
Add ocfs2_xa_remove_entry(), which will remove an xattr entry from its
storage via the ocfs2_xa_loc descriptor.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:09 -08:00
Joel Becker
11179f2c92 ocfs2: Introduce ocfs2_xa_loc
The ocfs2 extended attribute (xattr) code is very flexible.  It can
store xattrs in the inode itself, in an external block, or in a tree of
data structures.  This allows the number of xattrs to be bounded by the
filesystem size.

However, the code that manages each possible storage location is
different.  Maintaining the ocfs2 xattr code requires changing each hunk
separately.

This patch is the start of a series introducing the ocfs2_xa_loc
structure.  This structure wraps the on-disk details of an xattr
entry.  The goal is that the generic xattr routines can use
ocfs2_xa_loc without knowing the underlying storage location.

This first pass merely implements the basic structure, initializing it,
and wiping the name+value pair of the entry.

Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:08 -08:00
Sunil Mushran
8545e03d82 ocfs2: Add current->comm in trace output
Add current->comm to the standard mlog() output to help with debugging.

Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:08 -08:00
Wengang Wang
96a1cc731a ocfs2: Clean up the checks for CoW and direct I/O.
When ocfs2 has to do CoW for refcounted extents, we disable direct I/O
and go through the buffered I/O path.  This makes the combined check
easier to read.

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:07 -08:00
Tiger Yang
b89c54282d ocfs2: add extent block stealing for ocfs2 v5
This patch add extent block (metadata) stealing mechanism for
extent allocation. This mechanism is same as the inode stealing.
if no room in slot specific extent_alloc, we will try to
allocate extent block from the next slot.

Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Acked-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
2010-02-26 15:41:07 -08:00
Alex Elder
398007f863 Merge branch 'linux-2.6.33' 2010-02-26 14:34:02 -06:00
David Teigland
b6fa8796b2 dlm: use bastmode in debugfs output
The bast mode that appears in the debugfs output should be
useful on both master and process nodes.  lkb_highbast is
currently printed, and is only useful on the master node.
lkb_bastmode is only useful on the process node.  This
patch sets lkb_bastmode on the master node as well, and
uses that value in the debugfs print.

Signed-off-by: David Teigland <teigland@redhat.com>
2010-02-26 12:15:54 -06:00
Steven Whitehouse
b4a5d4bc37 dlm: Send lockspace name with uevents
Although it is possible to get this information from the path,
its much easier to provide the lockspace as a seperate env
variable.

Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2010-02-26 12:14:25 -06:00
Sage Weil
080af17e9c ceph: remove bogus mds forward warning
The must_resend flag is always true, not false.  In any case, we can
just ignore it anyway.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-02-26 10:02:03 -08:00
David Teigland
cf6620acc0 dlm: send reply before bast
When the lock master processes a successful operation (request,
convert, cancel, or unlock), it will process the effects of the
change before sending the reply for the operation.  The "effects"
of the operation are:

- blocking callbacks (basts) for any newly granted locks
- waiting or converting locks that can now be granted

The cast is queued on the local node when the reply from the lock
master is received.  This means that a lock holder can receive a
bast for a lock mode that is doesn't yet know has been granted.

Signed-off-by: David Teigland <teigland@redhat.com>
2010-02-26 11:57:37 -06:00