Commit graph

3802 commits

Author SHA1 Message Date
Alex Elder
9d4df01f08 rbd: define separate read and write format funcs
Separate rbd_osd_req_format() into two functions, one for read
requests and the other for write requests.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:19:07 -07:00
Alex Elder
c5b5ef6c51 rbd: issue stat request before layered write
This is a step toward fully implementing layered writes.

Add checks before request submission for the object(s) associated
with an image request.  For write requests, if we don't know that
the target object exists, issue a STAT request to find out.  When
that request completes, mark the known and exists flags for the
original object request accordingly and re-submit the object
request.  (Note that this still does the existence check only; the
copyup operation is not yet done.)

A new object request is created to perform the existence check.  A
pointer to the original request is added to that object request to
allow the stat request to re-issue the original request after
updating its flags.  If there is a failure with the stat request
the error code is stored with the original request, which is then
completed.

This resolves:
    http://tracker.ceph.com/issues/3418

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:19:04 -07:00
Alex Elder
5679c59f60 rbd: add target object existence flags
This creates two new flags for object requests to indicate what is
known about the existence of the object to which a request is to be
sent.  The KNOWN flag will be true if the the EXISTS flag is
meaningful.  That is:

    KNOWN   EXISTS
    -----   ------
      0       0     don't know whether the object exists
      0       1     (not used/invalid)
      1       0     object is known to not exist
      1       0     object is known to exist

This will be used in determining how to handle write requests for
data objects for layered rbd images.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:19:03 -07:00
Alex Elder
57acbaa7fb rbd: always check IMG_DATA flag
In a few spots, whether the an object request's img_request pointer
is null is used to determine whether an object request is being done
as part of an image data request.

Stop doing that, and instead always use the object request IMG_DATA
flag for that purpose.  Swap the order of the definition of the
IMG_DATA and DONE flag helpers, because obj_request_done_set() now
refers to obj_request_img_data_set() to get its rbd_dev value.

This will become important because the img_request pointer is
about to become part of a union.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:19:02 -07:00
Alex Elder
b155e86cf6 rbd: adjust image object request ref counting
An extra reference is taken when an object request is added as one
of the requests making up an image object.  A reference is dropped
again when the image's object requests get submitted.

The original reference for the object request will remain throughout
this period, so we don't need to add and then take away an extra
one.

This can be interpreted as the image request inheriting the original
object request's reference.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:19:01 -07:00
Alex Elder
406e2c9f92 libceph: kill off osd data write_request parameters
In the incremental move toward supporting distinct data items in an
osd request some of the functions had "write_request" parameters to
indicate, basically, whether the data belonged to in_data or the
out_data.  Now that we maintain the data fields in the op structure
there is no need to indicate the direction, so get rid of the
"write_request" parameters.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:58 -07:00
Alex Elder
8b3e1a5698 rbd: implement layered reads
Implement layered read requests for format 2 rbd images.

If an rbd image is a clone of a snapshot, the snapshot will be the
clone's "parent" image.  When an object read request on a clone
comes back with ENOENT it indicates that the clone is not yet
populated with that portion of the image's data, and the parent
image should be consulted to satisfy the read.

When this occurs, a new image request is created, directed to the
parent image.  The offset and length of the image are the same as
the image-relative offset and length of the object request that
produced ENOENT.  Data from the parent image therefore satisfies the
object read request for the original image request.

While this code works, it will not be active until we enable the
layering feature (by adding RBD_FEATURE_LAYERING to the value of
RBD_FEATURES_SUPPORTED).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:48 -07:00
Alex Elder
2f82ee54d9 rbd: probe the parent of an image if present
Call the probe function for the parent device if one is present.
Since we don't formally support the layering feature we won't
be using this functionality just yet.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:47 -07:00
Alex Elder
6365d33a27 rbd: add an object request flag for image data objects
Add a flag to distinguish between object requests being done on
standalone objects and requests being sent for objects representing
rbd image data (i.e., object requests that are the result of image
request).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:46 -07:00
Alex Elder
926f9b3f08 rbd: define an rbd object request flags field
We're going to need some more Boolean values for object requests,
so create a flags bit field and use it to record whether the request
is done.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:44 -07:00
Alex Elder
1217857fbf rbd: encapsulate image object end request handling
Encapsulate the code that completes processing of an object request
that's part of an image request.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:43 -07:00
Alex Elder
d0b2e94455 rbd: define image request layered flag
Define a flag indicating whether an image request is for a layered
image (one with a parent image to which requests will be redirected
if the target object of a request does not exist).  The code that
checks this flag will be added shortly.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:42 -07:00
Alex Elder
9849e98636 rbd: define image request originator flag
Define a flag indicating whether an image request originated from
the Linux block layer (from blk_fetch_request()) or whether it was
initiated in order to satisfy an object request for a child image
of a layered rbd device.  For image requests initiated by objects of
child images we'll save a pointer to the object request rather than
the Linux block request.

For now, only block requests are used.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:41 -07:00
Alex Elder
0c425248e0 rbd: define image request flags
There are several Boolean values we'll be maintaining for image
requests.  Switch from the single write_request field to a
general-purpose flags field, and use one if its bits to represent
the direction of I/O for the image request.  Define helper functions
for setting and testing that flag.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:40 -07:00
Alex Elder
7da22d296d rbd: record image-relative offset in object requests
For an image object request we will need to know what offset within
the rbd image the request covers.  Record that when the object
request gets created.

Update the I/O error warnings so they use this so what's reported
is more informative.

Rename a local variable to fit the convention used everywhere else.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:39 -07:00
Alex Elder
55f27e0931 rbd: record aggregate image transfer count
Compute the total number of bytes transferred for an image
request--the sum across each of the request's object requests.
To avoid contention do it only when all object requests are
complete, in rbd_img_request_complete().

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:38 -07:00
Alex Elder
a5a337d438 rbd: record overall image request result
If any image object request produces a non-zero result, preserve
that as the result of the overall image request.  If multiple
objects have non-zero results, save only the first one.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:37 -07:00
Alex Elder
5cbf6f12c4 rbd: update feature bits
There is a new rbd feature bit defined for "fancy striping." Add
it to the ones defined in the kernel client.

Change RBD_FEATURES_ALL so it represents the set of all feature
bits (rather than just the ones we support).  Define a new symbol
RBD_FEATURES_SUPPORTED to indicate the supported ones.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:36 -07:00
Alex Elder
04017e29bb libceph: make method call data be a separate data item
Right now the data for a method call is specified via a pointer and
length, and it's copied--along with the class and method name--into
a pagelist data item to be sent to the osd.  Instead, encode the
data in a data item separate from the class and method names.

This will allow large amounts of data to be supplied to methods
without copying.  Only rbd uses the class functionality right now,
and when it really needs this it will probably need to use a page
array rather than a page list.  But this simple implementation
demonstrates the functionality on the osd client, and that's enough
for now.

This resolves:
    http://tracker.ceph.com/issues/4104

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:35 -07:00
Alex Elder
a4ce40a9a7 libceph: combine initializing and setting osd data
This ends up being a rather large patch but what it's doing is
somewhat straightforward.

Basically, this is replacing two calls with one.  The first of the
two calls is initializing a struct ceph_osd_data with data (either a
page array, a page list, or a bio list); the second is setting an
osd request op so it associates that data with one of the op's
parameters.  In place of those two will be a single function that
initializes the op directly.

That means we sort of fan out a set of the needed functions:
    - extent ops with pages data
    - extent ops with pagelist data
    - extent ops with bio list data
and
    - class ops with page data for receiving a response

We also have define another one, but it's only used internally:
    - class ops with pagelist data for request parameters

Note that we *still* haven't gotten rid of the osd request's
r_data_in and r_data_out fields.  All the osd ops refer to them for
their data.  For now, these data fields are pointers assigned to the
appropriate r_data_* field when these new functions are called.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:23 -07:00
Alex Elder
2169238dd3 rbd: rearrange some code for consistency
This patch just trivially moves around some code for consistency.

In preparation for initializing osd request data fields in
ceph_osdc_build_request(), I wanted to verify that rbd did in fact
call that immediately before it called ceph_osdc_start_request().
It was true (although image requests are built in a group and then
started as a group).  But I made the changes here just to make
it more obvious, by making all of the calls follow a common
sequence:
	osd_req_op_<optype>_init();
	ceph_osd_data_<type>_init()
	osd_req_op_<optype>_<datafield>()
	rbd_osd_req_format()
	...
	ret = rbd_obj_request_submit()

I moved the initialization of the callback for image object requests
into rbd_img_request_fill_bio(), again, for consistency.  To avoid
a forward reference, I moved the definition of rbd_img_obj_callback()
up in the file.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:18 -07:00
Alex Elder
44cd188d48 rbd: separate initialization of osd data
The osd data for a request is currently initialized inside
rbd_osd_req_create(), but that assumes an object request's data
belongs in the osd request's data in or data out field.

There are only three places where requests with data are set up, and
it turns out it's easier to call just the osd data init routines
directly there rather than handling it in rbd_osd_req_create().

(The real motivation here is moving toward getting rid of the
osd request in and out data fields.)

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:17 -07:00
Alex Elder
2fa123201a rbd: don't set data in rbd_osd_req_format_op()
Currently an object request has its osd request's data field set in
rbd_osd_req_format_op().  That assumes a single osd op per object
request, and that won't be the case for long.

Move the code that sets this out and into the caller.

Rename rbd_osd_req_format_op() to be just rbd_osd_req_format(),
removing the notion that it's doing anything op-specific.

This and the next patch resolve:
    http://tracker.ceph.com/issues/4658

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:16 -07:00
Alex Elder
c99d2d4abb libceph: specify osd op by index in request
An osd request now holds all of its source op structures, and every
place that initializes one of these is in fact initializing one
of the entries in the the osd request's array.

So rather than supplying the address of the op to initialize, have
caller specify the osd request and an indication of which op it
would like to initialize.  This better hides the details the
op structure (and faciltates moving the data pointers they use).

Since osd_req_op_init() is a common routine, and it's not used
outside the osd client code, give it static scope.  Also make
it return the address of the specified op (so all the other
init routines don't have to repeat that code).

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:15 -07:00
Alex Elder
8c042b0df9 libceph: add data pointers in osd op structures
An extent type osd operation currently implies that there will
be corresponding data supplied in the data portion of the request
(for write) or response (for read) message.  Similarly, an osd class
method operation implies a data item will be supplied to receive
the response data from the operation.

Add a ceph_osd_data pointer to each of those structures, and assign
it to point to eithre the incoming or the outgoing data structure in
the osd message.  The data is not always available when an op is
initially set up, so add two new functions to allow setting them
after the op has been initialized.

Begin to make use of the data item pointer available in the osd
operation rather than the request data in or out structure in
places where it's convenient.  Add some assertions to verify
pointers are always set the way they're expected to be.

This is a sort of stepping stone toward really moving the data
into the osd request ops, to allow for some validation before
making that jump.

This is the first in a series of patches that resolve:
    http://tracker.ceph.com/issues/4657

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:14 -07:00
Alex Elder
79528734f3 libceph: keep source rather than message osd op array
An osd request keeps a pointer to the osd operations (ops) array
that it builds in its request message.

In order to allow each op in the array to have its own distinct
data, we will need to keep track of each op's data, and that
information does not go over the wire.

As long as we're tracking the data we might as well just track the
entire (source) op definition for each of the ops.  And if we're
doing that, we'll have no more need to keep a pointer to the
wire-encoded version.

This patch makes the array of source ops be kept with the osd
request structure, and uses that instead of the version encoded in
the message in places where that was previously used.  The array
will be embedded in the request structure, and the maximum number of
ops we ever actually use is currently 2.  So reduce CEPH_OSD_MAX_OP
to 2 to reduce the size of the structure.

The result of doing this sort of ripples back up, and as a result
various function parameters and local variables become unnecessary.

Make r_num_ops be unsigned, and move the definition of struct
ceph_osd_req_op earlier to ensure it's defined where needed.

It does not yet add per-op data, that's coming soon.

This resolves:
    http://tracker.ceph.com/issues/4656

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:12 -07:00
Alex Elder
430c28c3cb rbd: define rbd_osd_req_format_op()
Define rbd_osd_req_format_op(), which encapsulates formatting
an osd op into an object request's osd request message.  Only
one op is supported right now.

Stop calling ceph_osdc_build_request() in rbd_osd_req_create().
Instead, call rbd_osd_req_format_op() in each of the callers of
rbd_osd_req_create().

This is to prepare for the next patch, in which the source ops for
an osd request will be held in the osd request itself.  Because of
that, we won't have the source op to work with until after the
request is created, so we can't format the op until then.

This an the next patch resolve:
    http://tracker.ceph.com/issues/4656

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:11 -07:00
Alex Elder
43bfe5de9f libceph: define osd data initialization helpers
Define and use functions that encapsulate the initializion of a
ceph_osd_data structure.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:06 -07:00
Alex Elder
6010a451c3 rbd: define inbound data size for method ops
When rbd creates an object request containing an object method call
operation it is passing 0 for the size.  I originally thought this
was because the length was not needed for method calls, but I think
it really should be supplied, to describe how much space is
available to receive response data.  So provide the supplied length.

This resolves:
    http://tracker.ceph.com/issues/4659

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:18:04 -07:00
Alex Elder
fdce58ccb5 libceph: record length of bio list with bio
When assigning a bio pointer to an osd request, we don't have an
efficient way of knowing the total length bytes in the bio list.
That information is available at the point it's set up by the rbd
code, so record it with the osd data when it's set.

This and the next patch are related to maintaining the length of a
message's data independent of the message header, as described here:
    http://tracker.ceph.com/issues/4589

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:56 -07:00
Alex Elder
33803f3300 libceph: define source request op functions
The rbd code has a function that allocates and populates a
ceph_osd_req_op structure (the in-core version of an osd request
operation).  When reviewed, Josh suggested two things: that the
big varargs function might be better split into type-specific
functions; and that this functionality really belongs in the osd
client rather than rbd.

This patch implements both of Josh's suggestions.  It breaks
up the rbd function into separate functions and defines them
in the osd client module as exported interfaces.  Unlike the
rbd version, however, the functions don't allocate an osd_req_op
structure; they are provided the address of one and that is
initialized instead.

The rbd function has been eliminated and calls to it have been
replaced by calls to the new routines.  The rbd code now now use a
stack (struct) variable to hold the op rather than allocating and
freeing it each time.

For now only the capabilities used by rbd are implemented.
Implementing all the other osd op types, and making the rest of the
code use it will be done separately, in the next few patches.

Note that only the extent, cls, and watch portions of the
ceph_osd_req_op structure are currently used.  Delete the others
(xattr, pgls, and snap) from its definition so nobody thinks it's
actually implemented or needed.  We can add it back again later
if needed, when we know it's been tested.

This (and a few follow-on patches) resolves:
    http://tracker.ceph.com/issues/3861

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:45 -07:00
Alex Elder
adfe695a25 ceph: move max constant definitions
Move some definitions for max integer values out of the rbd code and
into the more central "decode.h" header file.  These really belong
in a Linux (or libc) header somewhere, but I haven't gotten around
to proposing that yet.

This is in preparation for moving some code out of rbd.c and into
the osd client.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:42 -07:00
Alex Elder
175face2ba libceph: let osd ops determine request data length
The length of outgoing data in an osd request is dependent on the
osd ops that are embedded in that request.  Each op is encoded into
a request message using osd_req_encode_op(), so that should be used
to determine the amount of outgoing data implied by the op as it
is encoded.

Have osd_req_encode_op() return the number of bytes of outgoing data
implied by the op being encoded, and accumulate and use that in
ceph_osdc_build_request().

As a result, ceph_osdc_build_request() no longer requires its "len"
parameter, so get rid of it.

Using the sum of the op lengths rather than the length provided is
a valid change because:
    - The only callers of osd ceph_osdc_build_request() are
      rbd and the osd client (in ceph_osdc_new_request() on
      behalf of the file system).
    - When rbd calls it, the length provided is only non-zero for
      write requests, and in that case the single op has the
      same length value as what was passed here.
    - When called from ceph_osdc_new_request(), (it's not all that
      easy to see, but) the length passed is also always the same
      as the extent length encoded in its (single) write op if
      present.

This resolves:
    http://tracker.ceph.com/issues/4406

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:17:02 -07:00
Alex Elder
e0c594878e libceph: record byte count not page count
Record the byte count for an osd request rather than the page count.
The number of pages can always be derived from the byte count (and
alignment/offset) but the reverse is not true.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:36 -07:00
Alex Elder
0fff87ec79 libceph: separate read and write data
An osd request defines information about where data to be read
should be placed as well as where data to write comes from.
Currently these are represented by common fields.

Keep information about data for writing separate from data to be
read by splitting these into data_in and data_out fields.

This is the key patch in this whole series, in that it actually
identifies which osd requests generate outgoing data and which
generate incoming data.  It's less obvious (currently) that an osd
CALL op generates both outgoing and incoming data; that's the focus
of some upcoming work.

This resolves:
    http://tracker.ceph.com/issues/4127

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:27 -07:00
Alex Elder
2ac2b7a6d4 libceph: distinguish page and bio requests
An osd request uses either pages or a bio list for its data.  Use a
union to record information about the two, and add a data type
tag to select between them.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:25 -07:00
Alex Elder
2794a82a11 libceph: separate osd request data info
Pull the fields in an osd request structure that define the data for
the request out into a separate structure.

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-05-01 21:16:24 -07:00
Linus Torvalds
20b4fb4852 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
  don't bother with deferred freeing of fdtables
  proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
  proc: Make the PROC_I() and PDE() macros internal to procfs
  proc: Supply a function to remove a proc entry by PDE
  take cgroup_open() and cpuset_open() to fs/proc/base.c
  ppc: Clean up scanlog
  ppc: Clean up rtas_flash driver somewhat
  hostap: proc: Use remove_proc_subtree()
  drm: proc: Use remove_proc_subtree()
  drm: proc: Use minor->index to label things, not PDE->name
  drm: Constify drm_proc_list[]
  zoran: Don't print proc_dir_entry data in debug
  reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
  proc: Supply an accessor for getting the data from a PDE's parent
  airo: Use remove_proc_subtree()
  rtl8192u: Don't need to save device proc dir PDE
  rtl8187se: Use a dir under /proc/net/r8180/
  proc: Add proc_mkdir_data()
  proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
  proc: Move PDE_NET() to fs/proc/proc_net.c
  ...
2013-05-01 17:51:54 -07:00
Arjan van de Ven
564a232c05 NVMe: Set TASK_INTERRUPTIBLE before processing queues
The kthread has two tasks; handling timeouts (for which it runs once per
second), and submitting queued BIOs.  If a BIO happens to be queued after
the thread has processed the queue but before it calls schedule_timeout(),
the thread will sleep for a second before submitting it, which can cause
performance problems in some rare cases (that will become more common in
a subsequent patch).

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-05-01 16:40:27 -04:00
Mihnea Dobrescu-Balaur
60abc786dd aoe: replace kmalloc and then memcpy with kmemdup
Signed-off-by: Mihnea Dobrescu-Balaur <mihneadb@gmail.com>
Cc: Ed Cashin <ecashin@coraid.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:08 -07:00
Michal Belczyk
078be02b80 nbd: increase default and max request sizes
Raise the default max request size for nbd to 128KB (from 127KB) to get it
4KB aligned.  This patch also allows the max request size to be increased
(via /sys/block/nbd<x>/queue/max_sectors_kb) to 32MB.

The patch makes nbd network traffic more efficient by:
- reducing request fragmentation (4KB alignment)
- reducing the number of requests (fewer round trips, less network overhead)

Especially in high latency networks, larger request size can make a dramatic

Signed-off-by: Paul Clements <paul.clements@steeleye.com>
Signed-off-by: Michal Belczyk <belczyk@bsd.krakow.pl>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-30 17:04:07 -07:00
Akinobu Mita
38b682b261 drbd: rename random32() to prandom_u32()
Use preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-04-29 18:28:42 -07:00
Mike Miller
0821e90405 cciss: bug fix to prevent cciss from loading in kdump crash kernel
By default the cciss driver supports all "older" HP Smart Array
controllers and hpsa supports all controllers starting with the G6 family.
 There are module parameters that allow a user to override those defaults
and use hpsa for any HP Smart Array controller.

If the user does override the default behavior and uses hpsa for older
controllers it is possible that cciss may try to load in a kdump crash
kernel.  This may happen if cciss is loaded first from the kdump initrd
image.  If cciss does load rather than hpsa and reset_devices is true we
immediately call cciss_hard_reset_controller.  This will result in a
kernel panic and the core file cannot be created.  This patch prevents
cciss from trying to load in this scenario.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-29 21:24:02 +02:00
Mike Miller
e4292e05d4 cciss: add cciss_allow_hpsa module parameter
Add the cciss_allow_hpsa modules parameter.  This allows users to use the
hpsa driver instead of cciss for older controllers.

Signed-off-by: Mike Miller <mike.miller@hp.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-29 21:24:02 +02:00
Jingoo Han
aed3d67e57 drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
Add CONFIG_PM_SLEEP to suspend/resume functions to fix the following build
warning when CONFIG_PM_SLEEP is not selected.  This is because sleep PM
callbacks defined by SIMPLE_DEV_PM_OPS are only used when the
CONFIG_PM_SLEEP is enabled.

drivers/block/mg_disk.c:783:12: warning: 'mg_suspend' defined but not used [-Wunused-function]
drivers/block/mg_disk.c:807:12: warning: 'mg_resume' defined but not used [-Wunused-function]

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-29 21:24:02 +02:00
Asai Thambi S P
2077d94726 mtip32xx: Workaround for unaligned writes
Workaround for handling unaligned writes: limit number of outstanding
unaligned writes

Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Asai Thambi S P <asamymuthupa@micron.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-04-29 21:19:49 +02:00
Linus Torvalds
96d8683483 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
 "It's a simple fix for a hard to hit race, but low-risk and clearly
  correct"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  rbd: do a safe list traversal in rbd_img_request_submit()
2013-04-17 12:52:02 -07:00
Alex Elder
46faeed4a6 rbd: do a safe list traversal in rbd_img_request_submit()
It's possible that the reference to the object request dropped
inside the loop in rbd_img_request_submit() will be the last
one, in which case the content of the object pointer can't be
trusted.

Use a safe form of the object request list traversal to avoid
problems.

This resolves:
    http://tracker.ceph.com/issues/4705

Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-04-17 11:39:09 -07:00
Keith Busch
5e82e952f0 NVMe: Add a character device for each nvme device
Registers a miscellaneous device for each nvme controller probed. This
creates character device files as /dev/nvmeN, where N is the device
instance, and supports nvme admin ioctl commands so devices without
namespaces can be managed.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
2013-04-16 15:43:55 -04:00
Matthew Wilcox
1c9b52651d NVMe: Fix endian-related problems in user I/O submission path
When constructing the command, dsmgmt needs to be treated as a 32-bit
value, not a 16-bit value.  reftag, apptag and appmask all need to be
converted from native-endian to little-endian.  Again, sparse's bitwise
warnings caught this problem.  Thanks to Keith for pointing out the
correct way to fix the reftag.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Acked-by: Keith Busch <keith.busch@intel.com>
2013-04-16 15:21:06 -04:00