Commit graph

24807 commits

Author SHA1 Message Date
Joe Perches
8076c363da udf: Rename udf_error to udf_err
Rename udf_error to udf_err for consistency with normal logging
uses of pr_err.

Rename function udf_err to _udf_err.
Remove __func__ from uses and move __func__ to a new udf_err
macro that calls _udf_err.
Some of the udf_error uses had \n terminations, some did not so
standardize \n's to udf_err uses, remove \n from _udf_err function.
Coalesce udf_err formats.
One message prefixed with udf_read_super is now prefixed with
udf_fill_super.

Reviewed-by: NamJae Jeon <linkinjeon@gmail.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-10-10 19:29:01 +02:00
Joe Perches
7e273e3b41 udf: Promote some debugging messages to udf_error
If there is a problem with a scratched disc or loader, it's valuable to know
which error occurred.

Convert some debug messages to udf_error, neaten those messages too.
Add the calculated tag checksum and the read checksum to error message.
Make udf_error a public function and move the logging prototypes together.

Original-patch-by: NamJae Jeon <linkinjeon@gmail.com>
Reviewed-by: NamJae Jeon <linkinjeon@gmail.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-10-10 19:26:24 +02:00
Tao Ma
40bfa16dac ext3: Remove the obsolete broken EXT3_IOC32_WAIT_FOR_READONLY.
There are no user of EXT3_IOC32_WAIT_FOR_READONLY and also it is
broken. No one set the set_ro_timer, no one wake up us and our
state is set to TASK_INTERRUPTIBLE not RUNNING. So remove it.

Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-10-10 18:25:59 +02:00
Linus Torvalds
65112dccf8 Merge git://git.samba.org/sfrench/cifs-2.6
* git://git.samba.org/sfrench/cifs-2.6:
  [CIFS] Fix first time message on mount, ntlmv2 upgrade delayed to 3.2
2011-10-10 14:53:11 +12:00
Fabrice Jouhaud
d44651d0f9 ext4: fix ext4 so it works without CONFIG_PROC_FS
This fixes a bug which was introduced in dd68314ccf.  The problem
came from the test of the return value of proc_mkdir which is always
false without procfs, and this would initialization of ext4.

Signed-off-by: Fabrice Jouhaud <yargil@free.fr>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08 16:26:03 -04:00
Tao Ma
6ee3b21224 ext4: use le32_to_cpu for ext4_extent_idx.ei_block in ext4_ext_search_left()
ext4_extent_idx.e_block is __le32, so use le32_to_cpu() in
ext4_ext_search_left().

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08 16:08:34 -04:00
Tao Ma
7fd59c83b0 ext4: remove the obsolete/broken EXT4_IOC_WAIT_FOR_READONLY ioctl
There are no users of the EXT4_IOC_WAIT_FOR_READONLY ioctl, and it is
also broken.  No one sets the set_ro_timer, no one wakes up us and our
state is set to TASK_INTERRUPTIBLE not RUNNING.  So remove it.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08 15:56:35 -04:00
Tao Ma
df3ab17072 ext4: fix the comment describing ext4_ext_search_right()
The comment describing what ext4_ext_search_right() does is incorrect.
We return 0 in *phys when *logical is the 'largest' allocated block,
not smallest.  

Fix a few other typos while we're at it.

Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
2011-10-08 15:53:49 -04:00
Lukas Czerner
4113c4caa4 ext4: remove deprecated oldalloc
For a long time now orlov is the default block allocator in the
ext4. It performs better than the old one and no one seems to claim
otherwise so we can safely drop it and make oldalloc and orlov mount
option deprecated.

This is a part of the effort to reduce number of ext4 options hence the
test matrix.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-08 14:34:47 -04:00
Steve French
9d1e397b7b [CIFS] Fix first time message on mount, ntlmv2 upgrade delayed to 3.2
Microsoft has a bug with ntlmv2 that requires use of ntlmssp, but
we didn't get the required information on when/how to use ntlmssp to
old (but once very popular) legacy servers (various NT4 fixpacks
for example) until too late to merge for 3.1.  Will upgrade
to NTLMv2 in NTLMSSP in 3.2

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
2011-10-07 20:17:56 -05:00
Tao Ma
dcf2d804ed ext4: Free resources in some error path in ext4_fill_super
Some of the error path in ext4_fill_super don't release the
resouces properly. So this patch just try to release them
in the right way.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-06 12:10:11 -04:00
Tao Ma
7aa0baeaba ext4: Free resources in ext4_mb_init()'s error paths
In commit 79a77c5ac, we move ext4_mb_init_backend after the allocation
of s_locality_group to avoid memory leak in error path, but there are
still some other error paths in ext4_mb_init that need to do the same
work. So this patch adds all the error patch for ext4_mb_init. And all
the pointers are reset to NULL in case the caller may double free them.

Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2011-10-06 10:22:28 -04:00
Namjae Jeon
bc1123239a udf: Add readpages support for udf.
Use mpage_readpages() instead of multiple calls to udf_readpage() to reduce the
CPU utilization and make performance higher.

Signed-off-by: Namjae Jeon <linkinjeon@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-10-06 00:13:48 +02:00
H Hartley Sweeten
3ee77f2091 ext3/balloc.c: local functions should be static
This quites the sparse noise:

warning: symbol 'ext3_trim_all_free' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Signed-off-by: Jan Kara <jack@suse.cz>
2011-10-05 01:29:22 +02:00
Boaz Harrosh
d866d875f6 ore/exofs: Change the type of the devices array (API change)
In the pNFS obj-LD the device table at the layout level needs
to point to a device_cache node, where it is possible and likely
that many layouts will point to the same device-nodes.

In Exofs we have a more orderly structure where we have a single
array of devices that repeats twice for a round-robin view of the
device table

This patch moves to a model that can be used by the pNFS obj-LD
where struct ore_components holds an array of ore_dev-pointers.
(ore_dev is newly defined and contains a struct osd_dev *od
 member)

Each pointer in the array of pointers will point to a bigger
user-defined dev_struct. That can be accessed by use of the
container_of macro.

In Exofs an __alloc_dev_table() function allocates the
ore_dev-pointers array as well as an exofs_dev array, in one
allocation and does the addresses dance to set everything pointing
correctly. It still keeps the double allocation trick for the
inodes round-robin view of the table.

The device table is always allocated dynamically, also for the
single device case. So it is unconditionally freed at umount.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-04 12:13:59 +02:00
Linus Torvalds
7fd21be75d Merge branch 'btrfs-3.0' of git://github.com/chrismason/linux
* 'btrfs-3.0' of git://github.com/chrismason/linux:
  Btrfs: force a page fault if we have a shorty copy on a page boundary
2011-10-03 12:17:44 -07:00
Boaz Harrosh
eb507bc189 ore: Make ore_striping_info and ore_calc_stripe_info public
The struct ore_striping_info will be used later in other
structures. And ore_calc_stripe_info as well. Rename them
make struct ore_striping_info public. ore_calc_stripe_info
is still static, will be made public on first use.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-03 17:07:51 +02:00
Boaz Harrosh
8d2d83a835 exofs: Remove unused data_map member from exofs_sb_info
The struct pnfs_osd_data_map data_map member of exofs_sb_info was
never used after mount. In fact all it's members were duplicated
by the ore_layout structure. So just remove the duplicated information.

Also removed some stupid, but perfectly supported, restrictions on
layout parameters. The case where num_devices is not divisible by
mirror_count+1 is perfectly fine since the rotating device view
will eventually use all the devices it can get.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@tonian.com>
2011-10-03 17:07:51 +02:00
Boaz Harrosh
5bf696dad4 exofs: Rename struct ore_components comps => oc
ore_components already has a comps member so this leads
to things like comps->comps which is annoying. the name oc
was already used in new code. So rename all old usage of
ore_components comps => ore_components oc.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-03 17:07:50 +02:00
H Hartley Sweeten
de74b05ace exofs/super.c: local functions should be static
This quiets the following sparse noise:

warning: symbol 'exofs_sync_fs' was not declared. Should it be static?
warning: symbol 'exofs_free_sbi' was not declared. Should it be static?
warning: symbol 'exofs_get_parent' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-03 17:07:29 +02:00
H Hartley Sweeten
1958c7c284 exofs/ore.c: local functions should be static
This quiets the sparse noise:

warning: symbol '_calc_trunk_info' was not declared. Should it be static?

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2011-10-03 17:06:47 +02:00
Wu Fengguang
b00949aa2d writeback: per-bdi background threshold
One thing puzzled me is that in JBOD case, the per-disk writeout
performance is smaller than the corresponding single-disk case even
when they have comparable bdi_thresh. Tracing shows find that in single
disk case, bdi_writeback is always kept high while in JBOD case, it
could drop low from time to time and correspondingly bdi_reclaimable
could sometimes rush high.

The fix is to watch bdi_reclaimable and kick background writeback as
soon as it goes high. This resembles the global background threshold
but in per-bdi manner. The trick is, as long as bdi_reclaimable does
not go high, bdi_writeback naturally won't go low because
bdi_reclaimable+bdi_writeback ~= bdi_thresh.

With less fluctuated writeback pages, JBOD performance is observed to
increase noticeably in various cases.

vmstat:nr_written values before/after patch:

  3.1.0-rc4-wo-underrun+      3.1.0-rc4-bgthresh3+  
------------------------  ------------------------  
               125596480       +25.9%    158179363  JBOD-10HDD-16G/ext4-100dd-1M-24p-16384M-20:10-X
                61790815      +110.4%    130032231  JBOD-10HDD-16G/ext4-10dd-1M-24p-16384M-20:10-X
                58853546        -0.1%     58823828  JBOD-10HDD-16G/ext4-1dd-1M-24p-16384M-20:10-X
               110159811       +24.7%    137355377  JBOD-10HDD-16G/xfs-100dd-1M-24p-16384M-20:10-X
                69544762       +10.8%     77080047  JBOD-10HDD-16G/xfs-10dd-1M-24p-16384M-20:10-X
                50644862        +0.5%     50890006  JBOD-10HDD-16G/xfs-1dd-1M-24p-16384M-20:10-X
                42677090       +28.0%     54643527  JBOD-10HDD-thresh=100M/ext4-100dd-1M-24p-16384M-100M:10-X
                47491324       +13.3%     53785605  JBOD-10HDD-thresh=100M/ext4-10dd-1M-24p-16384M-100M:10-X
                52548986        +0.9%     53001031  JBOD-10HDD-thresh=100M/ext4-1dd-1M-24p-16384M-100M:10-X
                26783091       +36.8%     36650248  JBOD-10HDD-thresh=100M/xfs-100dd-1M-24p-16384M-100M:10-X
                35526347       +14.0%     40492312  JBOD-10HDD-thresh=100M/xfs-10dd-1M-24p-16384M-100M:10-X
                44670723        -1.1%     44177606  JBOD-10HDD-thresh=100M/xfs-1dd-1M-24p-16384M-100M:10-X
               127996037       +22.4%    156719990  JBOD-10HDD-thresh=2G/ext4-100dd-1M-24p-16384M-2048M:10-X
                57518856        +3.8%     59677625  JBOD-10HDD-thresh=2G/ext4-10dd-1M-24p-16384M-2048M:10-X
                51919909       +12.2%     58269894  JBOD-10HDD-thresh=2G/ext4-1dd-1M-24p-16384M-2048M:10-X
                86410514       +79.0%    154660433  JBOD-10HDD-thresh=2G/xfs-100dd-1M-24p-16384M-2048M:10-X
                40132519       +38.6%     55617893  JBOD-10HDD-thresh=2G/xfs-10dd-1M-24p-16384M-2048M:10-X
                48423248        +7.5%     52042927  JBOD-10HDD-thresh=2G/xfs-1dd-1M-24p-16384M-2048M:10-X
               206041046       +44.1%    296846536  JBOD-10HDD-thresh=4G/xfs-100dd-1M-24p-16384M-4096M:10-X
                72312903       -19.4%     58272885  JBOD-10HDD-thresh=4G/xfs-10dd-1M-24p-16384M-4096M:10-X
                50635672        -0.5%     50384787  JBOD-10HDD-thresh=4G/xfs-1dd-1M-24p-16384M-4096M:10-X
                68308534      +115.7%    147324758  JBOD-10HDD-thresh=800M/ext4-100dd-1M-24p-16384M-800M:10-X
                57882933       +14.5%     66269621  JBOD-10HDD-thresh=800M/ext4-10dd-1M-24p-16384M-800M:10-X
                52183472       +12.8%     58855181  JBOD-10HDD-thresh=800M/ext4-1dd-1M-24p-16384M-800M:10-X
                53788956       +94.2%    104460352  JBOD-10HDD-thresh=800M/xfs-100dd-1M-24p-16384M-800M:10-X
                44493342       +35.5%     60298210  JBOD-10HDD-thresh=800M/xfs-10dd-1M-24p-16384M-800M:10-X
                42641209       +18.9%     50681038  JBOD-10HDD-thresh=800M/xfs-1dd-1M-24p-16384M-800M:10-X

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:58 +08:00
Wu Fengguang
af6a311384 writeback: add bg_threshold parameter to __bdi_update_bandwidth()
No behavior change.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
2011-10-03 21:08:56 +08:00
Arne Jansen
7a26285eea btrfs: use readahead API for scrub
Scrub uses a simple tree-enumeration to bring the relevant portions
of the extent- and csum-tree into the page cache before starting the
scrub-I/O. This is now replaced by using the new readahead-API.
During readahead the scrub is being accounted as paused, so it won't
hold off transaction commits.

This change raises the average disk bandwith utilisation on my test
volume from 70% to 90%. On another volume, the time for a test run
went down from 89s to 43s.

Changes v5:
 - reada1/2 are now of type struct reada_control *

Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-10-02 08:48:45 +02:00
Arne Jansen
4bb31e928d btrfs: hooks for readahead
This adds the hooks needed for readahead. In the readpage_end_io_hook,
the extent state is checked for the EXTENT_READAHEAD flag. Only in this
case the readahead hook is called, to keep the impact on non-ra as low
as possible.
Additionally, a hook for a failed IO is added, otherwise readahead would
wait indefinitely for the extent to finish.

Changes for v2:
 - eliminate race condition

Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-10-02 08:48:44 +02:00
Arne Jansen
7414a03fbf btrfs: initial readahead code and prototypes
This is the implementation for the generic read ahead framework.

To trigger a readahead, btrfs_reada_add must be called. It will start
a read ahead for the given range [start, end) on tree root. The returned
handle can either be used to wait on the readahead to finish
(btrfs_reada_wait), or to send it to the background (btrfs_reada_detach).

The read ahead works as follows:
On btrfs_reada_add, the root of the tree is inserted into a radix_tree.
reada_start_machine will then search for extents to prefetch and trigger
some reads. When a read finishes for a node, all contained node/leaf
pointers that lie in the given range will also be enqueued. The reads will
be triggered in sequential order, thus giving a big win over a naive
enumeration. It will also make use of multi-device layouts. Each disk
will have its on read pointer and all disks will by utilized in parallel.
Also will no two disks read both sides of a mirror simultaneously, as this
would waste seeking capacity. Instead both disks will read different parts
of the filesystem.
Any number of readaheads can be started in parallel. The read order will be
determined globally, i.e. 2 parallel readaheads will normally finish faster
than the 2 started one after another.

Changes v2:
 - protect root->node by transaction instead of node_lock
 - fix missed branches:
    The readahead had a too simple check to determine if a branch from
    a node should be checked or not. It now also records the upper bound
    of each node to see if the requested RA range lies within.
 - use KERN_CONT to debug output, to avoid line breaks
 - defer reada_start_machine to worker to avoid deadlock

Changes v3:
 - protect root->node by rcu

Changes v5:
 - changed EIO-semantics of reada_tree_block_flagged
 - remove spin_lock from reada_control and make elems an atomic_t
 - remove unused read_total from reada_control
 - kill reada_key_cmp, use btrfs_comp_cpu_keys instead
 - use kref-style release functions where possible
 - return struct reada_control * instead of void * from btrfs_reada_add

Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-10-02 08:48:44 +02:00
Arne Jansen
90519d66ab btrfs: state information for readahead
Add state information for readahead to btrfs_fs_info and btrfs_device

Changes v2:
 - don't wait in radix_trees
 - add own set of workers for readahead

Reviewed-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-10-02 08:48:30 +02:00
Arne Jansen
ab0fff0305 btrfs: add READAHEAD extent buffer flag
Add a READAHEAD extent buffer flag.
Add a function to trigger a read with this flag set.

Changes v2:
 - use extent buffer flags instead of extent state flags

Changes v5:
 - adapt to changed read_extent_buffer_pages interface
 - don't return eb from reada_tree_block_flagged if it has CORRUPT flag set

Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-10-02 08:47:57 +02:00
Arne Jansen
bb82ab88df btrfs: add an extra wait mode to read_extent_buffer_pages
read_extent_buffer_pages currently has two modes, either trigger a read
without waiting for anything, or wait for the I/O to finish. The former
also bails when it's unable to lock the page. This patch now adds an
additional parameter to allow it to block on page lock, but don't wait
for completion.

Changes v5:
 - merge the 2 wait parameters into one and define WAIT_NONE, WAIT_COMPLETE and
   WAIT_PAGE_LOCK

Change v6:
 - fix bug introduced in v5

Signed-off-by: Arne Jansen <sensille@gmx.net>
2011-10-02 08:47:55 +02:00
Chris Mason
286d6e70aa Merge branch 'btrfs-3.0' into for-linus 2011-09-30 15:26:09 -04:00
Josef Bacik
b6316429af Btrfs: force a page fault if we have a shorty copy on a page boundary
A user reported a problem where ceph was getting into 100% cpu usage while doing
some writing.  It turns out it's because we were doing a short write on a not
uptodate page, which means we'd fall back at one page at a time and fault the
page in.  The problem is our position is on the page boundary, so our fault in
logic wasn't actually reading the page, so we'd just spin forever or until the
page got read in by somebody else.  This will force a readpage if we end up
doing a short copy.  Alexandre could reproduce this easily with ceph and reports
it fixes his problem.  I also wrote a reproducer that no longer hangs my box
with this patch.  Thanks,

Reported-and-tested-by: Alexandre Oliva <aoliva@redhat.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
2011-09-30 15:23:54 -04:00
Jan Schmidt
5da6fcbc4e btrfs: integrating raid-repair and scrub-fixup-nodatasum
This ties nodatasum fixup in scrub together with raid repair patches. While
both series are working fine alone, scrub will report uncorrectable errors
if they occur in a nodatasum extent *and* the page is in the page cache.

Previously, we would have triggered readpage to find good data and do the
repair. However, readpage wouldn't read anything in the case where the page
is up to date in the cache. So, we simply take that good data we have and
call repair_io_failure directly (unless the page in the cache is dirty).

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 13:38:43 +02:00
Jan Schmidt
4a54c8c165 btrfs: Moved repair code from inode.c to extent_io.c
The raid-retry code in inode.c can be generalized so that it works for
metadata as well. Thus, this patch moves it to extent_io.c and makes the
raid-retry code a raid-repair code.

Repair works that way: Whenever a read error occurs and we have more
mirrors to try, note the failed mirror, and retry another. If we find a
good one, check if we did note a failure earlier and if so, do not allow
the read to complete until after the bad sector was written with the good
data we just fetched. As we have the extent locked while reading, no one
can change the data in between.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 13:38:42 +02:00
Jan Schmidt
2774b2ca3d btrfs: Put mirror_num in bi_bdev
The error correction code wants to make sure that only the bad mirror is
rewritten. Thus, we need to know which mirror is the bad one. I did not
find a more apropriate field than bi_bdev. But I think using this is fine,
because it is modified by the block layer, anyway, and should not be read
after the bio returned.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 13:38:42 +02:00
Jan Schmidt
1503140d3e btrfs: Do not use bio->bi_bdev after submission
The block layer modifies bio->bi_bdev and bio->bi_sector while working on
the bio, they do _not_ come back unmodified in the completion callback.

To call add_page, we need at least some bi_bdev set, which is why the code
was working, previously. With this patch, we use the latest_bdev from
fsinfo instead of the leftover in the bio. This gives us the possibility to
use the bi_bdev field for another purpose.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 13:38:42 +02:00
Jan Schmidt
a1d3c4786a btrfs: btrfs_multi_bio replaced with btrfs_bio
btrfs_bio is a bio abstraction able to split and not complete after the last
bio has returned (like the old btrfs_multi_bio). Additionally, btrfs_bio
tracks the mirror_num used to read data which can be used for error
correction purposes.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 13:38:42 +02:00
Jan Schmidt
d7728c960d btrfs: new ioctls to do logical->inode and inode->path resolving
these ioctls make use of the new functions initially added for scrub. they
return all inodes belonging to a logical address (BTRFS_IOC_LOGICAL_INO) and
all paths belonging to an inode (BTRFS_IOC_INO_PATHS).

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 12:54:28 +02:00
Jan Schmidt
0ef8e45158 btrfs scrub: add fixup code for errors on nodatasum files
This removes a FIXME comment and introduces the first part of nodatasum
fixup: It gets the corresponding inode for a logical address and triggers a
regular readpage for the corrupted sector.

Once we have on-the-fly error correction our error will be automatically
corrected. The correction code is expected to clear the newly introduced
EXTENT_DAMAGED flag, making scrub report that error as "corrected" instead
of "uncorrectable" eventually.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 12:54:28 +02:00
Jan Schmidt
e12fa9cd39 btrfs scrub: use int for mirror_num, not u64
the rest of the code uses int mirror_num, and so should scrub

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 12:54:28 +02:00
Jan Schmidt
8ddc7d9cd0 btrfs: add mirror_num to extent_read_full_page
Currently, extent_read_full_page always assumes we are trying to read mirror
0, which generally is the best we can do. To add flexibility, pass it as a
parameter. This will be needed by scrub fixup code.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 12:54:28 +02:00
Jan Schmidt
193ea74b27 btrfs scrub: bugfix: mirror_num off by one
Fix the mirror_num determination in scrub_stripe. The rest of the scrub code
did not use mirror_num for anything important and that error went unnoticed.
The nodatasum fixup patch of this set depends on a correct mirror_num.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 12:54:28 +02:00
Jan Schmidt
558540c177 btrfs scrub: print paths of corrupted files
While scrubbing, we may encounter various errors. Previously, a logical
address was printed to the log only. Now, all paths belonging to that
address are resolved and printed separately. That should work for hardlinks
as well as reflinks.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 12:54:28 +02:00
Jan Schmidt
13db62b7a1 btrfs scrub: added unverified_errors
In normal operation, scrub is reading data sequentially in large portions.
In case of an i/o error, we try to find the corrupted area(s) by issuing
page sized read requests. With this commit we increment the
unverified_errors counter if all of the small size requests succeed.

Userland patches carrying such conspicous events to the administrator should
already be around.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 12:54:27 +02:00
Jan Schmidt
a542ad1baf btrfs: added helper functions to iterate backrefs
These helper functions iterate back references and call a function for each
backref. There is also a function to resolve an inode to a path in the
file system.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2011-09-29 12:54:27 +02:00
Jesper Juhl
95c7545453 CIFS: Don't free volume_info->UNC until we are entirely done with it.
In cleanup_volume_info_contents() we kfree(volume_info->UNC); and then
proceed to use that variable on the very next line.
This causes (at least) Coverity Prevent to complain about use-after-free
of that variable (and I guess other checkers may do that as well).
There's not any /real/ problem here since we are just using the value of
the pointer, not actually dereferencing it, but it's still trivial to
silence the tool, so why not?
To me at least it also just seems nicer to defer freeing the variable
until we are entirely done with it in all respects.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-27 18:08:04 +02:00
Paul Bolle
395cf9691d doc: fix broken references
There are numerous broken references to Documentation files (in other
Documentation files, in comments, etc.). These broken references are
caused by typo's in the references, and by renames or removals of the
Documentation files. Some broken references are simply odd.

Fix these broken references, sometimes by dropping the irrelevant text
they were part of.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2011-09-27 18:08:04 +02:00
Linus Torvalds
b6c8069d35 vfs: remove LOOKUP_NO_AUTOMOUNT flag
That flag no longer makes sense, since we don't look up automount points
as eagerly any more.  Additionally, it turns out that the NO_AUTOMOUNT
handling was buggy to begin with: it would avoid automounting even for
cases where we really *needed* to do the automount handling, and could
return ENOENT for autofs entries that hadn't been instantiated yet.

With our new non-eager automount semantics, one discussion has been
about adding a AT_AUTOMOUNT flag to vfs_fstatat (and thus the
newfstatat() and fstatat64() system calls), but it's probably not worth
it: you can always force at least directory automounting by simply
adding the final '/' to the filename, which works for *all* of the stat
family system calls, old and new.

So AT_NO_AUTOMOUNT (and thus LOOKUP_NO_AUTOMOUNT) really were just a
result of our bad default behavior.

Acked-by: Ian Kent <raven@themaw.net>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-27 08:12:33 -07:00
Trond Myklebust
815d405cef VFS: Fix the remaining automounter semantics regressions
The concensus seems to be that system calls such as stat() etc should
not trigger an automount.  Neither should the l* versions.

This patch therefore adds a LOOKUP_AUTOMOUNT flag to tag those lookups
that _should_ trigger an automount on the last path element.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
[ Edited to leave out the cases that are already covered by LOOKUP_OPEN,
  LOOKUP_DIRECTORY and LOOKUP_CREATE - all of which also fundamentally
  force automounting for their own reasons   - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-26 19:16:46 -07:00
Linus Torvalds
d94c177bee vfs pathname lookup: Add LOOKUP_AUTOMOUNT flag
Since we've now turned around and made LOOKUP_FOLLOW *not* force an
automount, we want to add the ability to force an automount event on
lookup even if we don't happen to have one of the other flags that force
it implicitly (LOOKUP_OPEN, LOOKUP_DIRECTORY, LOOKUP_PARENT..)

Most cases will never want to use this, since you'd normally want to
delay automounting as long as possible, which usually implies
LOOKUP_OPEN (when we open a file or directory, we really cannot avoid
the automount any more).

But Trond argued sufficiently forcefully that at a minimum bind mounting
a file and quotactl will want to force the automount lookup.  Some other
cases (like nfs_follow_remote_path()) could use it too, although
LOOKUP_DIRECTORY would work there as well.

This commit just adds the flag and logic, no users yet, though.  It also
doesn't actually touch the LOOKUP_NO_AUTOMOUNT flag that is related, and
was made irrelevant by the same change that made us not follow on
LOOKUP_FOLLOW.

Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Ian Kent <raven@themaw.net>
Cc: Jeff Layton <jlayton@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: David Howells <dhowells@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Greg KH <gregkh@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-09-26 17:44:55 -07:00
Heiko Carstens
c4253cb074 sysfs: add unsigned long cast to prevent compile warning
"sysfs: use rb-tree for inode number lookup" added a new printk which
causes a new compile warning on s390 (and few other architectures):

fs/sysfs/dir.c: In function 'sysfs_link_sibling':
fs/sysfs/dir.c:63:4: warning: format '%lx' expects argument of type
  'long unsigned int', but argument 2 has type 'ino_t' [-Wform

Add an explicit unsigned long cast since ino_t is an unsigned long on
most architectures.

Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2011-09-26 16:21:15 -07:00