Commit graph

2767 commits

Author SHA1 Message Date
Miao Xie
c0f62dedd0 Btrfs: fix wrong mtime and ctime when creating snapshots
When we created a new snapshot, the mtime and ctime of its parent directory
were not updated. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:36 -04:00
Arne Jansen
22cd2e7de7 Btrfs: fix race in run_clustered_refs
With commit

commit d1270cd91f
Author: Arne Jansen <sensille@gmx.net>
Date:   Tue Sep 13 15:16:43 2011 +0200

     Btrfs: put back delayed refs that are too new

I added a window where the delayed_ref's head->ref_mod code can diverge
from the sum of the remaining refs, because we release the head->mutex
in the middle. This leads to btrfs_lookup_extent_info returning wrong
numbers. This patch fixes this by adjusting the head's ref_mod with each
delayed ref we run.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:35 -04:00
Chris Mason
b12a3b1ea2 Btrfs: don't run __tree_mod_log_free_eb on leaves
When we split a leaf, we may end up inserting a new root on top of that
leaf.  The reflog code was incorrectly assuming the old root was always
a node.  This makes sure we skip over leaves.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:34 -04:00
Josef Bacik
6fc823b10f Btrfs: increase the size of the free space cache
Arne was complaining about the space cache having mismatching generation
numbers when debugging a deadlock.  This is because we can run out of space
in our preallocated range for our space cache if you have a pretty
fragmented amount of space in your pinned space.  So just increase the
amount of space we preallocate for space cache so we can be sure to have
enough space.  This will only really affect data ranges since their the only
chunks that end up larger than 256MB.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:34 -04:00
Josef Bacik
66657b318e Btrfs: barrier before waitqueue_active
We need a barrir before calling waitqueue_active otherwise we will miss
wakeups.  So in places that do atomic_dec(); then atomic_read() use
atomic_dec_return() which imply a memory barrier (see memory-barriers.txt)
and then add an explicit memory barrier everywhere else that need them.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:33 -04:00
Arne Jansen
1fa11e265f Btrfs: fix deadlock in wait_for_more_refs
Commit a168650c introduced a waiting mechanism to prevent busy waiting in
btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations,
where a tree_mod_seq is held while waiting for the io to complete, while
the end_io calls btrfs_run_delayed_refs.
This whole mechanism is unnecessary. If not enough runnable refs are
available to satisfy count, just return as count is more like a guideline
than a strict requirement.
In case we have to run all refs, commit transaction makes sure that no
other threads are working in the transaction anymore, so we just assert
here that no refs are blocked.

Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-28 16:53:32 -04:00
Fengguang Wu
6209526531 btrfs: fix second lock in btrfs_delete_delayed_items()
Fix a real bug caught by coccinelle.

fs/btrfs/delayed-inode.c:1013:1-11: second lock on line 1013

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
2012-08-28 16:53:31 -04:00
Josef Bacik
c329861da4 Btrfs: don't allocate a seperate csums array for direct reads
We've been allocating a big array for csums instead of storing them in the
io_tree like we do for buffered reads because previously we were locking the
entire range, so we didn't have an extent state for each sector of the
range.  But now that we do the range locking as we map the buffers we can
limit the mapping lenght to sectorsize and use the private part of the
io_tree for our csums.  This allows us to avoid an extra memory allocation
for direct reads which could incur latency.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:30 -04:00
Josef Bacik
99f5944b84 Btrfs: do not strdup non existent strings
When we close devices we add back empty devices for some reason that escapes
me.  In the case of a missing dev we don't allocate an rcu_string for it's
name, so check to see if the device has a name and if it doesn't don't
bother strdup()'ing it.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:29 -04:00
Josef Bacik
aa9ddcd4b5 Btrfs: do not use missing devices when showing devname
If you do the following

mkfs.btrfs /dev/sdb /dev/sdc
rmmod btrfs
dd if=/dev/zero of=/dev/sdb bs=1M count=1
mount -o degraded /dev/sdc /mnt/btrfs-test

the box will panic trying to deref the name for the missing dev since it is
the lower numbered devid.  So fix show_devname to not use missing devices.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:29 -04:00
Stefan Behrens
3627bf4503 Btrfs: fix that error value is changed by mistake
In iterate_inodes_from_logical() the error result from
extent_from_logical() is patched by mistake. Typically ENOENT is
patched to EINVAL because (-ENOENT & BTRFS_EXTENT_FLAG_TREE_BLOCK)
evaluates to true.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-08-28 16:53:28 -04:00
Josef Bacik
eb838e73dc Btrfs: lock extents as we map them in DIO
A deadlock in xfstests 113 was uncovered by commit

d187663ef2

This is because we would not return EIOCBQUEUED for short AIO reads, instead
we'd wait for the DIO to complete and then return the amount of data we
transferred, which would allow our stuff to unlock the remaning amount.  But
with this change this no longer happens, so if we have a short AIO read (for
example if we try to read past EOF), we could leave the section from EOF to
the end of where we tried to read locked.  Fixing this is tricky since there
is no clear way to know exactly how much data DIO truly submitted for IO, so
to make this less hard on ourselves and less combersome we need to lock the
extents as we try to map them, and then we unlock any areas we didn't
actually map.  This makes us completely safe from deadlocks and reliance on
a particular behavior of the DIO code.  This also lays the groundwork for
allowing us to use the normal csum storage method for reads which means we
can remove an allocation.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-08-28 16:53:27 -04:00
Dan Carpenter
dadd1105ca Btrfs: fix some endian bugs handling the root times
"trans->transid" is cpu endian but we want to store the data as little
endian.  "item->ctime.nsec" is only 32 bits, not 64.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28 16:53:26 -04:00
Dan Carpenter
55e591ffde Btrfs: unlock on error in btrfs_delalloc_reserve_metadata()
We should release this mutex before returning the error code.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28 16:53:25 -04:00
Dan Carpenter
57a5a88203 Btrfs: checking for NULL instead of IS_ERR
add_qgroup_rb() never returns NULL, only error pointers.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28 16:53:25 -04:00
Dan Carpenter
5986802c2f Btrfs: fix some error codes in btrfs_qgroup_inherit()
These are returning zero when it should be returning a negative error
code.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-28 16:53:24 -04:00
Stefan Behrens
aa2ffd0616 Btrfs: fix a misplaced address operator in a condition
This should obviously not be "if (&flag)" but "if (flag)".

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-08-28 16:53:23 -04:00
Linus Torvalds
15fc5deb1f Merge branch 'for-linus-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs merge fix from Chris Mason:
 "This fixes a merge error in rc1.  The calls to mnt_want_write should
  have been removed."

* 'for-linus-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove mnt_want_write call in btrfs_mksubvol
2012-08-12 21:28:41 +03:00
Alexander Block
e00da2067b Btrfs: remove mnt_want_write call in btrfs_mksubvol
We got a recursive lock in mksubvol because the caller already held
a lock. I think we got into this due to a merge error. Commit a874a63
removed the mnt_want_write call from btrfs_mksubvol and added a
replacement call to mnt_want_write_file in btrfs_ioctl_snap_create_transid.
Commit e7848683 however tried to move all calls to mnt_want_write above
i_mutex. So somewhere while merging this, it got mixed up. The
solution is to remove the mnt_want_write call completely from
mksubvol.

Reported-by: David Sterba <dave@jikos.cz>
Signed-off-by: Alexander Block <ablock84@googlemail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-08-09 11:01:54 -04:00
Artem Bityutskiy
b257031408 btrfs: nuke pdflush from comments
The pdflush thread is long gone, so this patch removes references to pdflush
from btrfs comments.

Cc: Chris Mason <chris.mason@fusionio.com>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-08-04 12:15:35 +04:00
Artem Bityutskiy
34eaadaf22 btrfs: nuke write_super from comments
The '->write_super' superblock method is gone, and this patch removes all the
references to 'write_super' from btrfs.

Cc: Chris Mason <chris.mason@fusionio.com>
Cc: linux-btrfs@vger.kernel.org
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-08-04 12:15:35 +04:00
Linus Torvalds
a0e881b7c1 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull second vfs pile from Al Viro:
 "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
  deadlock reproduced by xfstests 068), symlink and hardlink restriction
  patches, plus assorted cleanups and fixes.

  Note that another fsfreeze deadlock (emergency thaw one) is *not*
  dealt with - the series by Fernando conflicts a lot with Jan's, breaks
  userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
  for massive vfsmount leak; this is going to be handled next cycle.
  There probably will be another pull request, but that stuff won't be
  in it."

Fix up trivial conflicts due to unrelated changes next to each other in
drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  delousing target_core_file a bit
  Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
  fs: Remove old freezing mechanism
  ext2: Implement freezing
  btrfs: Convert to new freezing mechanism
  nilfs2: Convert to new freezing mechanism
  ntfs: Convert to new freezing mechanism
  fuse: Convert to new freezing mechanism
  gfs2: Convert to new freezing mechanism
  ocfs2: Convert to new freezing mechanism
  xfs: Convert to new freezing code
  ext4: Convert to new freezing mechanism
  fs: Protect write paths by sb_start_write - sb_end_write
  fs: Skip atime update on frozen filesystem
  fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
  fs: Improve filesystem freezing handling
  switch the protection of percpu_counter list to spinlock
  nfsd: Push mnt_want_write() outside of i_mutex
  btrfs: Push mnt_want_write() outside of i_mutex
  fat: Push mnt_want_write() outside of i_mutex
  ...
2012-08-01 10:26:23 -07:00
Jan Kara
b2b5ef5c8e btrfs: Convert to new freezing mechanism
We convert btrfs_file_aio_write() to use new freeze check.  We also add proper
freeze protection to btrfs_page_mkwrite(). We also add freeze protection to
the transaction mechanism to avoid starting transactions on frozen filesystem.
At minimum this is necessary to stop iput() of unlinked file to change frozen
filesystem during truncation.

Checks in cleaner_kthread() and transaction_kthread() can be safely removed
since btrfs_freeze() will lock the mutexes and thus block the threads (and they
shouldn't have anything to do anyway).

CC: linux-btrfs@vger.kernel.org
CC: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 09:45:52 +04:00
Joe Perches
533574c6bc btrfs: use printk_get_level and printk_skip_level, add __printf, fix fallout
Use the generic printk_get_level() to search a message for a kern_level.

Add __printf to verify format and arguments.  Fix a few messages that
had mismatches in format and arguments.  Add #ifdef CONFIG_PRINTK blocks
to shrink the object size a bit when not using printk.

[akpm@linux-foundation.org: whitespace tweak]
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Chris Mason <chris.mason@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-30 17:25:14 -07:00
Jan Kara
e7848683ae btrfs: Push mnt_want_write() outside of i_mutex
When mnt_want_write() starts to handle freezing it will get a full lock
semantics requiring proper lock ordering. So push mnt_want_write() call
consistently outside of i_mutex.

CC: Chris Mason <chris.mason@oracle.com>
CC: linux-btrfs@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-07-31 01:02:51 +04:00
Stephen Rothwell
a1857ebe75 Btrfs: using vmalloc and friends needs vmalloc.h
On powerpc, we don't get the implicit vmalloc.h include, and as a result
the build fails noisily:

  fs/btrfs/send.c: In function 'fs_path_free':
  fs/btrfs/send.c:185:4: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
  fs/btrfs/send.c: In function 'fs_path_ensure_buf':
  fs/btrfs/send.c:215:4: error: implicit declaration of function 'vmalloc' [-Werror=implicit-function-declaration]
  fs/btrfs/send.c:215:12: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:225:12: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:233:13: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c: In function 'iterate_dir_item':
  fs/btrfs/send.c:900:10: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:909:11: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c: In function 'btrfs_ioctl_send':
  fs/btrfs/send.c:4463:17: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:4469:17: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:4475:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
  fs/btrfs/send.c:4475:20: warning: assignment makes pointer from integer without a cast [enabled by default]
  fs/btrfs/send.c:4483:21: warning: assignment makes pointer from integer without a cast [enabled by default]

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-07-26 18:08:30 -07:00
Linus Torvalds
e2aed8dfa5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull large btrfs update from Chris Mason:
 "This pull request is very large, and the two main features in here
  have been under testing/devel for quite a while.

  We have subvolume quotas from the strato developers.  This enables
  full tracking of how many blocks are allocated to each subvolume (and
  all snapshots) and you can set limits on a per-subvolume basis.  You
  can also create quota groups and toss multiple subvolumes into a big
  group.  It's everything you need to be a web hosting company and give
  each user their own subvolume.

  The userland side of the quotas is being refreshed, they'll send out
  details on where to grab it soon.

  Next is the kernel side of btrfs send/receive from Alexander Block.
  This leverages the same infrastructure as the quota code to figure out
  relationships between blocks and their owners.  It can then compute
  the difference between two snapshots and sends the diffs in a neutral
  format into userland.

  The basic model:

        create a snapshot
        send that snapshot as the initial backup
        make changes
        create a second snapshot
        send the incremental as a backup
        delete the first snapshot
        (use the second snapshot for the next incremental)

  The receive portion is all in userland, and in the 'next' branch of my
  btrfs-progs repo.

  There's still some work to do in terms of optimizing the send side
  from kernel to userland.  The really important part is figuring out
  how two snapshots are different, and this is where we are
  concentrating right now.  The initial send of a dataset is a little
  slower than tar, but the incremental sends are dramatically faster
  than what rsync can do.

  On top of all of that, we have a nice queue of fixes, cleanups and
  optimizations."

Fix up trivial modify/del conflict in fs/btrfs/ioctl.c

Also fix up semantic conflict in fs/btrfs/send.c: the interface to
dentry_open() changed in commit 765927b2d5 ("switch dentry_open() to
struct path, make it grab references itself"), and since it now grabs
whatever references it needs, we should no longer do the mntget() on the
mnt (and we need to dput() the dentry reference we took).

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (65 commits)
  Btrfs: uninit variable fixes in send/receive
  Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive
  Btrfs: add btrfs_compare_trees function
  Btrfs: introduce subvol uuids and times
  Btrfs: make iref_to_path non static
  Btrfs: add a barrier before a waitqueue_active check
  Btrfs: call the ordered free operation without any locks held
  Btrfs: Check INCOMPAT flags on remount and add helper function
  Btrfs: add helper for tree enumeration
  btrfs: allow cross-subvolume file clone
  Btrfs: improve multi-thread buffer read
  Btrfs: make btrfs's allocation smoothly with preallocation
  Btrfs: lock the transition from dirty to writeback for an eb
  Btrfs: fix potential race in extent buffer freeing
  Btrfs: don't return true in releasepage unless we actually freed the eb
  Btrfs: suppress printk() if all device I/O stats are zero
  Btrfs: remove unwanted printk() for btrfs device I/O stats
  Btrfs: rewrite BTRFS_SETGET_FUNCS
  Btrfs: zero unused bytes in inode item
  Btrfs: kill free_space pointer from inode structure
  ...

Conflicts:
	fs/btrfs/ioctl.c
2012-07-26 14:48:55 -07:00
Chris Mason
b24baf6917 Btrfs: uninit variable fixes in send/receive
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25 19:21:10 -04:00
Chris Mason
113c1cb530 Merge branch 'send-v2' of git://github.com/ablock84/linux-btrfs into for-linus
This is the kernel portion of btrfs send/receive

Conflicts:
	fs/btrfs/Makefile
	fs/btrfs/backref.h
	fs/btrfs/ctree.c
	fs/btrfs/ioctl.c
	fs/btrfs/ioctl.h

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25 19:19:10 -04:00
Alexander Block
31db9f7c23 Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive
This patch introduces the BTRFS_IOC_SEND ioctl that is
required for send. It allows btrfs-progs to implement
full and incremental sends. Patches for btrfs-progs will
follow.

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
2012-07-25 23:30:19 +02:00
Alexander Block
7069830a9e Btrfs: add btrfs_compare_trees function
This function is used to find the differences between
two trees. The tree compare skips whole subtrees if it
detects shared tree blocks and thus is pretty fast.

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
2012-07-25 23:30:14 +02:00
Alexander Block
8ea05e3a42 Btrfs: introduce subvol uuids and times
This patch introduces uuids for subvolumes. Each
subvolume has it's own uuid. In case it was snapshotted,
it also contains parent_uuid. In case it was received,
it also contains received_uuid.

It also introduces subvolume ctime/otime/stime/rtime. The
first two are comparable to the times found in inodes. otime
is the origin/creation time and ctime is the change time.
stime/rtime are only valid on received subvolumes.
stime is the time of the subvolume when it was
sent. rtime is the time of the subvolume when it was
received.

Additionally to the times, we have a transid for each
time. They are updated at the same place as the times.

btrfs receive uses stransid and rtransid to find out
if a received subvolume changed in the meantime.

If an older kernel mounts a filesystem with the
extented fields, all fields become invalid. The next
mount with a new kernel will detect this and reset the
fields.

Signed-off-by: Alexander Block <ablock84@googlemail.com>
Reviewed-by: David Sterba <dave@jikos.cz>
Reviewed-by: Arne Jansen <sensille@gmx.net>
Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
2012-07-25 23:28:38 +02:00
Alexander Block
91cb916ca2 Btrfs: make iref_to_path non static
Make iref_to_path non static (needed in send) and rename
it to btrfs_iref_to_path

Signed-off-by: Alexander Block <ablock84@googlemail.com>
2012-07-25 23:25:06 +02:00
Chris Mason
cd1cfc4915 Btrfs: add a barrier before a waitqueue_active check
We were missing wakeups on the delayed ref waitqueue due
to races on waitqueue_active.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25 16:15:08 -04:00
Chris Mason
e9fbcb4220 Btrfs: call the ordered free operation without any locks held
Each ordered operation has a free callback, and this was called with the
worker spinlock held.  Josef made the free callback also call iput,
which we can't do with the spinlock.

This drops the spinlock for the free operation and grabs it again before
moving through the rest of the list.  We'll circle back around to this
and find a cleaner way that doesn't bounce the lock around so much.

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
cc: stable@kernel.org
2012-07-25 16:15:07 -04:00
Mitch Harder
2b0ce2c290 Btrfs: Check INCOMPAT flags on remount and add helper function
In support of the recently added capability to remount with lzo
compression, provide a helper function to check the compression
INCOMPAT flags when remounting with lzo compression, and set
the flags if necessary.

Also, implement the new helper function when defragmenting with
explicit lzo compression and when setting the default subvolume.

Signed-off-by: Mitch Harder <mitch.harder@sabayonlinux.org>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25 16:14:31 -04:00
Chris Mason
b478b2baa3 Merge branch 'qgroup' of git://git.jan-o-sch.net/btrfs-unstable into for-linus
Conflicts:
	fs/btrfs/ioctl.c
	fs/btrfs/ioctl.h
	fs/btrfs/transaction.c
	fs/btrfs/transaction.h

Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-07-25 16:11:38 -04:00
Arne Jansen
e679376911 Btrfs: add helper for tree enumeration
Often no exact match is wanted but just the next lower or
higher item. There's a lot of duplicated code throughout
btrfs to deal with the corner cases. This patch adds a
helper function that can facilitate searching.

Signed-off-by: Arne Jansen <sensille@gmx.net>
2012-07-25 17:33:18 +02:00
David Sterba
362a20c5e2 btrfs: allow cross-subvolume file clone
Lift the EXDEV condition and allow different root trees for files being
cloned, then pass source inode's root when searching for extents.
Cloning is not allowed to cross vfsmounts, ie. when two subvolumes from
one filesystem are mounted separately.

Signed-off-by: David Sterba <dsterba@suse.cz>
2012-07-25 17:33:09 +02:00
Linus Torvalds
d14b7a419a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Trivial updates all over the place as usual."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (29 commits)
  Fix typo in include/linux/clk.h .
  pci: hotplug: Fix typo in pci
  iommu: Fix typo in iommu
  video: Fix typo in drivers/video
  Documentation: Add newline at end-of-file to files lacking one
  arm,unicore32: Remove obsolete "select MISC_DEVICES"
  module.c: spelling s/postition/position/g
  cpufreq: Fix typo in cpufreq driver
  trivial: typo in comment in mksysmap
  mach-omap2: Fix typo in debug message and comment
  scsi: aha152x: Fix sparse warning and make printing pointer address more portable.
  Change email address for Steve Glendinning
  Btrfs: fix typo in convert_extent_bit
  via: Remove bogus if check
  netprio_cgroup.c: fix comment typo
  backlight: fix memory leak on obscure error path
  Documentation: asus-laptop.txt references an obsolete Kconfig item
  Documentation: ManagementStyle: fixed typo
  mm/vmscan: cleanup comment error in balance_pgdat
  mm: cleanup on the comments of zone_reclaim_stat
  ...
2012-07-24 13:34:56 -07:00
Liu Bo
67c9684f48 Btrfs: improve multi-thread buffer read
While testing with my buffer read fio jobs[1], I find that btrfs does not
perform well enough.

Here is a scenario in fio jobs:

We have 4 threads, "t1 t2 t3 t4", starting to buffer read a same file,
and all of them will race on add_to_page_cache_lru(), and if one thread
successfully puts its page into the page cache, it takes the responsibility
to read the page's data.

And what's more, reading a page needs a period of time to finish, in which
other threads can slide in and process rest pages:

     t1          t2          t3          t4
   add Page1
   read Page1  add Page2
     |         read Page2  add Page3
     |            |        read Page3  add Page4
     |            |           |        read Page4
-----|------------|-----------|-----------|--------
     v            v           v           v
    bio          bio         bio         bio

Now we have four bios, each of which holds only one page since we need to
maintain consecutive pages in bio.  Thus, we can end up with far more bios
than we need.

Here we're going to
a) delay the real read-page section and
b) try to put more pages into page cache.

With that said, we can make each bio hold more pages and reduce the number
of bios we need.

Here is some numbers taken from fio results:
         w/o patch                 w patch
       -------------  --------  ---------------
READ:    745MB/s        +25%       934MB/s

[1]:
[global]
group_reporting
thread
numjobs=4
bs=32k
rw=read
ioengine=sync
directory=/mnt/btrfs/

[READ]
filename=foobar
size=2000M
invalidate=1

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:10 -04:00
Liu Bo
df57dbe6bf Btrfs: make btrfs's allocation smoothly with preallocation
For backref walking, we've introduce delayed ref's sequence.  However,
it changes our preallocation behavior.

The story is that when we preallocate an extent and then mark it written
piece by piece, the ideal case should be that we don't need to COW the
extent, which is why we use 'preallocate'.

But we may not make use of preallocation, since when we check for cross refs on
the extent, we may have two ref entries which have the same content except
the sequence value, and we recognize them as cross refs and do COW to allocate
another extent.

So we end up with several pieces of space instead of an whole extent.

Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:10 -04:00
Josef Bacik
51561ffec9 Btrfs: lock the transition from dirty to writeback for an eb
There is a small window where an eb can have no IO bits set on it, which
could potentially result in extent_buffer_under_io() returning false when we
want it to return true, which could result in not fun things happening.  So
in order to protect this case we need to hold the refs_lock when we make
this transition to make sure we get reliable results out of
extent_buffer_udner_io().  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:09 -04:00
Josef Bacik
594831c4b2 Btrfs: fix potential race in extent buffer freeing
This sounds sort of impossible but it is the only thing I can think of and
at the very least it is theoretically possible so here it goes.

If we are in try_release_extent_buffer we will check that the ref count on
the extent buffer is 1 and not under IO, and then go down and clear the tree
ref.  If between this check and clearing the tree ref somebody else comes in
and grabs a ref on the eb and the marks it dirty before
try_release_extent_buffer() does it's tree ref clear we can end up with a
dirty eb that will be freed while it is still dirty which will result in a
panic.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:09 -04:00
Josef Bacik
e64860aa05 Btrfs: don't return true in releasepage unless we actually freed the eb
I noticed while looking at an extent_buffer race that we will
unconditionally return 1 if we get down to release_extent_buffer after
clearing the tree ref.  However we can easily race in here and get a ref on
the eb and not actually free the eb.  So make release_extent_buffer return 1
if it free'd the eb and 0 if not so we can be a little kinder to the vm.
Thanks,

Signed-off-by: Josef Bacik <jbacik@fusionio.com>
2012-07-23 16:28:08 -04:00
Stefan Behrens
a98cdb85b9 Btrfs: suppress printk() if all device I/O stats are zero
Code is added to suppress the I/O stats printing at mount time if all
statistic values are zero.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-07-23 16:28:07 -04:00
Stefan Behrens
5021976d8d Btrfs: remove unwanted printk() for btrfs device I/O stats
People complained about the annoying kernel log message
"btrfs: no dev_stats entry found ... (OK on first mount after mkfs)"
everytime a filesystem is mounted for the first time after running
mkfs. Since the distribution of the btrfs-progs is not synchronized
to the kernel version, mkfs like it is now will be used also in the
future. Then this message is not useful to find errors, it is just
annoying. This commit removes the printk().

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
2012-07-23 16:28:07 -04:00
Li Zefan
18077bb413 Btrfs: rewrite BTRFS_SETGET_FUNCS
BTRFS_SETGET_FUNCS macro is used to generate btrfs_set_foo() and
btrfs_foo() functions, which read and write specific fields in the
extent buffer.

The total number of set/get functions is ~200, but in fact we only
need 8 functions: 2 for u8 field, 2 for u16, 2 for u32 and 2 for u64.

It results in redunction of ~37K bytes.

   text    data     bss     dec     hex filename
 629661   12489     216  642366   9cd3e fs/btrfs/btrfs.o.orig
 592637   12489     216  605342   93c9e fs/btrfs/btrfs.o

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-07-23 16:28:06 -04:00
Li Zefan
293f7e0740 Btrfs: zero unused bytes in inode item
The otime field is not zeroed, so users will see random otime in an old
filesystem with a new kernel which has otime support in the future.

The reserved bytes are also not zeroed, and we'll have compatibility
issue if we make use of those bytes.

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-07-23 16:28:05 -04:00
Li Zefan
b4d7c3c945 Btrfs: kill free_space pointer from inode structure
Inodes always allocate free space with BTRFS_BLOCK_GROUP_DATA type,
which means every inode has the same BTRFS_I(inode)->free_space pointer.

This shrinks struct btrfs_inode by 4 bytes (or 8 bytes on 64 bits).

Signed-off-by: Li Zefan <lizefan@huawei.com>
2012-07-23 16:28:05 -04:00