Commit graph

30039 commits

Author SHA1 Message Date
Lukas Czerner
b8b8ff590f btrfs: Notify udev when removing device
Currently udev does not know about the device being removed from the
file system. This may result in the situation where we're unable to
mount the file system by UUID or by LABEL because the by-uuid and
by-label links may still point to the device which is no longer part of
the btrfs file system and hence does not have any btrfs super block.

It can be easily reproduced by the following:

mkfs.btrfs -L bugfs /dev/loop[0-6]
mount /dev/loop0 /mnt/test
btrfs device delete /dev/loop0 /mnt/test
umount /mnt/test

mount LABEL=bugfs /mnt/test <---- this fails

then see:

ls -l /dev/disk/by-label/bugfs

which will still point to the /dev/loop0

We did not notice this before because libblkid would send the udev
event for us when it noticed that the link did not match reality.
However, it no longer does that and relies completely on udev
information.

Fix this by sending the KOBJ_CHANGE event to the bdev kobject after
successful device removal.

Note that this does not affect device addition, because we will open the
device prior to the addition from userspace and udev will notice that and
reread the device afterwards.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:21 -05:00
Miao Xie
ac6a2b36f9 Btrfs: fix wrong return value of btrfs_truncate_page()
The ret variable may be set to 0 if we read the page successfully, but the page
might be released before we lock it again. In this case, if we fail to allocate
a new page, we will return 0, which is wrong. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:20 -05:00
Miao Xie
7426cc04d4 Btrfs: punch hole past the end of the file
Since we can pre-allocate the space past EOF, we should be able to reclaim
that space if we need. This patch implements it by removing the EOF check.

The fallocate manual says we can use the truncate command to reclaim the
pre-allocated space past EOF. But because truncate changes the file size, we
must run several commands to reclaim the space if we don't want to change the
file size, so it is not a good choice.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:20 -05:00
Miao Xie
0061280d2c Btrfs: fix the page that is beyond EOF
Steps to reproduce:
 # mkfs.btrfs <disk>
 # mount <disk> <mnt>
 # dd if=/dev/zero of=<mnt>/<file> bs=512 seek=5 count=8
 # fallocate -p -o 2048 -l 16384 <mnt>/<file>
 # dd if=/dev/zero of=<mnt>/<file> bs=4096 seek=3 count=8 conv=notrunc,nocreat
 # umount <mnt>
 # dmesg
 WARNING: at fs/btrfs/inode.c:7140 btrfs_destroy_inode+0x2eb/0x330

The reason is that we passed in a range which is beyond the end of the file. And
because the end of this range was not page-aligned, we had to truncate the last
page in this range; this operation is similar to a buffered file write. In other
words, we reserved enough space and cleared the data which was in the hole range
on that page. But when we expanded that test file and wrote the data into the
same page, we forgot that we had already reserved enough space for the buffered
write of that page, because in most cases there is no page beyond the end of
the file. As a result, we reserved the space twice.

In fact, we needn't truncate the page if it is beyond the end of the file; we
can just release the allocated space in that range. Fix the above problem this
way.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:19 -05:00
Miao Xie
6347b3c433 Btrfs: fix off-by-one error of the same page check in btrfs_punch_hole()
(start + len) is the start of the adjacent extent, not the end of the current
extent, so we should not use it to check whether the hole is on the same page.
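
The off-by-one can be illustrated outside the kernel. The following is a minimal sketch in Python rather than kernel C, assuming 4 KiB pages; the function names and the PAGE_SHIFT value are illustrative and are not taken from the patch.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def same_page(start: int, length: int) -> bool:
    """Return True if the byte range [start, start + length) fits in one page."""
    if length <= 0:
        raise ValueError("length must be positive")
    last = start + length - 1  # last byte that belongs to the range
    return (start >> PAGE_SHIFT) == (last >> PAGE_SHIFT)


def same_page_off_by_one(start: int, length: int) -> bool:
    """Variant that compares against start + length, the first byte of the next extent."""
    return (start >> PAGE_SHIFT) == ((start + length) >> PAGE_SHIFT)
```

For a hole that covers exactly the first page, same_page(0, 4096) is true while the off-by-one variant reports that the range spans two pages.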

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:19 -05:00
Miao Xie
4b5829a8e3 Btrfs: fix missing reserved space release in error path of delalloc reservation
We forget to release the reserved space in the error path of delalloc
reservation. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:18 -05:00
Miao Xie
543eabd5e1 Btrfs: don't auto defrag a file when doing directIO
If we run direct IO, we should not run auto defrag, because it may
introduce buffered IO vs direct IO problems and slow direct IO down.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:18 -05:00
Wang Sheng-Hui
960097622d Btrfs: use ctl->unit for free space calculation instead of block_group->sectorsize
We should use ctl->unit for free space calculation instead of
block_group->sectorsize, even though for free space use_bitmap or free space
cluster we currently only have sectorsize assigned to ctl->unit. This also
keeps the code style consistent.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:17 -05:00
Filipe Brandenburger
43baa579b3 Btrfs: refactor error handling to drop inode in btrfs_create()
Refactor it by checking whether the inode has been created and needs to be
dropped (drop_inode_on_err) and also if the err variable is set. That way the
variable doesn't need to be set on each and every error handling block.

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:17 -05:00
Filipe Brandenburger
2794ed013b Btrfs: fix permissions of empty files not affected by umask
When a new file is created with btrfs_create(), the inode will initially be
created with permissions 0666 and later on in btrfs_init_acl() it will be
adapted to mask out the umask bits. The problem is that this change won't make
it into the btrfs_inode unless there's another change to the inode (e.g. writing
content changing the size or touching the file changing the mtime.)

This fix adds a call to btrfs_update_inode() to btrfs_create() to make sure that
the change will not get lost if the in-memory inode is flushed before other
changes are made to the file.
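
The lost update can be modelled with a toy object that keeps an in-memory mode and a separately written on-disk mode. This is only a sketch of the described behaviour in Python; ToyInode, create() and the call_update switch are hypothetical names, not kernel interfaces.

```python
class ToyInode:
    """Toy model: 'mode' is the in-memory value, 'on_disk_mode' the last one written."""

    def __init__(self, mode: int):
        self.mode = mode
        self.on_disk_mode = mode

    def update(self) -> None:
        # Stands in for writing the in-memory inode back to its on-disk item.
        self.on_disk_mode = self.mode


def create(umask: int, call_update: bool) -> ToyInode:
    inode = ToyInode(0o666)   # initial permissions, as described above
    inode.mode &= ~umask      # umask bits are masked out later, in memory only
    if call_update:
        inode.update()        # the fix: persist the masked mode right away
    return inode
```

Without the update step, an otherwise untouched empty file keeps 0666 on disk; with it, a umask of 022 yields 0644.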

Signed-off-by: Filipe Brandenburger <filbranden@google.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:16 -05:00
Tsutomu Itoh
05dadc09f5 Btrfs: add fiemap's flag check
When an unsupported flag is specified, an error must be returned to the
caller. So, add a validity check of the fiemap flags.
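
A flag validity check of this kind usually reduces to a bit-mask test. The sketch below is Python, not the kernel code; the two flag values match the Linux fiemap UAPI header, while SUPPORTED_FLAGS and unsupported_flags() are assumptions made for illustration.

```python
FIEMAP_FLAG_SYNC = 0x1   # sync the file before mapping
FIEMAP_FLAG_XATTR = 0x2  # map the extended attribute tree

# Assumption for this sketch: only the SYNC flag is supported.
SUPPORTED_FLAGS = FIEMAP_FLAG_SYNC


def unsupported_flags(flags: int, supported: int = SUPPORTED_FLAGS) -> int:
    """Return the requested bits that are not supported (0 means the request is valid)."""
    return flags & ~supported
```

A caller would return an error whenever the result is non-zero, instead of silently ignoring the extra bits.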

Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:16 -05:00
Liu Bo
01e6deb25a Btrfs: don't add a NULL extended attribute
Passing a null extended attribute value means to remove the attribute,
but we don't have to add a new NULL extended attribute.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:15 -05:00
Liu Bo
755ac67f83 Btrfs: skip adding an acl attribute if we don't have to
If the acl can be exactly represented in the traditional file
mode permission bits, we don't set another acl attribute.
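
Whether an ACL is exactly representable in the mode bits can be sketched as follows. This is a simplified Python model, not the kernel helper: it assumes entries are given as (tag, permission bits) pairs and treats any named user, named group or mask entry as not representable.

```python
def acl_equiv_mode(entries):
    """Return (representable, mode) for a list of (tag, perm) ACL entries.

    Only the owner, owning-group and other entries map directly onto the
    traditional rwx permission bits.
    """
    mode = 0
    representable = True
    for tag, perm in entries:
        if tag == "user_obj":
            mode |= (perm & 0o7) << 6
        elif tag == "group_obj":
            mode |= (perm & 0o7) << 3
        elif tag == "other":
            mode |= perm & 0o7
        elif tag in ("user", "group", "mask"):
            representable = False  # needs a real ACL attribute
        else:
            raise ValueError(f"unknown ACL tag: {tag}")
    return representable, mode
```

When the first element of the result is true, the mode alone carries all the information and no separate acl attribute needs to be stored.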

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:15 -05:00
Miao Xie
0ff6fabdb0 Btrfs: fix off-by-one error of the reserved size of btrfs_allocate()
alloc_end is not the real end of the current extent, it is the start of the
next adjoining extent. So we needn't +1 when calculating the size of the space
that is about to be reserved.
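
The size computation for a half-open range can be sketched like this. It is a Python illustration under the assumption that the block size is a power of two; round_down(), round_up() and prealloc_range() are written here for the example and only mimic the usual alignment helpers.

```python
def round_down(x: int, align: int) -> int:
    """Round x down to a multiple of align (align must be a power of two)."""
    return x & ~(align - 1)


def round_up(x: int, align: int) -> int:
    """Round x up to a multiple of align (align must be a power of two)."""
    return (x + align - 1) & ~(align - 1)


def prealloc_range(offset: int, length: int, blocksize: int):
    """Return (alloc_start, alloc_end, size) for a preallocation request."""
    alloc_start = round_down(offset, blocksize)
    alloc_end = round_up(offset + length, blocksize)  # start of the next extent
    size = alloc_end - alloc_start                    # exclusive end: no +1
    return alloc_start, alloc_end, size
```

Because alloc_end already points one byte past the range, adding 1 to the difference would reserve one byte too many.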

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:15 -05:00
Miao Xie
797f427711 Btrfs: use existing align macros in btrfs_allocate()
The kernel developers have implemented some often-used align macros; we should
use them instead of the complex code.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:14 -05:00
Stefan Behrens
af1be4f851 Btrfs: fix a scrub regression in case of write errors
This regression was introduced by the device-replace patches.
Scrub immediately stops checking those disks that have write errors.
This is nothing that happens in the real world, but it is wrong
since scrub is the tool to detect and repair defects. Fix it.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:14 -05:00
Stefan Behrens
f9c83748de Btrfs: fix a build warning for an unused label
This issue was detected by the "0-DAY kernel build testing".

fs/btrfs/volumes.c: In function 'btrfs_rm_device':
fs/btrfs/volumes.c:1505:1: warning: label 'error_close' defined but not used [-Wunused-label]

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:13 -05:00
Stefan Behrens
cb3806ec88 Btrfs: fix race in check-integrity caused by usage of bitfield
The structure member mirror_num is modified concurrently to the
structure member is_iodone. This doesn't require any locking by
design, unless everything is stored in the same 32 bits of a
bit field. This was the case and xfstest 284 was able to
trigger false warnings from the checker code. This patch
separates the bits and fixes the race.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:13 -05:00
Miao Xie
b66f00da0c Btrfs: fix freeze vs auto defrag
If we freeze the fs, the auto defragment should not run. Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:12 -05:00
Miao Xie
26176e7c2a Btrfs: restructure btrfs_run_defrag_inodes()
This patch restructures btrfs_run_defrag_inodes() and makes the auto
defragment code more readable.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:12 -05:00
Miao Xie
8ddc473433 Btrfs: fix unprotected defragable inode insertion
We forget to take the defrag lock when we re-add the defragable inode.
Fix it.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:12 -05:00
Miao Xie
9247f3170b Btrfs: use slabs for auto defrag allocation
The auto defrag allocation is in the fast path of the IO, so use slabs
to improve the speed of the allocation.

Besides that, slabs can check for leaked objects when the module is removed.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:11 -05:00
Miao Xie
905b0dda06 Btrfs: get write access for qgroup operations
We need to get write access for qgroup operations, or we will modify the R/O fs.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:11 -05:00
Miao Xie
b8e95489bf Btrfs: get write access for scrub
We need to get write access for scrub, or we will modify the R/O fs.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:10 -05:00
Miao Xie
da24927b1e Btrfs: get write access when removing a device
Steps to reproduce:
 # mkfs.btrfs -d single -m single <disk0> <disk1>
 # mount -o ro <disk0> <mnt0>
 # mount -o ro <disk0> <mnt1>
 # mount -o remount,rw <mnt0>
 # umount <mnt0>
 # btrfs device delete <disk1> <mnt1>

We can remove a device from a R/O filesystem. The reason is that we just check
the R/O flag of the super block object. It is not enough, because the kernel
may set the R/O flag only for the mount point. We need to invoke

	mnt_want_write_file()

to do a full check.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:09 -05:00
Miao Xie
198605a8e2 Btrfs: get write access when doing resize fs
Steps to reproduce:
 # mkfs.btrfs <partition>
 # mount -o ro <partition> <mnt0>
 # mount -o ro <partition> <mnt1>
 # mount -o remount,rw <mnt0>
 # umount <mnt0>
 # btrfs fi resize 10g <mnt1>

We re-sized a R/O filesystem. The reason is that we just check the R/O flag
of the super block object. It is not enough, because the kernel may set the
R/O flag only for the mount point. We need to invoke mnt_want_write_file() to
do a full check.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:09 -05:00
Miao Xie
3c04ce0105 Btrfs: get write access when setting the default subvolume
When we want to set the default subvolume, we must get write access, or
we will change the R/O file system.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:09 -05:00
Miao Xie
8cd2807f79 Btrfs: fix wrong return value of btrfs_wait_for_commit()
If the id of the existing transaction is greater than the one we specified, it
means the specified transaction was committed, so we should return 0, not
EINVAL.
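
The intended return values can be sketched as follows. This Python model is an assumption about the control flow, not the kernel function: transaction ids are plain integers, live_transids stands for the transactions still in the list, and the actual waiting is left out.

```python
import errno


def wait_for_commit_status(transid: int, last_committed: int, live_transids) -> int:
    """Return 0 if transid is (or will be) committed, -EINVAL if it is unknown."""
    if transid <= last_committed:
        return 0                  # already committed, nothing to wait for
    for t in sorted(live_transids):
        if t == transid:
            return 0              # found it; a real implementation would wait here
        if t > transid:
            return 0              # a newer transaction exists, so ours was committed
    return -errno.EINVAL          # no such transaction
```

The middle case is the one the fix addresses: seeing a newer transaction id is evidence of a completed commit, not an invalid argument.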

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:08 -05:00
Miao Xie
ff7c1d3355 Btrfs: don't start a new transaction when starting sync
If there is no running transaction in the fs, we needn't start a new one when
we want to start sync.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:08 -05:00
Miao Xie
9a8c28bec1 Btrfs: pass root object into btrfs_ioctl_{start, wait}_sync()
Since we have gotten the root in the caller, just pass it into
btrfs_ioctl_{start, wait}_sync() directly.

Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:07 -05:00
Liu Bo
db2254bce4 Btrfs: fix a while-loop of listxattr
If we found an invalid xattr dir item, we'd better try the next one instead.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:07 -05:00
Wang Sheng-Hui
071401258a Btrfs: do not warn_on io_ctl->cur in io_ctl_map_page
io_ctl_map_page is called by many functions in free-space-cache.
In most scenarios, the ->cur is not null, e.g. io_ctl_add_entry.
I think we'd better remove the warn_on here.

Signed-off-by: Wang Sheng-Hui <shhuiw@gmail.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:06 -05:00
Stefan Behrens
3f6bcfbd41 Btrfs: add support for device replace ioctls
This is the commit that allows starting the device replace
procedure.

An ioctl() interface is added that supports starting and
canceling the device replace procedure, and to retrieve
the status and progress.

Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>
2012-12-16 20:46:06 -05:00
Linus Torvalds
36cd5c19c3 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 update from Ted Ts'o:
 "There are two major features for this merge window.  The first is
  inline data, which allows small files or directories to be stored in
  the in-inode extended attribute area.  (This requires that the file
  system use inodes which are at least 256 bytes or larger; 128 byte
  inodes do not have any room for in-inode xattrs.)

  The second new feature is SEEK_HOLE/SEEK_DATA support.  This is
  enabled by the extent status tree patches, and this infrastructure
  will be used to further optimize ext4 in the future.

  Beyond that, we have the usual collection of code cleanups and bug
  fixes."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (63 commits)
  ext4: zero out inline data using memset() instead of empty_zero_page
  ext4: ensure Inode flags consistency are checked at build time
  ext4: Remove CONFIG_EXT4_FS_XATTR
  ext4: remove unused variable from ext4_ext_in_cache()
  ext4: remove redundant initialization in ext4_fill_super()
  ext4: remove redundant code in ext4_alloc_inode()
  ext4: use sync_inode_metadata() when syncing inode metadata
  ext4: enable ext4 inline support
  ext4: let fallocate handle inline data correctly
  ext4: let ext4_truncate handle inline data correctly
  ext4: evict inline data out if we need to strore xattr in inode
  ext4: let fiemap work with inline data
  ext4: let ext4_rename handle inline dir
  ext4: let empty_dir handle inline dir
  ext4: let ext4_delete_entry() handle inline data
  ext4: make ext4_delete_entry generic
  ext4: let ext4_find_entry handle inline data
  ext4: create a new function search_dir
  ext4: let ext4_readdir handle inline data
  ext4: let add_dir_entry handle inline data properly
  ...
2012-12-16 17:33:01 -08:00
Linus Torvalds
2a74dbb9a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security
Pull security subsystem updates from James Morris:
 "A quiet cycle for the security subsystem with just a few maintenance
  updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  Smack: create a sysfs mount point for smackfs
  Smack: use select not depends in Kconfig
  Yama: remove locking from delete path
  Yama: add RCU to drop read locking
  drivers/char/tpm: remove tasklet and cleanup
  KEYS: Use keyring_alloc() to create special keyrings
  KEYS: Reduce initial permissions on keys
  KEYS: Make the session and process keyrings per-thread
  seccomp: Make syscall skipping and nr changes more consistent
  key: Fix resource leak
  keys: Fix unreachable code
  KEYS: Add payload preparsing opportunity prior to key instantiate or update
2012-12-16 15:40:50 -08:00
Trond Myklebust
ada8e20d04 NFS: Don't use SetPageError in the NFS writeback code
The writeback code is already capable of passing errors back to user space
by means of the open_context->error. In the case of ENOSPC, Neil Brown
is reporting seeing 2 errors being returned.

Neil writes:

"e.g. if /mnt2/ is an nfs mounted filesystem that has no space then

strace dd if=/dev/zero conv=fsync >> /mnt2/afile count=1

reported Input/output error and the relevant parts of the strace output are:

write(1, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512
fsync(1)                                = -1 EIO (Input/output error)
close(1)                                = -1 ENOSPC (No space left on device)"

Neil then shows that the duplication of error messages appears to be due to
the use of the PageError() mechanism, which causes filemap_fdatawait_range
to return the extra EIO. The regression was introduced by
commit 7b281ee026 (NFS: fsync() must exit
with an error if page writeback failed).

Fix this by removing the call to SetPageError(), and just relying on
open_context->error reporting the ENOSPC back to fsync().

Reported-by: Neil Brown <neilb@suse.de>
Tested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [3.6+]
2012-12-15 17:12:14 -05:00
Linus Torvalds
75e300c8ba Just a couple of fixes, nothing extraordinary.

Merge tag 'for-v3.8' of git://git.infradead.org/users/cbou/linux-pstore

Pull pstore update from Anton Vorontsov:
 "Here are just a few fixups for the pstore subsystem, nothing special
  this time"

* tag 'for-v3.8' of git://git.infradead.org/users/cbou/linux-pstore:
  pstore/ftrace: Adjust for ftrace_ops->func prototype change
  pstore/ram: Fix bounds checks for mem_size, record_size, console_size and ftrace_size
  pstore/ram: Fix undefined usage of rounddown_pow_of_two(0)
  pstore/ram: Fixup section annotations
2012-12-15 12:51:50 -08:00
Trond Myklebust
ac20d163fc NFSv4.1: Deal effectively with interrupted RPC calls.
If an RPC call is interrupted, assume that the server hasn't processed
the RPC call so that the next time we use the slot, we know that if we
get a NFS4ERR_SEQ_MISORDERED or NFS4ERR_SEQ_FALSE_RETRY, we just have
to bump the sequence number.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 15:39:59 -05:00
Linus Torvalds
08242bc221 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw
Pull GFS2 updates from Steven Whitehouse:
 "The main feature this time is the new Orlov allocator and the patches
  leading up to it which allow us to allocate new inodes from their own
  allocation context, rather than borrowing that of their parent
  directory.  It is this change which then allows us to choose a
  different location for subdirectories when required.  This works
  exactly as per the ext3 implementation from the user's point of view.

  In addition to that, we've got a speed up in gfs2_rbm_from_block()
  from Bob Peterson, three locking related improvements from Dave
  Teigland plus a selection of smaller bug fixes and clean ups."

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-nmw:
  GFS2: Set gl_object during inode create
  GFS2: add error check while allocating new inodes
  GFS2: don't reference inode's glock during block allocation trace
  GFS2: remove redundant lvb pointer
  GFS2: only use lvb on glocks that need it
  GFS2: skip dlm_unlock calls in unmount
  GFS2: Fix one RG corner case
  GFS2: Eliminate redundant buffer_head manipulation in gfs2_unlink_inode
  GFS2: Use dirty_inode in gfs2_dir_add
  GFS2: Fix truncation of journaled data files
  GFS2: Add Orlov allocator
  GFS2: Use proper allocation context for new inodes
  GFS2: Add test for resource group congestion status
  GFS2: Rename glops go_xmote_th to go_sync
  GFS2: Speed up gfs2_rbm_from_block
  GFS2: Review bug traps in glops.c
2012-12-15 12:34:21 -08:00
Trond Myklebust
8e63b6a8ad NFSv4.1: Move the RPC timestamp out of the slot.
Shave a few bytes off the slot table size by moving the RPC timestamp
into the sequence results.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 15:21:52 -05:00
Trond Myklebust
e879444084 NFSv4.1: Try to deal with NFS4ERR_SEQ_MISORDERED.
If the server returns NFS4ERR_SEQ_MISORDERED, it could be a sign
that the slot was retired at some point. Retry the attempt after
reinitialising the slot sequence number to 1.

Also add a handler for NFS4ERR_SEQ_FALSE_RETRY. Just bump the slot
sequence number and retry...
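
The two recovery rules can be summarised in a small state sketch. It is written in Python with hypothetical names (SeqStatus, Slot, handle_sequence_status); it models only the sequence-number bookkeeping described above, not the RPC machinery.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SeqStatus(Enum):
    OK = auto()
    SEQ_MISORDERED = auto()
    SEQ_FALSE_RETRY = auto()


@dataclass
class Slot:
    seq_nr: int = 1


def handle_sequence_status(slot: Slot, status: SeqStatus) -> bool:
    """Adjust the slot for a sequence error; return True if the call should be retried."""
    if status is SeqStatus.SEQ_MISORDERED:
        slot.seq_nr = 1       # slot may have been retired: start over
        return True
    if status is SeqStatus.SEQ_FALSE_RETRY:
        slot.seq_nr += 1      # bump the sequence number and try again
        return True
    return False
```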

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-15 14:49:09 -05:00
Eric W. Biederman
5e4a08476b userns: Require CAP_SYS_ADMIN for most uses of setns.
Andy Lutomirski <luto@amacapital.net> found a nasty little bug in
the permissions of setns.  With unprivileged user namespaces it
became possible to create new namespaces without privilege.

However the setns calls were relaxed to only require CAP_SYS_ADMIN in
the user namespace of the target namespace.

Which made the following nasty sequence possible.

pid = clone(CLONE_NEWUSER | CLONE_NEWNS);
if (pid == 0) { /* child */
	system("mount --bind /home/me/passwd /etc/passwd");
}
else if (pid != 0) { /* parent */
	char path[PATH_MAX];
	snprintf(path, sizeof(path), "/proc/%u/ns/mnt", pid);
	fd = open(path, O_RDONLY);
	setns(fd, 0);
	system("su -");
}

Prevent this possibility by requiring CAP_SYS_ADMIN
in the current user namespace when joining all but the user namespace.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-12-14 16:12:03 -08:00
Trond Myklebust
65a0c14954 NFS: nfs_lookup_revalidate should not trust an inode with i_nlink == 0
If the inode has no links, then we should force a new lookup.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-14 17:51:40 -05:00
Trond Myklebust
1f018458b3 NFS: Fix calls to drop_nlink()
It is almost always wrong for NFS to call drop_nlink() after removing a
file. What we really want is to mark the inode's attributes for
revalidation, and we want to ensure that the VFS drops it if we're
reasonably sure that this is the final unlink().
Do the former using the usual cache validity flags, and the latter
by testing if inode->i_nlink == 1, and clearing it in that case.
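
The two steps can be sketched with a toy inode. This is a Python model with hypothetical names (ToyNfsInode, note_file_removed), meant only to show the decision described above, not the NFS client code.

```python
class ToyNfsInode:
    def __init__(self, nlink: int):
        self.nlink = nlink
        self.attrs_need_revalidation = False


def note_file_removed(inode: ToyNfsInode) -> None:
    """Model of what happens to the cached inode after a successful remove."""
    # Always ask for the attributes to be revalidated against the server.
    inode.attrs_need_revalidation = True
    # Only when this was very likely the final link, clear the count so the
    # inode can be dropped; otherwise leave the cached link count alone.
    if inode.nlink == 1:
        inode.nlink = 0
```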

This also fixes the following warning reported by Neil Brown and
Jeff Layton (among others).

[634155.004438] WARNING: at /home/abuild/rpmbuild/BUILD/kernel-desktop-3.5.0/lin
[634155.004442] Hardware name: Latitude E6510
[634155.004577]  crc_itu_t crc32c_intel snd_hwdep snd_pcm snd_timer snd soundcor
[634155.004609] Pid: 13402, comm: bash Tainted: G        W    3.5.0-36-desktop #
[634155.004611] Call Trace:
[634155.004630]  [<ffffffff8100444a>] dump_trace+0xaa/0x2b0
[634155.004641]  [<ffffffff815a23dc>] dump_stack+0x69/0x6f
[634155.004653]  [<ffffffff81041a0b>] warn_slowpath_common+0x7b/0xc0
[634155.004662]  [<ffffffff811832e4>] drop_nlink+0x34/0x40
[634155.004687]  [<ffffffffa05bb6c3>] nfs_dentry_iput+0x33/0x70 [nfs]
[634155.004714]  [<ffffffff8118049e>] dput+0x12e/0x230
[634155.004726]  [<ffffffff8116b230>] __fput+0x170/0x230
[634155.004735]  [<ffffffff81167c0f>] filp_close+0x5f/0x90
[634155.004743]  [<ffffffff81167cd7>] sys_close+0x97/0x100
[634155.004754]  [<ffffffff815c3b39>] system_call_fastpath+0x16/0x1b
[634155.004767]  [<00007f2a73a0d110>] 0x7f2a73a0d10f

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org [3.3+]
2012-12-14 17:45:11 -05:00
Trond Myklebust
eed9935745 NFS: Ensure that we always drop inodes that have been marked as stale
There is no need to cache stale inodes.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2012-12-14 14:36:36 -05:00
Boaz Harrosh
861d66601a exofs: don't leak io_state and pages on read error
Same bug as fixed by Idan for write_exec was in read_exec.
Fix the io_state leak and pages state on read error.

Also while at it:
The if (!pcol->read_4_write) at the error path is redundant
because all goto err; are after the if (pcol->read_4_write)
bail out.

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
2012-12-14 12:17:32 +02:00
Linus Torvalds
15de059927 Merge branch 'autofs' (patches from Ian Kent)
Merge emailed autofs cleanup/fix patches from Ian Kent

* autofs:
  autofs4 - use simple_empty() for empty directory check
  autofs4 - dont clear DCACHE_NEED_AUTOMOUNT on rootless mount
2012-12-13 19:13:37 -08:00
Ian Kent
0259cb02c4 autofs4 - use simple_empty() for empty directory check
For direct (and offset) mounts, if an automounted mount is manually
umounted the trigger mount dentry can appear non-empty causing it to
not trigger mounts. This can also happen if there is a file handle
leak in a user space automounting application.

This happens because, when an ioctl control file handle is opened
on the mount, a cursor dentry is created which causes list_empty()
to see the dentry as non-empty. Since there is a case where listing
the directory of these dentrys is needed, the use of dcache_dir_*()
functions for .open() and .release() is needed.

Consequently simple_empty() must be used instead of list_empty()
when checking for an empty directory.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13 19:13:25 -08:00
Ian Kent
f55fb0c243 autofs4 - dont clear DCACHE_NEED_AUTOMOUNT on rootless mount
The DCACHE_NEED_AUTOMOUNT flag is cleared on mount and set on expire
for autofs rootless multi-mount dentrys to prevent unnecessary calls
to ->d_automount().

Since DCACHE_MANAGE_TRANSIT is always set on autofs dentrys ->d_manage()
is always called so the check can be done in ->d_manage() without the
need to change the flag. This still avoids unnecessary calls to
->d_automount(), adds negligible overhead and eliminates a seriously
ugly check in the expire code.

Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-12-13 19:13:25 -08:00
Linus Torvalds
f6e858a00a Merge branch 'akpm' (Andrew's patch-bomb)
Merge misc VM changes from Andrew Morton:
 "The rest of most-of-MM.  The other MM bits await a slab merge.

  This patch includes the addition of a huge zero_page.  Not a
  performance boost but it can save large amounts of physical memory in
  some situations.

  Also a bunch of Fujitsu engineers are working on memory hotplug.
  Which, as it turns out, was badly broken.  About half of their patches
  are included here; the remainder are 3.8 material."

However, this merge disables CONFIG_MOVABLE_NODE, which was totally
broken.  We don't add new features with "default y", nor do we add
Kconfig questions that are incomprehensible to most people without any
help text.  Does the feature even make sense without compaction or
memory hotplug?

* akpm: (54 commits)
  mm/bootmem.c: remove unused wrapper function reserve_bootmem_generic()
  mm/memory.c: remove unused code from do_wp_page()
  asm-generic, mm: pgtable: consolidate zero page helpers
  mm/hugetlb.c: fix warning on freeing hwpoisoned hugepage
  hwpoison, hugetlbfs: fix RSS-counter warning
  hwpoison, hugetlbfs: fix "bad pmd" warning in unmapping hwpoisoned hugepage
  mm: protect against concurrent vma expansion
  memcg: do not check for mm in __mem_cgroup_count_vm_event
  tmpfs: support SEEK_DATA and SEEK_HOLE (reprise)
  mm: provide more accurate estimation of pages occupied by memmap
  fs/buffer.c: remove redundant initialization in alloc_page_buffers()
  fs/buffer.c: do not inline exported function
  writeback: fix a typo in comment
  mm: introduce new field "managed_pages" to struct zone
  mm, oom: remove statically defined arch functions of same name
  mm, oom: remove redundant sleep in pagefault oom handler
  mm, oom: cleanup pagefault oom handler
  memory_hotplug: allow online/offline memory to result movable node
  numa: add CONFIG_MOVABLE_NODE for movable-dedicated node
  mm, memcg: avoid unnecessary function call when memcg is disabled
  ...
2012-12-13 13:11:15 -08:00