Commit graph

30972 commits

Author SHA1 Message Date
J. Bruce Fields
4f4a4fadde nfsd: handle vfs_getattr errors in acl protocol
We're currently ignoring errors from vfs_getattr.

The correct thing to do is to do the stat in the main service procedure
not in the response encoding.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:09 -05:00
Al Viro
3dadecce20 switch vfs_getattr() to struct path
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:08 -05:00
Sage Weil
79f9f99ad1 ceph: prepopulate inodes only when request is aborted
If r_aborted is true, we do not hold the dir i_mutex, and cannot touch
the dcache.  However, we still need to update the inodes with the state
returned by the MDS.

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Sage Weil <sage@inktank.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:08 -05:00
Al Viro
4f522a247b d_hash_and_lookup(): export, switch open-coded instances
* calling conventions change - ERR_PTR() is returned on ->d_hash() errors;
NULL is just for dcache miss now.
* exported, open-coded instances in ncpfs and cifs converted.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:07 -05:00
Al Viro
3592ac4440 9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:07 -05:00
Al Viro
5fa6300ae0 9p: split dropping the acls from v9fs_set_create_acl()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:06 -05:00
Al Viro
be308f0796 9p: switch v9fs_acl_chmod() from dentry to inode+fid
caller has both, might as well pass them explicitly.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:06 -05:00
Al Viro
0f235caeae 9p: switch v9fs_set_acl() from dentry to fid
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:06 -05:00
Al Viro
7f165aaa7d 9p: lift the call of set_cached_acl() into the callers of v9fs_set_acl()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:05 -05:00
Al Viro
38baba9ea0 9p: add fid-based variant of v9fs_xattr_set()
... making v9fs_xattr_set() a wrapper for it.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:46:05 -05:00
Al Viro
0df4d6e5bd hugetlb_file_setup(): use d_alloc_pseudo()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-26 02:45:52 -05:00
Weston Andros Adamson
a47970ff78 NFSv4.1: Hold reference to layout hdr in layoutget
This fixes an oops where a LAYOUTGET is in still in the rpciod queue,
but the requesting processes has been killed.  Without this, killing
the process does the final pnfs_put_layout_hdr() and sets NFS_I(inode)->layout
to NULL while the LAYOUTGET rpc task still references it.

Example oops:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000080
IP: [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4]
PGD 7365b067 PUD 7365d067 PMD 0
Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
Modules linked in: nfs_layout_nfsv41_files nfsv4 auth_rpcgss nfs lockd sunrpc ipt_MASQUERADE ip6table_mangle ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ip6table_filter ip6_tables ppdev e1000 i2c_piix4 i2c_core shpchp parport_pc parport crc32c_intel aesni_intel xts aes_x86_64 lrw gf128mul ablk_helper cryptd mptspi scsi_transport_spi mptscsih mptbase floppy autofs4
CPU 0
Pid: 27, comm: kworker/0:1 Not tainted 3.8.0-dros_cthon2013+ #4 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
RIP: 0010:[<ffffffffa01bd586>]  [<ffffffffa01bd586>] pnfs_choose_layoutget_stateid+0x37/0xef [nfsv4]
RSP: 0018:ffff88007b0c1c88  EFLAGS: 00010246
RAX: ffff88006ed36678 RBX: 0000000000000000 RCX: 0000000ea877e3bc
RDX: ffff88007a729da8 RSI: 0000000000000000 RDI: ffff88007a72b958
RBP: ffff88007b0c1ca8 R08: 0000000000000002 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff88007a72b958
R13: ffff88007a729da8 R14: 0000000000000000 R15: ffffffffa011077e
FS:  0000000000000000(0000) GS:ffff88007f600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000080 CR3: 00000000735f8000 CR4: 00000000001407f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:1 (pid: 27, threadinfo ffff88007b0c0000, task ffff88007c2fa0c0)
Stack:
 ffff88006fc05388 ffff88007a72b908 ffff88007b240900 ffff88006fc05388
 ffff88007b0c1cd8 ffffffffa01a2170 ffff88007b240900 ffff88007b240900
 ffff88007b240970 ffffffffa011077e ffff88007b0c1ce8 ffffffffa0110791
Call Trace:
 [<ffffffffa01a2170>] nfs4_layoutget_prepare+0x7b/0x92 [nfsv4]
 [<ffffffffa011077e>] ? __rpc_atrun+0x15/0x15 [sunrpc]
 [<ffffffffa0110791>] rpc_prepare_task+0x13/0x15 [sunrpc]

Reported-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-25 18:32:59 -08:00
Linus Torvalds
69086a78bd Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull vfs fix from Al Viro:
 "Fix for 3.8 breakage introduced by "vfs: Allow unprivileged
  manipulation of the mount namespace" - accessing mnt->mnt_ns is done
  there without needed locking *and* without any real need.

  Definite -stable fodder, fortunately not going too far back.

  This is *not* all - there will be much bigger vfs pull request
  tomorrow."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  get rid of unprotected dereferencing of mnt->mnt_ns
2013-02-25 16:17:01 -08:00
Linus Torvalds
94f2f14234 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace and namespace infrastructure changes from Eric W Biederman:
 "This set of changes starts with a few small enhnacements to the user
  namespace.  reboot support, allowing more arbitrary mappings, and
  support for mounting devpts, ramfs, tmpfs, and mqueuefs as just the
  user namespace root.

  I do my best to document that if you care about limiting your
  unprivileged users that when you have the user namespace support
  enabled you will need to enable memory control groups.

  There is a minor bug fix to prevent overflowing the stack if someone
  creates way too many user namespaces.

  The bulk of the changes are a continuation of the kuid/kgid push down
  work through the filesystems.  These changes make using uids and gids
  typesafe which ensures that these filesystems are safe to use when
  multiple user namespaces are in use.  The filesystems converted for
  3.9 are ceph, 9p, afs, ocfs2, gfs2, ncpfs, nfs, nfsd, and cifs.  The
  changes for these filesystems were a little more involved so I split
  the changes into smaller hopefully obviously correct changes.

  XFS is the only filesystem that remains.  I was hoping I could get
  that in this release so that user namespace support would be enabled
  with an allyesconfig or an allmodconfig but it looks like the xfs
  changes need another couple of days before it they are ready."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (93 commits)
  cifs: Enable building with user namespaces enabled.
  cifs: Convert struct cifs_ses to use a kuid_t and a kgid_t
  cifs: Convert struct cifs_sb_info to use kuids and kgids
  cifs: Modify struct smb_vol to use kuids and kgids
  cifs: Convert struct cifsFileInfo to use a kuid
  cifs: Convert struct cifs_fattr to use kuid and kgids
  cifs: Convert struct tcon_link to use a kuid.
  cifs: Modify struct cifs_unix_set_info_args to hold a kuid_t and a kgid_t
  cifs: Convert from a kuid before printing current_fsuid
  cifs: Use kuids and kgids SID to uid/gid mapping
  cifs: Pass GLOBAL_ROOT_UID and GLOBAL_ROOT_GID to keyring_alloc
  cifs: Use BUILD_BUG_ON to validate uids and gids are the same size
  cifs: Override unmappable incoming uids and gids
  nfsd: Enable building with user namespaces enabled.
  nfsd: Properly compare and initialize kuids and kgids
  nfsd: Store ex_anon_uid and ex_anon_gid as kuids and kgids
  nfsd: Modify nfsd4_cb_sec to use kuids and kgids
  nfsd: Handle kuids and kgids in the nfs4acl to posix_acl conversion
  nfsd: Convert nfsxdr to use kuids and kgids
  nfsd: Convert nfs3xdr to use kuids and kgids
  ...
2013-02-25 16:00:49 -08:00
Alex Elder
2c3dd4ff59 ceph: eliminate sparse warnings in fs code
Fix the causes for sparse warnings reported in the ceph file system
code.  Here there are only two (and they're sort of silly but
they're easy to fix).

This partially resolves:
    http://tracker.ceph.com/issues/4184

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
2013-02-25 15:37:14 -06:00
Al Viro
3f6d078d4a fix compat truncate/ftruncate
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-25 09:24:55 -05:00
Al Viro
561c673197 switch lseek to COMPAT_SYSCALL_DEFINE
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-24 10:52:26 -05:00
Benny Halevy
78f33277f9 pnfs: fix resend_to_mds for directio
Pass the directio request on pageio_init to clean up the API.

Percolate pg_dreq from original nfs_pageio_descriptor to the
pnfs_{read,write}_done_resend_to_mds and use it on respective
call to nfs_pageio_init_{read,write} on the newly created
nfs_pageio_descriptor.

Reproduced by command:
 mount -o vers=4.1 server:/ /mnt
 dd bs=128k count=8 if=/dev/zero of=/mnt/dd.out oflag=direct

BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
IP: [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
PGD 34786067 PUD 34794067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nfs_layout_nfsv41_files nfsv4 nfs nfsd lockd nfs_acl auth_rpcgss exportfs sunrpc btrfs zlib_deflate libcrc32c ipv6 autofs4
CPU 1
Pid: 259, comm: kworker/1:2 Not tainted 3.8.0-rc6 #2 Bochs Bochs
RIP: 0010:[<ffffffffa021a3a8>]  [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
RSP: 0018:ffff880038f8fa68  EFLAGS: 00010206
RAX: ffffffffa021a6a9 RBX: ffff880038f8fb48 RCX: 00000000000a0000
RDX: ffffffffa021e616 RSI: ffff8800385e9a40 RDI: 0000000000000028
RBP: ffff880038f8fa68 R08: ffffffff81ad6720 R09: ffff8800385e9510
R10: ffffffffa0228450 R11: ffff880038e87418 R12: ffff8800385e9a40
R13: ffff8800385e9a70 R14: ffff880038f8fb38 R15: ffffffffa0148878
FS:  0000000000000000(0000) GS:ffff88003e400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000028 CR3: 0000000034789000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/1:2 (pid: 259, threadinfo ffff880038f8e000, task ffff880038302480)
Stack:
 ffff880038f8fa78 ffffffffa021a6bf ffff880038f8fa88 ffffffffa021bb82
 ffff880038f8fae8 ffffffffa021f454 ffff880038f8fae8 ffffffff8109689d
 ffff880038f8fab8 ffffffff00000006 0000000000000000 ffff880038f8fb48
Call Trace:
 [<ffffffffa021a6bf>] nfs_direct_pgio_init+0x16/0x18 [nfs]
 [<ffffffffa021bb82>] nfs_pgheader_init+0x6a/0x6c [nfs]
 [<ffffffffa021f454>] nfs_generic_pg_writepages+0x51/0xf8 [nfs]
 [<ffffffff8109689d>] ? mark_held_locks+0x71/0x99
 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc]
 [<ffffffffa021bc25>] nfs_pageio_doio+0x1a/0x43 [nfs]
 [<ffffffffa021be7c>] nfs_pageio_complete+0x16/0x2c [nfs]
 [<ffffffffa02608be>] pnfs_write_done_resend_to_mds+0x95/0xc5 [nfsv4]
 [<ffffffffa0148878>] ? rpc_release_resources_task+0x37/0x37 [sunrpc]
 [<ffffffffa028e27f>] filelayout_reset_write+0x8c/0x99 [nfs_layout_nfsv41_files]
 [<ffffffffa028e5f9>] filelayout_write_done_cb+0x4d/0xc1 [nfs_layout_nfsv41_files]
 [<ffffffffa024587a>] nfs4_write_done+0x36/0x49 [nfsv4]
 [<ffffffffa021f996>] nfs_writeback_done+0x53/0x1cc [nfs]
 [<ffffffffa021fb1d>] nfs_writeback_done_common+0xe/0x10 [nfs]
 [<ffffffffa028e03d>] filelayout_write_call_done+0x28/0x2a [nfs_layout_nfsv41_files]
 [<ffffffffa01488a1>] rpc_exit_task+0x29/0x87 [sunrpc]
 [<ffffffffa014a0c9>] __rpc_execute+0x11d/0x3cc [sunrpc]
 [<ffffffff810969dc>] ? trace_hardirqs_on_caller+0x117/0x173
 [<ffffffffa014a39f>] rpc_async_schedule+0x27/0x32 [sunrpc]
 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc]
 [<ffffffff8105f8c1>] process_one_work+0x226/0x422
 [<ffffffff8105f7f4>] ? process_one_work+0x159/0x422
 [<ffffffff81094757>] ? lock_acquired+0x210/0x249
 [<ffffffffa014a378>] ? __rpc_execute+0x3cc/0x3cc [sunrpc]
 [<ffffffff810600d8>] worker_thread+0x126/0x1c4
 [<ffffffff8105ffb2>] ? manage_workers+0x240/0x240
 [<ffffffff81064ef8>] kthread+0xb1/0xb9
 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65
 [<ffffffff815206ec>] ret_from_fork+0x7c/0xb0
 [<ffffffff81064e47>] ? __kthread_parkme+0x65/0x65
Code: 00 83 38 02 74 12 48 81 4b 50 00 00 01 00 c7 83 60 07 00 00 01 00 00 00 48 89 df e8 55 fe ff ff 5b 41 5c 5d c3 66 90 55 48 89 e5 <f0> ff 07 5d c3 55 48 89 e5 f0 ff 0f 0f 94 c0 84 c0 0f 95 c0 0f
RIP  [<ffffffffa021a3a8>] atomic_inc+0x4/0x9 [nfs]
 RSP <ffff880038f8fa68>
CR2: 0000000000000028

Signed-off-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@kernel.org [>= 3.6]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-24 10:07:36 -05:00
Linus Torvalds
9e2d59ad58 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull signal handling cleanups from Al Viro:
 "This is the first pile; another one will come a bit later and will
  contain SYSCALL_DEFINE-related patches.

   - a bunch of signal-related syscalls (both native and compat)
     unified.

   - a bunch of compat syscalls switched to COMPAT_SYSCALL_DEFINE
     (fixing several potential problems with missing argument
     validation, while we are at it)

   - a lot of now-pointless wrappers killed

   - a couple of architectures (cris and hexagon) forgot to save
     altstack settings into sigframe, even though they used the
     (uninitialized) values in sigreturn; fixed.

   - microblaze fixes for delivery of multiple signals arriving at once

   - saner set of helpers for signal delivery introduced, several
     architectures switched to using those."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (143 commits)
  x86: convert to ksignal
  sparc: convert to ksignal
  arm: switch to struct ksignal * passing
  alpha: pass k_sigaction and siginfo_t using ksignal pointer
  burying unused conditionals
  make do_sigaltstack() static
  arm64: switch to generic old sigaction() (compat-only)
  arm64: switch to generic compat rt_sigaction()
  arm64: switch compat to generic old sigsuspend
  arm64: switch to generic compat rt_sigqueueinfo()
  arm64: switch to generic compat rt_sigpending()
  arm64: switch to generic compat rt_sigprocmask()
  arm64: switch to generic sigaltstack
  sparc: switch to generic old sigsuspend
  sparc: COMPAT_SYSCALL_DEFINE does all sign-extension as well as SYSCALL_DEFINE
  sparc: kill sign-extending wrappers for native syscalls
  kill sparc32_open()
  sparc: switch to use of generic old sigaction
  sparc: switch sys_compat_rt_sigaction() to COMPAT_SYSCALL_DEFINE
  mips: switch to generic sys_fork() and sys_clone()
  ...
2013-02-23 18:50:11 -08:00
Linus Torvalds
5ce1a70e2f Merge branch 'akpm' (more incoming from Andrew)
Merge second patch-bomb from Andrew Morton:

 - A little DM fix

 - the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
  ksm: allocate roots when needed
  mm: cleanup "swapcache" in do_swap_page
  mm,ksm: swapoff might need to copy
  mm,ksm: FOLL_MIGRATION do migration_entry_wait
  ksm: shrink 32-bit rmap_item back to 32 bytes
  ksm: treat unstable nid like in stable tree
  ksm: add some comments
  tmpfs: fix mempolicy object leaks
  tmpfs: fix use-after-free of mempolicy object
  mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
  mm: export mmu notifier invalidates
  mm: accelerate mm_populate() treatment of THP pages
  mm: use long type for page counts in mm_populate() and get_user_pages()
  mm: accurately document nr_free_*_pages functions with code comments
  HWPOISON: change order of error_states[]'s elements
  HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
  memcg: stop warning on memcg_propagate_kmem
  net: change type of virtio_chan->p9_max_pages
  vmscan: change type of vm_total_pages to unsigned long
  fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
  ...
2013-02-23 17:50:35 -08:00
Zhang Yanfei
697ce9be7d fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
The three variables are calculated from nr_free_buffer_pages so change
their types to unsigned long in case of overflow.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:22 -08:00
Zhang Yanfei
43be594a6b fs/buffer.c: change type of max_buffer_heads to unsigned long
max_buffer_heads is calculated from nr_free_buffer_pages(), so change
its type to unsigned long in case of overflow.

Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:22 -08:00
Shaohua Li
33806f06da swap: make each swap partition have one address_space
When I use several fast SSD to do swap, swapper_space.tree_lock is
heavily contended.  This makes each swap partition have one
address_space to reduce the lock contention.  There is an array of
address_space for swap.  The swap entry type is the index to the array.

In my test with 3 SSD, this increases the swapout throughput 20%.

[akpm@linux-foundation.org: revert unneeded change to  __add_to_swap_cache]
Signed-off-by: Shaohua Li <shli@fusionio.com>
Cc: Hugh Dickins <hughd@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:17 -08:00
Xishi Qiu
293c07e31a memory-failure: use num_poisoned_pages instead of mce_bad_pages
Since MCE is an x86 concept, and this code is in mm/, it would be better
to use the name num_poisoned_pages instead of mce_bad_pages.

[akpm@linux-foundation.org: fix mm/sparse.c]
Signed-off-by: Xishi Qiu <qiuxishi@huawei.com>
Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Suggested-by: Borislav Petkov <bp@alien8.de>
Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:15 -08:00
Michel Lespinasse
41badc15cb mm: make do_mmap_pgoff return populate as a size in bytes, not as a bool
do_mmap_pgoff() rounds up the desired size to the next PAGE_SIZE
multiple, however there was no equivalent code in mm_populate(), which
caused issues.

This could be fixed by introduced the same rounding in mm_populate(),
however I think it's preferable to make do_mmap_pgoff() return populate
as a size rather than as a boolean, so we don't have to duplicate the
size rounding logic in mm_populate().

Signed-off-by: Michel Lespinasse <walken@google.com>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:11 -08:00
Michel Lespinasse
bebeb3d68b mm: introduce mm_populate() for populating new vmas
When creating new mappings using the MAP_POPULATE / MAP_LOCKED flags (or
with MCL_FUTURE in effect), we want to populate the pages within the
newly created vmas.  This may take a while as we may have to read pages
from disk, so ideally we want to do this outside of the write-locked
mmap_sem region.

This change introduces mm_populate(), which is used to defer populating
such mappings until after the mmap_sem write lock has been released.
This is implemented as a generalization of the former do_mlock_pages(),
which accomplished the same task but was using during mlock() /
mlockall().

Signed-off-by: Michel Lespinasse <walken@google.com>
Reported-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Rik van Riel <riel@redhat.com>
Tested-by: Andy Lutomirski <luto@amacapital.net>
Cc: Greg Ungerer <gregungerer@westnet.com.au>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-23 17:50:10 -08:00
Al Viro
4e6b897328 hostfs: directory methods have no business in non-directory inode_operations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:37 -05:00
Al Viro
740da42efa __d_materialise_unique() is too generic
Its first argument is always non-root, while the second one is
always root.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:36 -05:00
Jan Kara
54c807e71d fs: Fix possible use-after-free with AIO
Running AIO is pinning inode in memory using file reference. Once AIO
is completed using aio_complete(), file reference is put and inode can
be freed from memory. So we have to be sure that calling aio_complete()
is the last thing we do with the inode.

CC: Christoph Hellwig <hch@infradead.org>
CC: Jens Axboe <axboe@kernel.dk>
CC: Jeff Moyer <jmoyer@redhat.com>
CC: stable@vger.kernel.org
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:36 -05:00
Al Viro
da2d8455ed constify d_lookup() arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:35 -05:00
Al Viro
a713ca2ab9 constify __d_lookup() arguments
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:35 -05:00
Al Viro
cc2a527115 lookup_slow: get rid of name argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:35 -05:00
Al Viro
e97cdc87be lookup_fast: get rid of name argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:34 -05:00
Al Viro
21b9b07392 get rid of name and type arguments of walk_component()
... always can be found in nameidata now.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:34 -05:00
Al Viro
5f4a6a6950 link_path_walk(): move assignments to nd->last/nd->last_type up
... and clean the main loop a bit

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:34 -05:00
Jeff Layton
ad8ca3743c vfs: remove d_path_with_unreachable
The last caller was removed >2 years ago in commit 7b2a69ba7.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:33 -05:00
Anatol Pomozov
39b6525274 fs: Preserve error code in get_empty_filp(), part 2
Allocating a file structure in function get_empty_filp() might fail because
of several reasons:
 - not enough memory for file structures
 - operation is not allowed
 - user is over its limit

Currently the function returns NULL in all cases and we loose the exact
reason of the error. All callers of get_empty_filp() assume that the function
can fail with ENFILE only.

Return error through pointer. Change all callers to preserve this error code.

[AV: cleaned up a bit, carved the get_empty_filp() part out into a separate commit
(things remaining here deal with alloc_file()), removed pipe(2) behaviour change]

Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:32 -05:00
Al Viro
1afc99beaf propagate error from get_empty_filp() to its callers
Based on parts from Anatol's patch (the rest is the next commit).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:32 -05:00
Al Viro
496ad9aa8e new helper: file_inode(file)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Al Viro
57eccb830f mount: consolidate permission checks
... and ask for global CAP_SYS_ADMIN only for superblock-level remounts

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:31 -05:00
Al Viro
9b40bc90ab get rid of unprotected dereferencing of mnt->mnt_ns
It's safe only under namespace_sem or vfsmount_lock; all places
in fs/namespace.c that want mnt->mnt_ns->user_ns actually want to use
current->nsproxy->mnt_ns->user_ns (note the calls of check_mnt() in
there).

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2013-02-22 23:31:05 -05:00
Linus Torvalds
3b5d8510b9 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core locking changes from Ingo Molnar:
 "The biggest change is the rwsem lock-steal improvements, both to the
  assembly optimized and the spinlock based variants.

  The other notable change is the clean up of the seqlock implementation
  to be based on the seqcount infrastructure.

  The rest is assorted smaller debuggability, cleanup and continued -rt
  locking changes."

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rwsem-spinlock: Implement writer lock-stealing for better scalability
  futex: Revert "futex: Mark get_robust_list as deprecated"
  generic: Use raw local irq variant for generic cmpxchg
  lockdep: Selftest: convert spinlock to raw spinlock
  seqlock: Use seqcount infrastructure
  seqlock: Remove unused functions
  ntp: Make ntp_lock raw
  intel_idle: Convert i7300_idle_lock to raw_spinlock
  locking: Various static lock initializer fixes
  lockdep: Print more info when MAX_LOCK_DEPTH is exceeded
  rwsem: Implement writer lock-stealing for better scalability
  lockdep: Silence warning if CONFIG_LOCKDEP isn't set
  watchdog: Use local_clock for get_timestamp()
  lockdep: Rename print_unlock_inbalance_bug() to print_unlock_imbalance_bug()
  locking/stat: Fix a typo
2013-02-22 19:25:09 -08:00
Sage Weil
92a49fb0f7 ceph: fix statvfs fr_size
Different versions of glibc are broken in different ways, but the short of
it is that for the time being, frsize should == bsize, and be used as the
multiple for the blocks, free, and available fields.  This mirrors what is
done for NFS.  The previous reporting of the page size for frsize meant
that newer glibc and df would report a very small value for the fs size.

Fixes http://tracker.ceph.com/issues/3793.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Greg Farnum <greg@inktank.com>
2013-02-22 15:31:00 -08:00
Lukas Czerner
304e220f08 ext4: fix free clusters calculation in bigalloc filesystem
ext4_has_free_clusters() should tell us whether there is enough free
clusters to allocate, however number of free clusters in the file system
is converted to blocks using EXT4_C2B() which is not only wrong use of
the macro (we should have used EXT4_NUM_B2C) but it's also completely
wrong concept since everything else is in cluster units.

Moreover when calculating number of root clusters we should be using
macro EXT4_NUM_B2C() instead of EXT4_B2C() otherwise the result might be
off by one. However r_blocks_count should always be a multiple of the
cluster ratio so doing a plain bit shift should be enough here. We
avoid using EXT4_B2C() because it's confusing.

As a result of the first problem number of free clusters is much bigger
than it should have been and ext4_has_free_clusters() would return 1 even
if there is really not enough free clusters available.

Fix this by removing the EXT4_C2B() conversion of free clusters and
using bit shift when calculating number of root clusters. This bug
affects number of xfstests tests covering file system ENOSPC situation
handling. With this patch most of the ENOSPC problems with bigalloc file
system disappear, especially the errors caused by delayed allocation not
having enough space when the actual allocation is finally requested.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
2013-02-22 15:27:52 -05:00
Eryu Guan
d438147211 ext4: no need to remove extent if len is 0 in ext4_es_remove_extent()
len is 0 means no extent needs to be removed, so return immediately.
Otherwise it could trigger the following BUG_ON() in
ext4_es_remove_extent()

	end = lblk + len - 1;
	BUG_ON(end < lblk);

This could be reproduced by a simple truncate(1) command by an
unprivileged user

	truncate -s $(($((2**32 - 1)) * 4096)) /mnt/ext4/testfile

The same is true for __es_insert_extent().

Patched kernel passed xfstests regression test.

Signed-off-by: Eryu Guan <guaneryu@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Zheng Liu <wenqing.lz@taobao.com>
2013-02-22 15:27:47 -05:00
Trond Myklebust
5a7a613a47 NFS: Don't allow NFS silly-renamed files to be deleted, no signal
Commit 73ca100 broke the code that prevents the client from deleting
a silly renamed dentry.  This affected "delete on last close"
semantics as after that commit, nothing prevented removal of
silly-renamed files.  As a result, a process holding a file open
could easily get an ESTALE on the file in a directory where some
other process issued 'rm -rf some_dir_containing_the_file' twice.
Before the commit, any attempt at unlinking silly renamed files would
fail inside may_delete() with -EBUSY because of the
DCACHE_NFSFS_RENAMED flag.  The following testcase demonstrates
the problem:
  tail -f /nfsmnt/dir/file &
  rm -rf /nfsmnt/dir
  rm -rf /nfsmnt/dir
  # second removal does not fail, 'tail' process receives ESTALE

The problem with the above commit is that it unhashes the old and
new dentries from the lookup path, even in the normal case when
a signal is not encountered and it would have been safe to call
d_move.  Unfortunately the old dentry has the special
DCACHE_NFSFS_RENAMED flag set on it.  Unhashing has the
side-effect that future lookups call d_alloc(), allocating a new
dentry without the special flag for any silly-renamed files.  As a
result, subsequent calls to unlink silly renamed files do not fail
but allow the removal to go through.  This will result in ESTALE
errors for any other process doing operations on the file.

To fix this, go back to using d_move on success.
For the signal case, it's unclear what we may safely do beyond d_drop.

Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Cc: stable@vger.kernel.org
2013-02-22 14:55:34 -05:00
Guo Chao
de33127d8d block: remove redundant check to bd_openers()
bd_openers is stable under bd_mutex, no need to check it twice.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: M. Hindess <hindessm@uk.ibm.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22 10:42:46 +01:00
Guo Chao
d646a02a9d block: use i_size_write() in bd_set_size()
blkdev_ioctl(GETBLKSIZE) uses i_size_read() to read size of block device.
If we update block size directly, reader may see intermediate result in
some machines and configurations.  Use i_size_write() instead.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: M. Hindess <hindessm@uk.ibm.com>
Cc: Nikanth Karthikesan <knikanth@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2013-02-22 10:42:46 +01:00
Linus Torvalds
9afa3195b9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "Assorted tiny fixes queued in trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (22 commits)
  DocBook: update EXPORT_SYMBOL entry to point at export.h
  Documentation: update top level 00-INDEX file with new additions
  ARM: at91/ide: remove unsused at91-ide Kconfig entry
  percpu_counter.h: comment code for better readability
  x86, efi: fix comment typo in head_32.S
  IB: cxgb3: delay freeing mem untill entirely done with it
  net: mvneta: remove unneeded version.h include
  time: x86: report_lost_ticks doesn't exist any more
  pcmcia: avoid static analysis complaint about use-after-free
  fs/jfs: Fix typo in comment : 'how may' -> 'how many'
  of: add missing documentation for of_platform_populate()
  btrfs: remove unnecessary cur_trans set before goto loop in join_transaction
  sound: soc: Fix typo in sound/codecs
  treewide: Fix typo in various drivers
  btrfs: fix comment typos
  Update ibmvscsi module name in Kconfig.
  powerpc: fix typo (utilties -> utilities)
  of: fix spelling mistake in comment
  h8300: Fix home page URL in h8300/README
  xtensa: Fix home page URL in Kconfig
  ...
2013-02-21 17:40:58 -08:00
Linus Torvalds
7c2db36e73 Merge branch 'akpm' (incoming from Andrew)
Merge misc patches from Andrew Morton:

 - Florian has vanished so I appear to have become fbdev maintainer
   again :(

 - Joel and Mark are distracted to welcome to the new OCFS2 maintainer

 - The backlight queue

 - Small core kernel changes

 - lib/ updates

 - The rtc queue

 - Various random bits

* akpm: (164 commits)
  rtc: rtc-davinci: use devm_*() functions
  rtc: rtc-max8997: use devm_request_threaded_irq()
  rtc: rtc-max8907: use devm_request_threaded_irq()
  rtc: rtc-da9052: use devm_request_threaded_irq()
  rtc: rtc-wm831x: use devm_request_threaded_irq()
  rtc: rtc-tps80031: use devm_request_threaded_irq()
  rtc: rtc-lp8788: use devm_request_threaded_irq()
  rtc: rtc-coh901331: use devm_clk_get()
  rtc: rtc-vt8500: use devm_*() functions
  rtc: rtc-tps6586x: use devm_request_threaded_irq()
  rtc: rtc-imxdi: use devm_clk_get()
  rtc: rtc-cmos: use dev_warn()/dev_dbg() instead of printk()/pr_debug()
  rtc: rtc-pcf8583: use dev_warn() instead of printk()
  rtc: rtc-sun4v: use pr_warn() instead of printk()
  rtc: rtc-vr41xx: use dev_info() instead of printk()
  rtc: rtc-rs5c313: use pr_err() instead of printk()
  rtc: rtc-at91rm9200: use dev_dbg()/dev_err() instead of printk()/pr_debug()
  rtc: rtc-rs5c372: use dev_dbg()/dev_warn() instead of printk()/pr_debug()
  rtc: rtc-ds2404: use dev_err() instead of printk()
  rtc: rtc-efi: use dev_err()/dev_warn()/pr_err() instead of printk()
  ...
2013-02-21 17:38:49 -08:00