Commit graph

40375 commits

Author SHA1 Message Date
Richard Weinberger
4c6dcafc69 hostfs: Allow fsync on directories
Historically hostfs did not open directories on the host filesystem
for performance and memory reasons.
But it turned out that this optimization has a drawback.
Calling fsync() on a hostfs directory returns immediately
with -EINVAL as fsync is not implemented.
While this is behavior is strictly speaking correct common userspace
like dpkg(1) stumbles over that and makes it impossible to use
hostfs as root filesystem.
The fix is easy, wire up the existing host open/fsync functions
to the directory file operations.

Reported-by: Daniel Gröber <dxld@darkboxed.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
2015-03-26 23:27:48 +01:00
Richard Weinberger
af9556586a hostfs: hostfs_file_open: Fix a fd leak in hostfs_file_open
In case of a race between to callers of hostfs_file_open()
it can happen that a file describtor is leaked.

Signed-off-by: Richard Weinberger <richard@nod.at>
2015-03-26 23:27:48 +01:00
Richard Weinberger
69886e676e hostfs: hostfs_file_open: Switch to data locking model
Instead of serializing hostfs_file_open() we can use
a per inode mutex to protect ->mode.

Signed-off-by: Richard Weinberger <richard@nod.at>
2015-03-26 23:27:47 +01:00
Kinglong Mee
7890203da2 NFSD: Fix bad update of layout in nfsd4_return_file_layout
With return layout as, (seg is return layout, lo is record layout)
seg->offset <= lo->offset and layout_end(seg) < layout_end(lo),
nfsd should update lo's offset to seg's end,
and,
seg->offset > lo->offset and layout_end(seg) >= layout_end(lo),
nfsd should update lo's end to seg's offset.

Fixes: 9cf514ccfa ("nfsd: implement pNFS operations")
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-25 21:13:03 -04:00
Kinglong Mee
376675daea NFSD: Take care the return value from nfsd4_encode_stateid
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-25 21:13:02 -04:00
Kinglong Mee
853695230e NFSD: Printk blocklayout length and offset as format 0x%llx
When testing pnfs with nfsd_debug on, nfsd print a negative number
of layout length and foff in nfsd4_block_proc_layoutget as,
"GET: -xxxx:-xxx 2"

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-25 21:13:02 -04:00
J. Bruce Fields
340f0ba1c6 nfsd: return correct lockowner when there is a race on hash insert
alloc_init_lock_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.

Noticed by inspection after Jeff Layton fixed the same bug for open
owners.  Depending on client behavior, this one may be trickier to
trigger in practice.

Fixes: c58c6610ec "nfsd: Protect adding/removing lock owners using client_lock"
Cc: <stable@vger.kernel.org>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Acked-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-25 21:06:16 -04:00
Jeff Layton
c5952338bf nfsd: return correct openowner when there is a race to put one in the hash
alloc_init_open_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.

In commit 7ffb588086, we changed it so that we allocate and initialize
an openowner, and then check to see if a matching one got stuffed into
the hashtable in the meantime. If it did, then we free the one we just
allocated and take a reference on the one already there. There is a bug
here though. The code will then return the pointer to the one that was
allocated (and has now been freed).

This wasn't evident before as this race almost never occurred. The Linux
kernel client used to serialize requests for a single openowner.  That
has changed now with v4.0 kernels, and this race can now easily occur.

Fixes: 7ffb588086
Cc: <stable@vger.kernel.org> # v3.17+
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Reported-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-25 21:06:06 -04:00
Christoph Hellwig
e2e40f2c1e fs: move struct kiocb to fs.h
struct kiocb now is a generic I/O container, so move it to fs.h.
Also do a #include diet for aio.h while we're at it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-25 20:28:11 -04:00
Sergei Antonov
98cf21c61a hfsplus: fix B-tree corruption after insertion at position 0
Fix B-tree corruption when a new record is inserted at position 0 in the
node in hfs_brec_insert().  In this case a hfs_brec_update_parent() is
called to update the parent index node (if exists) and it is passed
hfs_find_data with a search_key containing a newly inserted key instead
of the key to be updated.  This results in an inconsistent index node.
The bug reproduces on my machine after an extents overflow record for
the catalog file (CNID=4) is inserted into the extents overflow B-tree.
Because of a low (reserved) value of CNID=4, it has to become the first
record in the first leaf node.

The resulting first leaf node is correct:

  ----------------------------------------------------
  | key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... |
  ----------------------------------------------------

But the parent index key0 still contains the previous key CNID=123:

  -----------------------
  | key0.CNID=123 | ... |
  -----------------------

A change in hfs_brec_insert() makes hfs_brec_update_parent() work
correctly by preventing it from getting fd->record=-1 value from
__hfs_brec_find().

Along the way, I removed duplicate code with unification of the if
condition.  The resulting code is equivalent to the original code
because node is never 0.

Also hfs_brec_update_parent() will now return an error after getting a
negative fd->record value.  However, the return value of
hfs_brec_update_parent() is not checked anywhere in the file and I'm
leaving it unchanged by this patch.  brec.c lacks error checking after
some other calls too, but this issue is of less importance than the one
being fixed by this patch.

Signed-off-by: Sergei Antonov <saproj@gmail.com>
Cc: Joe Perches <joe@perches.com>
Reviewed-by: Vyacheslav Dubeyko <slava@dubeyko.com>
Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net>
Cc: Anton Altaparmakov <aia21@cam.ac.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25 16:20:31 -07:00
Taesoo Kim
3d5d472cf5 fs/affs/file.c: unlock/release page on error
When affs_bread_ino() fails, correctly unlock the page and release the
page cache with proper error value.  All write_end() should
unlock/release the page that was locked by write_beg().

Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
Cc: Fabian Frederick <fabf@skynet.be>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-03-25 16:20:31 -07:00
Chris Mason
fc4c3c872f Merge branch 'cleanups-post-3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1
Signed-off-by: Chris Mason <clm@fb.com>

Conflicts:
	fs/btrfs/disk-io.c
2015-03-25 10:52:48 -07:00
Chris Mason
9deed229fa Merge branch 'cleanups-for-4.1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into for-linus-4.1 2015-03-25 10:43:16 -07:00
Vivien Didelot
d8bf8c92e8 sysfs: Only accept read/write permissions for file attributes
For sysfs file attributes, only read and write permissions make sense.
Mask provided attribute permissions accordingly and send a warning
to the console if invalid permission bits are set.

This patch is originally from Guenter [1] and includes the fixup
explained in the thread, that is printing permissions in octal format
and limiting the scope of attributes to SYSFS_PREALLOC | 0664.

[1] https://lkml.org/lkml/2015/1/19/599

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 13:27:57 +01:00
Guenter Roeck
da4759c73b sysfs: Use only return value from is_visible for the file mode
Up to now, is_visible can only be used to either remove visibility
of a file entirely or to add permissions, but not to reduce permissions.
This makes it impossible, for example, to use DEVICE_ATTR_RW to define
file attributes and reduce permissions to read-only.

This behavior is undesirable and unnecessarily complicates code which
needs to reduce permissions; instead of just returning the desired
permissions, it has to ensure that the permissions in the attribute
variable declaration only reflect the minimal permissions ever needed.

Change semantics of is_visible to only use the permissions returned
from it instead of oring the returned value with the hard-coded
permissions.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2015-03-25 13:27:57 +01:00
Sheng Yong
235c362bd0 UBIFS: extend debug/message capabilities
In the case where we have more than one volumes on different UBI
devices, it may be not that easy to tell which volume prints the
messages.  Add ubi number and volume id in ubifs_msg/warn/error
to help debug. These two values are passed by struct ubifs_info.

For those where ubifs_info is not initialized yet, ubifs_* is
replaced by pr_*. For those where ubifs_info is not avaliable,
ubifs_info is passed to the calling function as a const parameter.

The output looks like,

[   95.444879] UBIFS (ubi0:1): background thread "ubifs_bgt0_1" started, PID 696
[   95.484688] UBIFS (ubi0:1): UBIFS: mounted UBI device 0, volume 1, name "test1"
[   95.484694] UBIFS (ubi0:1): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[   95.484699] UBIFS (ubi0:1): FS size: 30220288 bytes (28 MiB, 238 LEBs), journal size 1523712 bytes (1 MiB, 12 LEBs)
[   95.484703] UBIFS (ubi0:1): reserved for root: 1427378 bytes (1393 KiB)
[   95.484709] UBIFS (ubi0:1): media format: w4/r0 (latest is w4/r0), UUID 40DFFC0E-70BE-4193-8905-F7D6DFE60B17, small LPT model
[   95.489875] UBIFS (ubi1:0): background thread "ubifs_bgt1_0" started, PID 699
[   95.529713] UBIFS (ubi1:0): UBIFS: mounted UBI device 1, volume 0, name "test2"
[   95.529718] UBIFS (ubi1:0): LEB size: 126976 bytes (124 KiB), min./max. I/O unit sizes: 2048 bytes/2048 bytes
[   95.529724] UBIFS (ubi1:0): FS size: 19808256 bytes (18 MiB, 156 LEBs), journal size 1015809 bytes (0 MiB, 8 LEBs)
[   95.529727] UBIFS (ubi1:0): reserved for root: 935592 bytes (913 KiB)
[   95.529733] UBIFS (ubi1:0): media format: w4/r0 (latest is w4/r0), UUID EEB7779D-F419-4CA9-811B-831CAC7233D4, small LPT model

[  954.264767] UBIFS error (ubi1:0 pid 756): ubifs_read_node: bad node type (255 but expected 6)
[  954.367030] UBIFS error (ubi1:0 pid 756): ubifs_read_node: bad node at LEB 0:0, LEB mapping status 1

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-03-25 11:08:41 +02:00
Fabian Frederick
8a87dc55f7 UBIFS: simplify returns
Directly return recover_head() and ubifs_leb_unmap()
instead of storing value in err and testing it.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-03-25 11:08:41 +02:00
Yannick Guerrini
d3f9db00d0 UBIFS: Fix trivial typos in comments
Change 'comress' to 'compress'
Change 'inteval' to 'interval'
Change 'disabe' to 'disable'
Change 'nenver' to 'never'

Signed-off-by: Yannick Guerrini <yguerrini@tomshardware.fr>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-03-25 11:08:41 +02:00
Sheng Yong
2c84599ca4 UBIFS: do not write master node if need recovery
The commits 781c571 ("UBIFS: intialize LPT earlier") and 0980119 ("UBIFS:
fix-up free space earlier") move some initialization before marking the
master node dirty. But the modification changes the conditions of writing
master.

If unclean umount happens, ubifs may fail when mounting. But trying to
mount it will write new master nodes on the flash. This is useless but
increasing sqnum. So check need_recovery before writing master node, and
don't create new master node if filesystem needs recovery.

The behavour of the bug shows at:
http://lists.infradead.org/pipermail/linux-mtd/2015-February/057712.html

Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Reviewed-by: Ben Gardiner <ben.l.gardiner@gmail.com>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-03-25 11:08:40 +02:00
Taesoo Kim
9401a795c6 UBIFS: fix incorrect unlocking handling
When ubifs_init_security() fails, 'ui_mutex' is incorrectly
unlocked and incorrectly restores 'i_size'. Fix this.

Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
Fixes: d7f0b70d30 ("UBIFS: Add security.* XATTR support for the UBIFS")
Reviewed-by: Ben Shelton <ben.shelton@ni.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
2015-03-25 11:08:40 +02:00
Dave Chinner
a448f8f1b7 Merge branch 'fallocate-insert-range' into for-next 2015-03-25 15:12:53 +11:00
Dave Chinner
2b93681f59 Merge branch 'xfs-misc-fixes-for-4.1-2' into for-next
Conflicts:
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/xfs_inode.c
2015-03-25 15:12:30 +11:00
Namjae Jeon
a904b1ca57 xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate
This patch implements fallocate's FALLOC_FL_INSERT_RANGE for XFS.

1) Make sure that both offset and len are block size aligned.
2) Update the i_size of inode by len bytes.
3) Compute the file's logical block number against offset. If the computed
   block number is not the starting block of the extent, split the extent
   such that the block number is the starting block of the extent.
4) Shift all the extents which are lying bewteen [offset, last allocated extent]
   towards right by len bytes. This step will make a hole of len bytes
   at offset.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 15:08:56 +11:00
Namjae Jeon
dd46c78778 fs: Add support FALLOC_FL_INSERT_RANGE for fallocate
FALLOC_FL_INSERT_RANGE command is the opposite command of
FALLOC_FL_COLLAPSE_RANGE that is needed for someone who wants to add
some data in the middle of file.

FALLOC_FL_INSERT_RANGE will create space for writing new data within
a file after shifting extents to right as given length. This command
also has same limitations as FALLOC_FL_COLLAPSE_RANGE in that
operations need to be filesystem block boundary aligned and cannot
cross the current EOF.

Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com>
Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 15:07:05 +11:00
Joe Perches
5e9383f97e xfs: Fix incorrect positive ENOMEM return
added a positive error return value.

This value filters up through the return layers and should be
negative as the other return values are in the same function.

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 15:00:24 +11:00
Byoungyoung Lee
20dafeefac xfs: xfs_mru_cache_insert() should use GFP_NOFS
xfs_mru_cache_insert() can be called from within transaction context
during block allocation like so:

write(2)
  ....
    xfs_get_blocks
      xfs_iomap_write_direct
        start transaction
        xfs_bmapi_write
          xfs_bmapi_allocate
            xfs_bmap_btalloc
              xfs_bmap_btalloc_filestreams
                xfs_filestream_new_ag
                  xfs_filestream_pick_ag
                    xfs_mru_cache_insert
                      radix_tree_preload(GFP_KERNEL)

In this case, GFP_KERNEL is incorrect and can potentially lead to
deadlocks in memory reclaim. It should use GFP_NOFS allocations to
avoid lock recursion problems.

[dchinner: rewrote commit message]

Signed-off-by: Byoungyoung Lee <blee@gatech.edu>
Signed-off-by: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:57:53 +11:00
Scott Wood
65dd297ac2 xfs: %pF is only for function pointers
Use %pS for actual addresses, otherwise you'll get bad output
on arches like ppc64 where %pF expects a function descriptor.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:56:21 +11:00
Fabian Frederick
29916df08d xfs: fix shadow warning in xfs_da3_root_split()
Use icnodehdr for struct xfs_da3_icnode_hdr instead of nodehdr
(already declared above).

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:55:25 +11:00
Fabian Frederick
86aaf02e57 xfs: use bool instead of int in xfs_rename()
new_parent and src_is_directory are only used in 0/1 context.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:54:53 +11:00
Eric Sandeen
b26384dc52 xfs: fix NULL pointer dereference in xfs_filestream_lookup_ag()
If xfs_filestream_get_parent() fails, we have a null pip,
goto out, and attempt to IRELE(NULL).  This causes a null
pointer dereference and BUG().

Fix this by directly returning NULLAGNUMBER in this case.

Reported-by: Adrien Nader <adrien@notk.org>
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:54:25 +11:00
Dave Chinner
d64588ca28 xfs: remove xfs_bmap_sanity_check()
This code is redundant now that we have verifiers that sanity check
the buffers as they are read from disk.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:53:48 +11:00
Dave Chinner
d41bb03444 Merge branch 'xfs-rename-whiteout' into for-next
Conflicts:
	fs/xfs/xfs_inode.c
2015-03-25 14:29:13 +11:00
Dave Chinner
7dcf5c3e45 xfs: add RENAME_WHITEOUT support
Whiteouts are used by overlayfs -  it has a crazy convention that a
whiteout is a character device inode with a major:minor of 0:0.
Because it's not documented anywhere, here's an example of what
RENAME_WHITEOUT does on ext4:

# echo foo > /mnt/scratch/foo
# echo bar > /mnt/scratch/bar
# ls -l /mnt/scratch
total 24
-rw-r--r-- 1 root root     4 Feb 11 20:22 bar
-rw-r--r-- 1 root root     4 Feb 11 20:22 foo
drwx------ 2 root root 16384 Feb 11 20:18 lost+found
# src/renameat2 -w /mnt/scratch/foo /mnt/scratch/bar
# ls -l /mnt/scratch
total 20
-rw-r--r-- 1 root root     4 Feb 11 20:22 bar
c--------- 1 root root  0, 0 Feb 11 20:23 foo
drwx------ 2 root root 16384 Feb 11 20:18 lost+found
# cat /mnt/scratch/bar
foo
#

In XFS rename terms, the operation that has been done is that source
(foo) has been moved to the target (bar), which is like a nomal
rename operation, but rather than the source being removed, it have
been replaced with a whiteout.

We can't allocate whiteout inodes within the rename transaction due
to allocation being a multi-commit transaction: rename needs to
be a single, atomic commit. Hence we have several options here, form
most efficient to least efficient:

    - use DT_WHT in the target dirent and do no whiteout inode
      allocation.  The main issue with this approach is that we need
      hooks in lookup to create a virtual chardev inode to present
      to userspace and in places where we might need to modify the
      dirent e.g. unlink.  Overlayfs also needs to be taught about
      DT_WHT. Most invasive change, lowest overhead.

    - create a special whiteout inode in the root directory (e.g. a
      ".wino" dirent) and then hardlink every new whiteout to it.
      This means we only need to create a single whiteout inode, and
      rename simply creates a hardlink to it. We can use DT_WHT for
      these, though using DT_CHR means we won't have to modify
      overlayfs, nor anything in userspace. Downside is we have to
      look up the whiteout inode on every operation and create it if
      it doesn't exist.

    - copy ext4: create a special whiteout chardev inode for every
      whiteout.  This is more complex than the above options because
      of the lack of atomicity between inode creation and the rename
      operation, requiring us to create a tmpfile inode and then
      linking it into the directory structure during the rename. At
      least with a tmpfile inode crashes between the create and
      rename doesn't leave unreferenced inodes or directory
      pollution around.

By far the simplest thing to do in the short term is to copy ext4.
While it is the most inefficient way of supporting whiteouts, but as
an initial implementation we can simply reuse existing functions and
add a small amount of extra code the the rename operation.

When we get full whiteout support in the VFS (via the dentry cache)
we can then look to supporting DT_WHT method outlined as the first
method of supporting whiteouts. But until then, we'll stick with
what overlayfs expects us to be: dumb and stupid.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
2015-03-25 14:08:08 +11:00
Dave Chinner
eeacd3217b xfs: make xfs_cross_rename() complete fully
Now that xfs_finish_rename() exists, there is no reason for
xfs_cross_rename() to return to xfs_rename() to finish off the
rename transaction. Drive the completion code into
xfs_cross_rename() and handle all errors there so as to simplify
the xfs_rename() code.

Further, push the rename exchange target_ip check to early in the
rename code so as to make the error handling easy and obviously
correct.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:08:07 +11:00
Dave Chinner
310606b0c7 xfs: factor out xfs_finish_rename()
Rather than use a jump label for the final transaction commit in
the rename, factor it into a simple helper function and call it
appropriately. This slightly reduces the spaghetti nature of
xfs_rename.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:06:07 +11:00
Dave Chinner
445883e813 xfs: cleanup xfs_rename error handling
The jump labels are ambiguous and unclear and some of the error
paths are used inconsistently. Rules for error jumps are:

- use out_trans_cancel for unmodified transaction context
- use out_bmap_cancel on ENOSPC errors
- use out_trans_abort when transaction is likely to be dirty.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:05:43 +11:00
Dave Chinner
95afcf5c7b xfs: clean up inode locking for RENAME_WHITEOUT
When doing RENAME_WHITEOUT, we now have to lock 5 inodes into the
rename transaction. This means we need to update
xfs_sort_for_rename() and xfs_lock_inodes() to handle up to 5
inodes. Because of the vagaries of rename, this means we could have
anywhere between 3 and 5 inodes locked into the transaction....

While xfs_lock_inodes() does not need anything other than an assert
telling us we are passing more inodes that we ever thought we should
see, it could do with a logic rework to remove all the indenting.
This is not a functional change - it just makes the code a lot
easier to read.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
2015-03-25 14:03:32 +11:00
Al Viro
6e8a1f8741 switch path_init() to struct filename
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-24 17:19:16 -04:00
Al Viro
668696dcbb switch path_mountpoint() to struct filename
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-24 17:19:15 -04:00
Al Viro
5eb6b495c6 switch path_lookupat() to struct filename
all callers were passing it ->name of some struct filename

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-24 17:19:15 -04:00
Al Viro
94b5d2621a getname_flags(): clean up a bit
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2015-03-24 17:19:14 -04:00
Hari Bathini
ae011d2e48 pstore: Add pstore type id for PPC64 opal nvram partition
This patch adds a new PPC64 partition type to be used for opal
specific nvram partition. A new partition type is needed as none
of the existing type matches this partition type.

Signed-off-by: Hari Bathini <hbathini@linux.vnet.ibm.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-03-23 14:06:10 +11:00
Linus Torvalds
4541c22605 Driver core fixes for 4.0-rc5
Here are two bugfixes for things reported.  One regression in kernfs,
 and another issue fixed in the LZ4 code that was fixed in the "upstream"
 codebase that solves a reported kernel crash.
 
 Both have been in linux-next for a while.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iEYEABECAAYFAlUOl6YACgkQMUfUDdst+ylRjQCfT46/we0U/S38/sTNYtElQXUI
 ZEsAnAwozBj6nIYOydxaKnA0BcujsSGz
 =unYd
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
 "Here are two bugfixes for things reported.  One regression in kernfs,
  and another issue fixed in the LZ4 code that was fixed in the
  "upstream" codebase that solves a reported kernel crash

  Both have been in linux-next for a while"

* tag 'driver-core-4.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  LZ4 : fix the data abort issue
  kernfs: handle poll correctly on 'direct_read' files.
2015-03-22 12:07:47 -07:00
Dominique Martinet
5c4086b8de fs/9p: Initialize status in v9fs_file_do_lock.
If p9_client_lock_dotl returns an error, status is possibly never filled
but will be used in the following switch.
Initializing it to P9_LOCK_ERROR makes sur we will return an error and
cleanup (and not hit the default case).

Signed-off-by: Dominique Martinet <dominique.martinet@cea.fr>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2015-03-21 19:30:31 -07:00
Linus Torvalds
521d474631 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "Most of these are fixing extent reservation accounting, or corners
  with tree writeback during commit.

  Josef's set does add a test, which isn't strictly a fix, but it'll
  keep us from making this same mistake again"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: fix outstanding_extents accounting in DIO
  Btrfs: add sanity test for outstanding_extents accounting
  Btrfs: just free dummy extent buffers
  Btrfs: account merges/splits properly
  Btrfs: prepare block group cache before writing
  Btrfs: fix ASSERT(list_empty(&cur_trans->dirty_bgs_list)
  Btrfs: account for the correct number of extents for delalloc reservations
  Btrfs: fix merge delalloc logic
  Btrfs: fix comp_oper to get right order
  Btrfs: catch transaction abortion after waiting for it
  btrfs: fix sizeof format specifier in btrfs_check_super_valid()
2015-03-21 10:53:37 -07:00
Linus Torvalds
0d122f7430 Merge branch 'for-4.0' of git://linux-nfs.org/~bfields/linux
Pull nfsd bufix from Bruce Fields:
 "This is a fix for a crash easily triggered by 4.1 activity to a server
  built with CONFIG_NFSD_PNFS.

  There are some more bugfixes queued up that I intend to pass along
  next week, but this is the most critical"

* 'for-4.0' of git://linux-nfs.org/~bfields/linux:
  Subject: nfsd: don't recursively call nfsd4_cb_layout_fail
2015-03-21 10:41:15 -07:00
Taesoo Kim
2bd50fb3d4 cifs: potential memory leaks when parsing mnt opts
For example, when mount opt is redundently specified
(e.g., "user=A,user=B,user=C"), kernel kept allocating new key/val
with kstrdup() and overwrite previous ptr (to be freed).

Althouhg mount.cifs in userspace performs a bit of sanitization
(e.g., forcing one user option), current implementation is not
robust. Other options such as iocharset and domainanme are similarly
vulnerable.

Signed-off-by: Taesoo Kim <tsgatesv@gmail.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-03-21 12:01:50 -05:00
David Disseldorp
e1e9bda22d cifs: fix use-after-free bug in find_writable_file
Under intermittent network outages, find_writable_file() is susceptible
to the following race condition, which results in a user-after-free in
the cifs_writepages code-path:

Thread 1                                        Thread 2
========                                        ========

inv_file = NULL
refind = 0
spin_lock(&cifs_file_list_lock)

// invalidHandle found on openFileList

inv_file = open_file
// inv_file->count currently 1

cifsFileInfo_get(inv_file)
// inv_file->count = 2

spin_unlock(&cifs_file_list_lock);

cifs_reopen_file()                            cifs_close()
// fails (rc != 0)                            ->cifsFileInfo_put()
                                       spin_lock(&cifs_file_list_lock)
                                       // inv_file->count = 1
                                       spin_unlock(&cifs_file_list_lock)

spin_lock(&cifs_file_list_lock);
list_move_tail(&inv_file->flist,
      &cifs_inode->openFileList);
spin_unlock(&cifs_file_list_lock);

cifsFileInfo_put(inv_file);
->spin_lock(&cifs_file_list_lock)

  // inv_file->count = 0
  list_del(&cifs_file->flist);
  // cleanup!!
  kfree(cifs_file);

  spin_unlock(&cifs_file_list_lock);

spin_lock(&cifs_file_list_lock);
++refind;
// refind = 1
goto refind_writable;

At this point we loop back through with an invalid inv_file pointer
and a refind value of 1. On second pass, inv_file is not overwritten on
openFileList traversal, and is subsequently dereferenced.

Signed-off-by: David Disseldorp <ddiss@suse.de>
Reviewed-by: Jeff Layton <jlayton@samba.org>
CC: <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-03-21 10:56:27 -05:00
Sachin Prabhu
2477bc58d4 cifs: smb2_clone_range() - exit on unhandled error
While attempting to clone a file on a samba server, we receive a
STATUS_INVALID_DEVICE_REQUEST. This is mapped to -EOPNOTSUPP which
isn't handled in smb2_clone_range(). We end up looping in the while loop
making same call to the samba server over and over again.

The proposed fix is to exit and return the error value when encountered
with an unhandled error.

Cc: <stable@vger.kernel.org>
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2015-03-21 10:56:22 -05:00
Kinglong Mee
a1420384e3 NFSD: Put exports after nfsd4_layout_verify fail
Fix commit 9cf514ccfa (nfsd: implement pNFS operations).

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2015-03-20 16:15:42 -04:00