Commit graph

29510 commits

Author SHA1 Message Date
David S. Miller
8a2cf062b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-29 12:51:17 -05:00
Al Viro
541880d9a2 do_coredump(): get rid of pt_regs argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-29 00:01:25 -05:00
Al Viro
71613c3b87 get rid of pt_regs argument of ->load_binary()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:53:38 -05:00
Al Viro
3c456bfc4b get rid of pt_regs argument of search_binary_handler()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:53:38 -05:00
Al Viro
835ab32dff get rid of pt_regs argument of do_execve_common()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:53:37 -05:00
Al Viro
da3d4c5fa5 get rid of pt_regs argument of do_execve()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:53:37 -05:00
Al Viro
d03d26e58f make compat_do_execve() static, lose pt_regs argument
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:53:37 -05:00
Al Viro
c4144670fd kill daemonize()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-11-28 21:49:02 -05:00
Theodore Ts'o
4a092d7379 ext4: rationalize ext4_extents.h inclusion
Previously, ext4_extents.h was being included at the end of ext4.h,
which was bad for a number of reasons: (a) it was not being included
in the expected place, and (b) it caused the header to be included
multiple times.  There were #ifdef's to prevent this from causing any
problems, but it still was unnecessary.

By moving the function declarations that were in ext4_extents.h to
ext4.h, which is standard practice for where the function declarations
for the rest of ext4.h can be found, we can remove ext4_extents.h from
being included in ext4.h at all, and then we can only include
ext4_extents.h where it is needed in ext4's source files.

It should be possible to move a few more things into ext4.h, and
further reduce the number of source files that need to #include
ext4_extents.h, but that's a cleanup for another day.

Reported-by: Sachin Kamat <sachin.kamat@linaro.org>
Reported-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28 13:03:30 -05:00
Vahram Martirosyan
766f44d46a ext4: fixed potential NULL dereference in ext4_calculate_overhead()
The memset operation before check can cause a BUG if the memory
allocation failed.  Since we are using get_zeroed_age, there is no
need to use memset anyway.

Found by the Spruce system in cooperation with the KEDR Framework.

Signed-off-by: Vahram Martirosyan <vmartirosyan@linuxtesting.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28 12:44:16 -05:00
Lukas Czerner
06348679c9 ext4: simple cleanup in fiemap codepath
This commit is simple cleanup of fiemap codepath which has not been
included in previous commit to make the changes clearer. In this commit
we rename cbex variable to newex in ext4_fill_fiemap_extents() because
callback is no longer present

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28 12:33:22 -05:00
Lukas Czerner
91dd8c1144 ext4: prevent race while walking extent tree for fiemap
Currently ext4_ext_walk_space() only takes i_data_sem for read when
searching for the extent at given block with ext4_ext_find_extent().
Then it drops the lock and the extent tree can be changed at will.
However later on we're searching for the 'next' extent, but the extent
tree might already have changed, so the information might not be
accurate.

In fact we can hit BUG_ON(end <= start) if the extent got inserted into
the tree after the one we found and before the block we were searching
for. This has been reproduced by running xfstests 225 in loop on s390x
architecture, but theoretically we could hit this on any other
architecture as well, but probably not as often.

Moreover the extent currently in delayed allocation might be allocated
after we search the extent tree and before we search extent status tree
delayed buffers resulting in those delayed buffers being completely
missed, even though completely written and allocated.

We fix all those problems in several steps:

 1. remove unnecessary callback indirection
 2. rename functions
        ext4_ext_walk_space -> ext4_fill_fiemap_extents
        ext4_ext_fiemap_cb -> ext4_find_delayed_extent
 3. move fiemap_fill_next_extent() into ext4_fill_fiemap_extents()
 4. hold the i_data_sem for:
        ext4_ext_find_extent()
        ext4_ext_next_allocated_block()
        ext4_find_delayed_extent()
 5. call fiemap_fill_next_extent after releasing the i_data_sem
 6. move path reinitialization into the critical section.

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2012-11-28 12:32:26 -05:00
Frederic Weisbecker
e80d0a1ae8 cputime: Rename thread_group_times to thread_group_cputime_adjusted
We have thread_group_cputime() and thread_group_times(). The naming
doesn't provide enough information about the difference between
these two APIs.

To lower the confusion, rename thread_group_times() to
thread_group_cputime_adjusted(). This name better suggests that
it's a version of thread_group_cputime() that does some stabilization
on the raw cputime values. ie here: scale on top of CFS runtime
stats and bound lower value for monotonicity.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-11-28 17:07:57 +01:00
Pavel Shilovsky
c772aa92b6 CIFS: Fix wrong buffer pointer usage in smb_set_file_info
Commit 6bdf6dbd66 caused a regression
in setattr codepath that leads to files with wrong attributes.

Signed-off-by: Pavel Shilovsky <piastry@etersoft.ru>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-11-28 10:02:46 -06:00
Jeff Layton
3a98b86143 cifs: fix writeback race with file that is growing
Commit eddb079deb created a regression in the writepages codepath.
Previously, whenever it needed to check the size of the file, it did so
by consulting the inode->i_size field directly. With that patch, the
i_size was fetched once on entry into the writepages code and that value
was used henceforth.

If the file is changing size though (for instance, if someone is writing
to it or has truncated it), then that value is likely to be wrong. This
can lead to data corruption. Pages past the EOF at the time that the
writepages call was issued may be silently dropped and ignored because
cifs_writepages wrongly assumes that the file must have been truncated
in the interim.

Fix cifs_writepages to properly fetch the size from the inode->i_size
field instead to properly account for this possibility.

Original bug report is here:

    https://bugzilla.kernel.org/show_bug.cgi?id=50991

Reported-and-Tested-by: Maxim Britov <ungifted01@gmail.com>
Reviewed-by: Suresh Jayaraman <sjayaraman@suse.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
2012-11-27 13:46:12 -06:00
Linus Torvalds
2844a48706 Merge branch 'akpm' (Fixes from Andrew)
Merge misc fixes from Andrew Morton:
 "8 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (8 patches)
  futex: avoid wake_futex() for a PI futex_q
  watchdog: using u64 in get_sample_period()
  writeback: put unused inodes to LRU after writeback completion
  mm: vmscan: check for fatal signals iff the process was throttled
  Revert "mm: remove __GFP_NO_KSWAPD"
  proc: check vma->vm_file before dereferencing
  UAPI: strip the _UAPI prefix from header guards during header installation
  include/linux/bug.h: fix sparse warning related to BUILD_BUG_ON_INVALID
2012-11-26 18:33:33 -08:00
Linus Torvalds
87726c334b Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext3 regression fix from Jan Kara:
 "Fix an ext3 regression introduced during 3.7 merge window.  It leads
  to deadlock if you stress the filesystem in the right way (luckily
  only if blocksize < pagesize)."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  jbd: Fix lock ordering bug in journal_unmap_buffer()
2012-11-26 17:42:07 -08:00
Jan Kara
4eff96dd52 writeback: put unused inodes to LRU after writeback completion
Commit 169ebd9013 ("writeback: Avoid iput() from flusher thread")
removed iget-iput pair from inode writeback.  As a side effect, inodes
that are dirty during iput_final() call won't be ever added to inode LRU
(iput_final() doesn't add dirty inodes to LRU and later when the inode
is cleaned there's noone to add the inode there).  Thus inodes are
effectively unreclaimable until someone looks them up again.

The practical effect of this bug is limited by the fact that inodes are
pinned by a dentry for long enough that the inode gets cleaned.  But
still the bug can have nasty consequences leading up to OOM conditions
under certain circumstances.  Following can easily reproduce the
problem:

  for (( i = 0; i < 1000; i++ )); do
    mkdir $i
    for (( j = 0; j < 1000; j++ )); do
      touch $i/$j
      echo 2 > /proc/sys/vm/drop_caches
    done
  done

then one needs to run 'sync; ls -lR' to make inodes reclaimable again.

We fix the issue by inserting unused clean inodes into the LRU after
writeback finishes in inode_sync_complete().

Signed-off-by: Jan Kara <jack@suse.cz>
Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: <stable@vger.kernel.org>		[3.5+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26 17:41:24 -08:00
Stanislav Kinsbursky
05f564849d proc: check vma->vm_file before dereferencing
Commit 7b540d0646 ("proc_map_files_readdir(): don't bother with
grabbing files") switched proc_map_files_readdir() to use @f_mode
directly instead of grabbing @file reference, but same time the test for
@vm_file presence was lost leading to nil dereference.  The patch brings
the test back.

The all proc_map_files feature is CONFIG_CHECKPOINT_RESTORE wrapped
(which is set to 'n' by default) so the bug doesn't affect regular
kernels.

The regression is 3.7-rc1 only as far as I can tell.

[gorcunov@openvz.org: provided changelog]
Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-26 17:41:24 -08:00
Josh Triplett
1f20dfdaed sysfs: Mark sysfs_attr_ns static
Nothing outside of fs/sysfs/file.c references this function, so mark it static.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2012-11-26 16:25:36 -08:00
Seiji Aguchi
755d4fe465 efi_pstore: Add a sequence counter to a variable name
[Issue]

Currently, a variable name, which identifies each entry, consists of type, id and ctime.
But if multiple events happens in a short time, a second/third event may fail to log because
efi_pstore can't distinguish each event with current variable name.

[Solution]

A reasonable way to identify all events precisely is introducing a sequence counter to
the variable name.

The sequence counter has already supported in a pstore layer with "oopscount".
So, this patch adds it to a variable name.
Also, it is passed to read/erase callbacks of platform drivers in accordance with
the modification of the variable name.

  <before applying this patch>
 a variable name of first event: dump-type0-1-12345678
 a variable name of second event: dump-type0-1-12345678

  type:0
  id:1
  ctime:12345678

 If multiple events happen in a short time, efi_pstore can't distinguish them because
 variable names are same among them.

  <after applying this patch>

 it can be distinguishable by adding a sequence counter as follows.

 a variable name of first event: dump-type0-1-1-12345678
 a variable name of Second event: dump-type0-1-2-12345678

  type:0
  id:1
  sequence counter: 1(first event), 2(second event)
  ctime:12345678

In case of a write callback executed in pstore_console_write(), "0" is added to
an argument of the write callback because it just logs all kernel messages and
doesn't need to care about multiple events.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Mike Waychison <mikew@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-11-26 16:07:44 -08:00
Seiji Aguchi
a9efd39cd5 efi_pstore: Add ctime to argument of erase callback
[Issue]

Currently, a variable name, which is used to identify each log entry, consists of type,
id and ctime. But an erase callback does not use ctime.

If efi_pstore supported just one log, type and id were enough.
However, in case of supporting multiple logs, it doesn't work because
it can't distinguish each entry without ctime at erasing time.

 <Example>

 As you can see below, efi_pstore can't differentiate first event from second one without ctime.

 a variable name of first event: dump-type0-1-12345678
 a variable name of second event: dump-type0-1-23456789

  type:0
  id:1
  ctime:12345678, 23456789

[Solution]

This patch adds ctime to an argument of an erase callback.

It works across reboots because ctime of pstore means the date that the record was originally stored.
To do this, efi_pstore saves the ctime to variable name at writing time and passes it to pstore
at reading time.

Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
Acked-by: Mike Waychison <mikew@google.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
2012-11-26 16:02:12 -08:00
Dave Chinner
7c4cebe8e0 xfs: inode allocation should use unmapped buffers.
Inode buffers do not need to be mapped as inodes are read or written
directly from/to the pages underlying the buffer. This fixes a
regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
default behaviour").

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-26 16:01:31 -06:00
David S. Miller
24bc518a68 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/wireless/iwlwifi/pcie/tx.c

Minor iwlwifi conflict in TX queue disabling between 'net', which
removed a bogus warning, and 'net-next' which added some status
register poking code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-25 12:49:17 -05:00
Linus Torvalds
35f95d228e Most important part of this is that it fixes a regression in Samsung
NAND chip detection, introduced by some rework which went into 3.7. The
 initial fix wasn't quite complete, so it's in two parts. In fact the
 first part is committed twice (Artem committed his own copy of the same
 patch) and I've merged Artem's tree into mine which already had that fix.
 
 I'd have recommitted that to make it somewhat cleaner, but figured by
 this point in the release cycle it was better to merge *exactly* the
 commits which have been in linux-next.
 
 If I'd recommitted, I'd also omit the sparse warning fix. But it's there,
 and it's harmless — just marking one function as 'static' in onenand code.
 
 This also includes a couple more fixes for stable: an AB-BA deadlock in
 JFFS2, and an invalid range check in slram.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iEYEABECAAYFAlCwEIsACgkQdwG7hYl686NfZgCfSYFA2q8yp7jEMdDaxpFPuuDm
 FFMAoI3V27BpWxRab6GylYh8erHp9ful
 =Wo+T
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-20121123' of git://git.infradead.org/mtd-2.6

Pull MTD fixes from David Woodhouse:
 "The most important part of this is that it fixes a regression in
  Samsung NAND chip detection, introduced by some rework which went into
  3.7.  The initial fix wasn't quite complete, so it's in two parts.  In
  fact the first part is committed twice (Artem committed his own copy
  of the same patch) and I've merged Artem's tree into mine which
  already had that fix.

  I'd have recommitted that to make it somewhat cleaner, but figured by
  this point in the release cycle it was better to merge *exactly* the
  commits which have been in linux-next.

  If I'd recommitted, I'd also omit the sparse warning fix.  But it's
  there, and it's harmless — just marking one function as 'static' in
  onenand code.

  This also includes a couple more fixes for stable: an AB-BA deadlock
  in JFFS2, and an invalid range check in slram."

* tag 'for-linus-20121123' of git://git.infradead.org/mtd-2.6:
  mtd: nand: fix Samsung SLC detection regression
  mtd: nand: fix Samsung SLC NAND identification regression
  jffs2: Fix lock acquisition order bug in jffs2_write_begin
  mtd: onenand: Make flexonenand_set_boundary static
  mtd: slram: invalid checking of absolute end address
  mtd: ofpart: Fix incorrect NULL check in parse_ofoldpart_partitions()
  mtd: nand: fix Samsung SLC NAND identification regression
2012-11-23 15:12:17 -10:00
Jan Kara
25389bb207 jbd: Fix lock ordering bug in journal_unmap_buffer()
Commit 09e05d48 introduced a wait for transaction commit into
journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
in the page stradding i_size on a filesystem with blocksize < pagesize. Sadly
we forgot to drop buffer lock before waiting for transaction commit and thus
deadlock is possible when kjournald wants to lock the buffer.

Fix the problem by dropping the buffer lock before waiting for transaction
commit. Since we are still holding page lock (and that is OK), buffer cannot
disappear under us.

CC: stable@vger.kernel.org # Wherever commit 09e05d48 was taken
Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-23 15:17:18 +01:00
Bob Peterson
1e2d9d44f3 GFS2: Set gl_object during inode create
This patch fixes a cluster coherency problem that occurs when one
node creates a file, does several writes, then a different node
tries to write to the same file. When the inode's glock is demoted,
the inode wasn't synced to the media properly because the gl_object
wasn't set. Later, the flush daemon noticed the uncommitted data
and tried to flush it, only to discover the glock was no longer locked
properly in exclusive mode. That caused an assert withdraw.

Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
2012-11-21 14:49:21 +00:00
Linus Torvalds
ca6215dfc7 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull reiserfs and ext3 fixes from Jan Kara:
 "Fixes of reiserfs deadlocks when quotas are enabled (locking there was
  completely busted by BKL conversion) and also one small ext3 fix in
  the trim interface."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
  ext3: Avoid underflow of in ext3_trim_fs()
  reiserfs: Move quota calls out of write lock
  reiserfs: Protect reiserfs_quota_write() with write lock
  reiserfs: Protect reiserfs_quota_on() with write lock
  reiserfs: Fix lock ordering during remount
2012-11-20 18:48:25 -10:00
Christoph Hellwig
0e446be448 xfs: add CRC checks to the log
Implement CRCs for the log buffers.  We re-use a field in
struct xlog_rec_header that was used for a weak checksum of the
log buffer payload in debug builds before.

The new checksumming uses the crc32c checksum we will use elsewhere
in XFS, and also protects the record header and addition cycle data.

Due to this there are some interesting changes in xlog_sync, as we
need to do the cycle wrapping for the split buffer case much earlier,
as we would touch the buffer after generating the checksum otherwise.

The CRC calculation is always enabled, even for non-CRC filesystems,
as adding this CRC does not change the log format. On non-CRC
filesystems, only issue an alert if a CRC mismatch is found and
allow recovery to continue - this will act as an indicator that
log recovery problems are a result of log corruption. On CRC enabled
filesystems, however, log recovery will fail.

Note that existing debug kernels will write a simple checksum value
to the log, so the first time this is run on a filesystem taht was
last used on a debug kernel it will through CRC mismatch warning
errors. These can be ignored.

Initially based on a patch from Dave Chinner, then modified
significantly by Christoph Hellwig.  Modified again by Dave Chinner
to get to this version.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-19 20:18:41 -06:00
Christoph Hellwig
bc02e8693d xfs: add CRC infrastructure
- add a mount feature bit for CRC enabled filesystems
 - add some helpers for generating and verifying the CRCs
 - add a copy_uuid helper

The checksumming helpers are loosely based on similar ones in sctp,
all other bits come from Dave Chinner.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-19 20:11:24 -06:00
Lukas Czerner
ae49eeec78 ext3: Avoid underflow of in ext3_trim_fs()
Currently if len argument in ext3_trim_fs() is smaller than one block,
the 'end' variable underflow. Avoid that by returning EINVAL if len is
smaller than file system block.

Also remove useless unlikely().

Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19 21:36:12 +01:00
Jan Kara
7af1168693 reiserfs: Move quota calls out of write lock
Calls into highlevel quota code cannot happen under the write lock. These
calls take dqio_mutex which ranks above write lock. So drop write lock
before calling back into quota code.

CC: stable@vger.kernel.org # >= 3.0
Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19 21:34:33 +01:00
Jan Kara
361d94a338 reiserfs: Protect reiserfs_quota_write() with write lock
Calls into reiserfs journalling code and reiserfs_get_block() need to
be protected with write lock. We remove write lock around calls to high
level quota code in the next patch so these paths would suddently become
unprotected.

CC: stable@vger.kernel.org # >= 3.0
Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19 21:34:33 +01:00
Jan Kara
b9e06ef2e8 reiserfs: Protect reiserfs_quota_on() with write lock
In reiserfs_quota_on() we do quite some work - for example unpacking
tail of a quota file. Thus we have to hold write lock until a moment
we call back into the quota code.

CC: stable@vger.kernel.org # >= 3.0
Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19 21:34:32 +01:00
Jan Kara
3bb3e1fc47 reiserfs: Fix lock ordering during remount
When remounting reiserfs dquot_suspend() or dquot_resume() can be called.
These functions take dqonoff_mutex which ranks above write lock so we have
to drop it before calling into quota code.

CC: stable@vger.kernel.org # >= 3.0
Signed-off-by: Jan Kara <jack@suse.cz>
2012-11-19 21:34:32 +01:00
Adam Buchbinder
b3834be5c4 various: Fix spelling of "asynchronous" in comments.
"Asynchronous" is misspelled in some comments. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:32:13 +01:00
Adam Buchbinder
48fc7f7e78 Fix misspellings of "whether" in comments.
"Whether" is misspelled in various comments across the tree; this
fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:31:35 +01:00
Masanari Iida
02582e9bcc treewide: fix typo of "suport" in various comments and Kconfig
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-11-19 14:16:09 +01:00
Eric W. Biederman
73f7ef4359 sysctl: Pass useful parameters to sysctl permissions
- Current is implicitly avaiable so passing current->nsproxy isn't useful.
- The ctl_table_header is needed to find how the sysctl table is connected
  to the rest of sysctl.
- ctl_table_root is avaiable in the ctl_table_header so no need to it.

With these changes it becomes possible to write a version of
net_sysctl_permission that takes into account the network namespace of
the sysctl table, an important feature in extending the user namespace.

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-18 20:30:55 -05:00
Al Viro
3587b1b097 fanotify: fix FAN_Q_OVERFLOW case of fanotify_read()
If the FAN_Q_OVERFLOW bit set in event->mask, the fanotify event
metadata will not contain a valid file descriptor, but
copy_event_to_user() didn't check for that, and unconditionally does a
fd_install() on the file descriptor.

Which in turn will cause a BUG_ON() in __fd_install().

Introduced by commit 352e3b2492 ("fanotify: sanitize failure exits in
copy_event_to_user()")

Mea culpa - missed that path ;-/

Reported-by: Alex Shi <lkml.alex@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-18 09:30:00 -10:00
Linus Torvalds
8d938105e4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc VFS fixes from Al Viro:
 "Remove a bogus BUG_ON() that can trigger spuriously + alpha bits of
  do_mount() constification I'd missed during the merge window."

This pull request came in a week ago, I missed it for some reason.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  kill bogus BUG_ON() in do_close_on_exec()
  missing const in alpha callers of do_mount()
2012-11-18 09:13:48 -10:00
Linus Torvalds
d28d3730fd xfs: bugfixes for 3.7-rc7
- fix attr tree double split corruption
 - fix broken error handling in xfs_vm_writepage
 - drop buffer io reference when a bad bio is built
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.10 (GNU/Linux)
 
 iQIcBAABAgAGBQJQp7sfAAoJENaLyazVq6ZOWHwP/2WTlenvM74i8HDa/nYW8KTC
 EubCZ6X1C7LPTV9tm9YUpKZ1VtI1O+OmuGcSmWdBKSMMoBVNyKvWXvrJeVKBVtXV
 sQ/jh1zCiPYzt9DfxGuarkw8Uy5qKNOYrbEAK1WwPMeOsDODYncfmTm+A/VYMeTt
 bWOjaxFd5QQOMuf0x9NO/keZc84R5l51ezYxA7HyYa5XvV/MDmLLVL0IhuSTFKyw
 oOiQMp0hby4zsJg6nqu/eINmdlgBIw+32m8aMSB2jreUQm4yvt0CY7M3Zq6sPmsM
 2tC6cFonPw31FBBu9jvv9h5wNz7McyzxtZBS0+zDV+7K0UrIyxWm1BhzZIXoXzLz
 vHwc4gnZV8nOP/g34aftHLYYRD3ZJhG8mX5AdBRzlWWqDSFvYVEq+1evHrv8kk4l
 coTapzimNnR3aJ16qdP1M0gExKO9nrGVqrRi8ndLNbxLpxC9mFG7CfJBQPMumukX
 G8pTV1wQvqONHDNlN4mxqMBHN0d9dGp5xjYQ0Q92/siIA1C5szjCwTHekKNrP6Ol
 7xd+nO7Xcgj7Uwaakv31paqOSAGhla6H5jvxPF2A54hZWQqlp88QpChLt3LFPxwh
 tEYTEf1zRoaoCS4TD3zMYTLY+9cXvUybSIf3hbgns+JMYHJtuZdzbvcaXE6Wl4Jr
 6esA5fsBFP1J2/EzpLof
 =depY
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-v3.7-rc7' of git://oss.sgi.com/xfs/xfs

Pull xfs bugfixes from Ben Myers:

 - fix attr tree double split corruption

 - fix broken error handling in xfs_vm_writepage

 - drop buffer io reference when a bad bio is built

* tag 'for-linus-v3.7-rc7' of git://oss.sgi.com/xfs/xfs:
  xfs: drop buffer io reference when a bad bio is built
  xfs: fix broken error handling in xfs_vm_writepage
  xfs: fix attr tree double split corruption
2012-11-18 08:29:34 -10:00
Ingo Molnar
ec05a2311c Merge branch 'sched/urgent' into sched/core
Merge in fixes before we queue up dependent bits, to avoid conflicts.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-11-18 09:34:44 +01:00
Maxime Bizon
b042e47491 pstore/ram: Fix undefined usage of rounddown_pow_of_two(0)
record_size / console_size / ftrace_size can be 0 (this is how you disable
the feature), but rounddown_pow_of_two(0) is undefined. As suggested by
Kees Cook, use !is_power_of_2() as a condition to call
rounddown_pow_of_two and avoid its undefined behavior on the value 0. This
issue has been present since commit 1894a253 (ramoops: Move to
fs/pstore/ram.c).

Cc: stable@vger.kernel.org
Signed-off-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Florian Fainelli <ffainelli@freebox.fr>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
2012-11-17 17:40:57 -08:00
Dave Chinner
d69043c42d xfs: drop buffer io reference when a bad bio is built
Error handling in xfs_buf_ioapply_map() does not handle IO reference
counts correctly. We increment the b_io_remaining count before
building the bio, but then fail to decrement it in the failure case.
This leads to the buffer never running IO completion and releasing
the reference that the IO holds, so at unmount we can leak the
buffer. This leak is captured by this assert failure during unmount:

XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273

This is not a new bug - the b_io_remaining accounting has had this
problem for a long, long time - it's just very hard to get a
zero length bio being built by this code...

Further, the buffer IO error can be overwritten on a multi-segment
buffer by subsequent bio completions for partial sections of the
buffer. Hence we should only set the buffer error status if the
buffer is not already carrying an error status. This ensures that a
partial IO error on a multi-segment buffer will not be lost. This
part of the problem is a regression, however.

cc: <stable@vger.kernel.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-17 09:36:57 -06:00
Dave Chinner
3daed8bc3e xfs: fix broken error handling in xfs_vm_writepage
When we shut down the filesystem, it might first be detected in
writeback when we are allocating a inode size transaction. This
happens after we have moved all the pages into the writeback state
and unlocked them. Unfortunately, if we fail to set up the
transaction we then abort writeback and try to invalidate the
current page. This then triggers are BUG() in block_invalidatepage()
because we are trying to invalidate an unlocked page.

Fixing this is a bit of a chicken and egg problem - we can't
allocate the transaction until we've clustered all the pages into
the IO and we know the size of it (i.e. whether the last block of
the IO is beyond the current EOF or not). However, we don't want to
hold pages locked for long periods of time, especially while we lock
other pages to cluster them into the write.

To fix this, we need to make a clear delineation in writeback where
errors can only be handled by IO completion processing. That is,
once we have marked a page for writeback and unlocked it, we have to
report errors via IO completion because we've already started the
IO. We may not have submitted any IO, but we've changed the page
state to indicate that it is under IO so we must now use the IO
completion path to report errors.

To do this, add an error field to xfs_submit_ioend() to pass it the
error that occurred during the building on the ioend chain. When
this is non-zero, mark each ioend with the error and call
xfs_finish_ioend() directly rather than building bios. This will
immediately push the ioends through completion processing with the
error that has occurred.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-17 09:35:42 -06:00
Dave Chinner
42e2976f13 xfs: fix attr tree double split corruption
In certain circumstances, a double split of an attribute tree is
needed to insert or replace an attribute. In rare situations, this
can go wrong, leaving the attribute tree corrupted. In this case,
the attr being replaced is the last attr in a leaf node, and the
replacement is larger so doesn't fit in the same leaf node.
When we have the initial condition of a node format attribute
btree with two leaves at index 1 and 2. Call them L1 and L2.  The
leaf L1 is completely full, there is not a single byte of free space
in it. L2 is mostly empty.  The attribute being replaced - call it X
- is the last attribute in L1.

The way an attribute replace is executed is that the replacement
attribute - call it Y - is first inserted into the tree, but has an
INCOMPLETE flag set on it so that list traversals ignore it. Once
this transaction is committed, a second transaction it run to
atomically mark Y as COMPLETE and X as INCOMPLETE, so that a
traversal will now find Y and skip X. Once that transaction is
committed, attribute X is then removed.

So, the initial condition is:

     +--------+     +--------+
     |   L1   |     |   L2   |
     | fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |
     | fsp: 0 |     | fsp: N |
     |--------|     |--------|
     | attr A |     | attr 1 |
     |--------|     |--------|
     | attr B |     | attr 2 |
     |--------|     |--------|
     ..........     ..........
     |--------|     |--------|
     | attr X |     | attr n |
     +--------+     +--------+

So now we go to replace X, and see that L1:fsp = 0 - it is full so
we can't insert Y in the same leaf. So we record the the location of
attribute X so we can track it for later use, then we split L1 into
L1 and L3 and reblance across the two leafs. We end with:

     +--------+     +--------+     +--------+
     |   L1   |     |   L3   |     |   L2   |
     | fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|
     | attr A |     | attr X |     | attr 1 |
     |--------|     +--------+     |--------|
     | attr B |                    | attr 2 |
     |--------|                    |--------|
     ..........                    ..........
     |--------|                    |--------|
     | attr W |                    | attr n |
     +--------+                    +--------+

And we track that the original attribute is now at L3:0.

We then try to insert Y into L1 again, and find that there isn't
enough room because the new attribute is larger than the old one.
Hence we have to split again to make room for Y. We end up with
this:

     +--------+     +--------+     +--------+     +--------+
     |   L1   |     |   L4   |     |   L3   |     |   L2   |
     | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|     |--------|
     | attr A |     | attr Y |     | attr X |     | attr 1 |
     |--------|     + INCOMP +     +--------+     |--------|
     | attr B |     +--------+                    | attr 2 |
     |--------|                                   |--------|
     ..........                                   ..........
     |--------|                                   |--------|
     | attr W |                                   | attr n |
     +--------+                                   +--------+

And now we have the new (incomplete) attribute @ L4:0, and the
original attribute at L3:0. At this point, the first transaction is
committed, and we move to the flipping of the flags.

This is where we are supposed to end up with this:

     +--------+     +--------+     +--------+     +--------+
     |   L1   |     |   L4   |     |   L3   |     |   L2   |
     | fwd: 4 |---->| fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
     | bwd: 0 |<----| bwd: 1 |<----| bwd: 4 |<----| bwd: 3 |
     | fsp: M |     | fsp: J |     | fsp: J |     | fsp: N |
     |--------|     |--------|     |--------|     |--------|
     | attr A |     | attr Y |     | attr X |     | attr 1 |
     |--------|     +--------+     + INCOMP +     |--------|
     | attr B |                    +--------+     | attr 2 |
     |--------|                                   |--------|
     ..........                                   ..........
     |--------|                                   |--------|
     | attr W |                                   | attr n |
     +--------+                                   +--------+

But that doesn't happen properly - the attribute tracking indexes
are not pointing to the right locations. What we end up with is both
the old attribute to be removed pointing at L4:0 and the new
attribute at L4:1.  On a debug kernel, this assert fails like so:

XFS: Assertion failed: args->index2 < be16_to_cpu(leaf2->hdr.count), file: fs/xfs/xfs_attr_leaf.c, line: 2725

because the new attribute location does not exist. On a production
kernel, this goes unnoticed and the code proceeds ahead merrily and
removes L4 because it thinks that is the block that is no longer
needed. This leaves the hash index node pointing to entries
L1, L4 and L2, but only blocks L1, L3 and L2 to exist. Further, the
leaf level sibling list is L1 <-> L4 <-> L2, but L4 is now free
space, and so everything is busted. This corruption is caused by the
removal of the old attribute triggering a join - it joins everything
correctly but then frees the wrong block.

xfs_repair will report something like:

bad sibling back pointer for block 4 in attribute fork for inode 131
problem with attribute contents in inode 131
would clear attr fork
bad nblocks 8 for inode 131, would reset to 3
bad anextents 4 for inode 131, would reset to 0

The problem lies in the assignment of the old/new blocks for
tracking purposes when the double leaf split occurs. The first split
tries to place the new attribute inside the current leaf (i.e.
"inleaf == true") and moves the old attribute (X) to the new block.
This sets up the old block/index to L1:X, and newly allocated
block to L3:0. It then moves attr X to the new block and tries to
insert attr Y at the old index. That fails, so it splits again.

With the second split, the rebalance ends up placing the new attr in
the second new block - L4:0 - and this is where the code goes wrong.
What is does is it sets both the new and old block index to the
second new block. Hence it inserts attr Y at the right place (L4:0)
but overwrites the current location of the attr to replace that is
held in the new block index (currently L3:0). It over writes it with
L4:1 - the index we later assert fail on.

Hopefully this table will show this in a foramt that is a bit easier
to understand:

Split		old attr index		new attr index
		vanilla	patched		vanilla	patched
before 1st	L1:26	L1:26		N/A	N/A
after 1st	L3:0	L3:0		L1:26	L1:26
after 2nd	L4:0	L3:0		L4:1	L4:0
                ^^^^			^^^^
		wrong			wrong

The fix is surprisingly simple, for all this analysis - just stop
the rebalance on the out-of leaf case from overwriting the new attr
index - it's already correct for the double split case.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Ben Myers <bpm@sgi.com>
2012-11-17 09:34:13 -06:00
Hannes Reinecke
53f21a8ea1 pstore/ram: Fixup section annotations
The compiler complained about missing section annotations.
Fix it.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Colin Cross <ccross@android.com>
Cc: Tony Luck <tony.luck@intel.com>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
2012-11-16 18:42:06 -08:00
Greg Kroah-Hartman
1e619a1bf9 Merge 3.7-rc6 into tty-next 2012-11-16 18:26:00 -08:00
David Rientjes
fa0cbbf145 mm, oom: reintroduce /proc/pid/oom_adj
This is mostly a revert of 01dc52ebdf ("oom: remove deprecated oom_adj")
from Davidlohr Bueso.

It reintroduces /proc/pid/oom_adj for backwards compatibility with earlier
kernels.  It simply scales the value linearly when /proc/pid/oom_score_adj
is written.

The major difference is that its scheduled removal is no longer included
in Documentation/feature-removal-schedule.txt.  We do warn users with a
single printk, though, to suggest the more powerful and supported
/proc/pid/oom_score_adj interface.

Reported-by: Artem S. Tashkinov <t.artem@lycos.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-11-16 10:15:35 -08:00