Commit graph

27421 commits

Author SHA1 Message Date
Josef Bacik
0885ef5b56 Btrfs: do not do filemap_write_and_wait_range in fsync
We already do the btrfs_wait_ordered_range which will do this for us, so
just remove this call so we don't call it twice.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:29 -04:00
Josef Bacik
551ebb2d34 Btrfs: remove useless waiting and extra filemap work
In btrfs_wait_ordered_range we have been calling filemap_fdata_write() twice
because compression does strange things and then waiting.  Then we look up
ordered extents and if we find any we will always schedule_timeout(); once
and then loop back around and do it all again.  We will even check to see if
there are delalloc pages on this range and loop again.  So this patch gets
rid of the multiple fdata_write() calls and just does
filemap_write_and_wait().  In the case of compression we will still find the
ordered extents and start those individually if we need to so that is ok,
but in the normal buffered case we avoid all this weird overhead.

Then in the case of the schedule_timeout(1), we don't need it.  All callers
either 1) don't care, they just want to make sure what they just wrote makes
it to disk or 2) are doing the lock()->lookup ordered->unlock->flush thing
in which case it will lock and check for ordered extents _anyway_ so get
back to them as quickly as possible.  The delalloc check is simply not
needed, this only catches the case where we write to the file again since
doing the filemap_write_and_wait() and if the caller truly cares about that
it will take care of everything itself.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:28 -04:00
Josef Bacik
d7dbe9e7f6 Btrfs: fix compile warnings in extent_io.c
These warnings are bogus since we will always have at least one page in an
eb, but to make the compiler happy just set ret = 0 in these two cases.
Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:28 -04:00
Josef Bacik
30f8fe3e47 Btrfs: cache no acl on new inodes
When running compilebench I noticed we were spending some time looking up
acls on new inodes, which shouldn't be happening since there were no acls.
This is because when we init acls on the inode after creating them we don't
cache the fact there are no acls if there aren't any.  Doing this adds a
little bit of a bump to my compilebench runs.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:27 -04:00
Josef Bacik
0c4d2d95d0 Btrfs: use i_version instead of our own sequence
We've been keeping around the inode sequence number in hopes that somebody
would use it, but nobody uses it and people actually use i_version which
serves the same purpose, so use i_version where we used the incore inode's
sequence number and that way the sequence is updated properly across the
board, and not just in file write.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>
2012-05-30 10:23:27 -04:00
Jan Schmidt
20b297d620 Btrfs: tree mod log sanity checks in join_transaction
When a fresh transaction begins, the tree mod log must be clean. Users of
the tree modification log must ensure they never span across transaction
boundaries.

We reset the sequence to 0 in this safe situation to make absolutely sure
overflow can't happen.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:36 +02:00
Jan Schmidt
19ae4e8133 Btrfs: fs_info variable for join_transaction
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:35 +02:00
Jan Schmidt
8445f61cad Btrfs: use the tree modification log for backref resolving
This enables backref resolving on live trees while they are changing. This
is a prerequisite for quota groups and just nice to have for everything
else.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:34 +02:00
Jan Schmidt
5d9e75c41d Btrfs: add btrfs_search_old_slot
The tree modification log together with the current state of the tree gives
a consistent, old version of the tree. btrfs_search_old_slot is used to
search through this old version and return old (dummy!) extent buffers.
Naturally, this function cannot do any tree modifications.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:33 +02:00
Jan Schmidt
f3ea38da3e Btrfs: add del_ptr and insert_ptr modifications to the tree mod log
Record all relevant modifications to block pointers in the tree mod log so
that we can rewind them later on for backref walking.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:32 +02:00
Jan Schmidt
f230475e62 Btrfs: put all block modifications into the tree mod log
When running functions that can make changes to the internal trees
(e.g. btrfs_search_slot), we check if somebody may be interested in the
block we're currently modifying. If so, we record our modification to be
able to rewind it later on.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:29 +02:00
Jan Schmidt
bd989ba359 Btrfs: add tree modification log functions
The tree mod log will log modifications made to fs-tree nodes. Most
modifications are done by autobalance of the tree. Such changes are recorded
as long as a block entry exists. When released, the log is cleaned.

With the tree modification log, it's possible to reconstruct a consistent
old state of the tree. This is required to do backref walking on a busy
file system.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
2012-05-30 15:17:01 +02:00
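
Illustration for bd989ba359 above: a minimal userspace sketch of how a modification log lets a reader reconstruct a consistent old state of a node. This is not the btrfs code. The types `struct node`, `struct mod_entry` and `struct mod_log`, and the helpers `node_set_logged()` and `node_rewind()`, are all hypothetical names chosen for this example, and a node is simplified to a fixed array of block pointers.

```c
#include <stddef.h>
#include <string.h>

#define NODE_SLOTS 8
#define LOG_MAX    32

/* Hypothetical stand-in for a tree node: a fixed array of block pointers. */
struct node {
	unsigned long ptrs[NODE_SLOTS];
};

/* One logged change: slot `slot` held `old_val` before it was overwritten. */
struct mod_entry {
	unsigned long seq;
	int slot;
	unsigned long old_val;
};

struct mod_log {
	struct mod_entry entries[LOG_MAX];
	size_t count;
	unsigned long next_seq;
};

/*
 * Record the old value of a slot, then apply the change.
 * Returns 0 on success, -1 if the slot is out of range or the log is full.
 */
int node_set_logged(struct node *n, struct mod_log *log, int slot,
		    unsigned long new_val)
{
	struct mod_entry *e;

	if (slot < 0 || slot >= NODE_SLOTS || log->count >= LOG_MAX)
		return -1;

	e = &log->entries[log->count++];
	e->seq = ++log->next_seq;
	e->slot = slot;
	e->old_val = n->ptrs[slot];
	n->ptrs[slot] = new_val;
	return 0;
}

/*
 * Build in *out the node as it looked at sequence number min_seq:
 * copy the current state, then undo, newest first, every logged
 * change with a sequence number greater than min_seq.
 */
void node_rewind(const struct node *cur, const struct mod_log *log,
		 unsigned long min_seq, struct node *out)
{
	size_t i;

	memcpy(out, cur, sizeof(*out));
	for (i = log->count; i-- > 0; ) {
		const struct mod_entry *e = &log->entries[i];

		if (e->seq <= min_seq)
			break;
		out->ptrs[e->slot] = e->old_val;
	}
}
```

The rewind works on a copy, so the live node is never touched; this mirrors the point made in 5d9e75c41d that searching the old version returns dummy buffers and cannot modify the tree.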
Al Viro
1676765238 get rid of idiotic misplaced __kernel_mode_t in ncfps kernel-private data structure
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:42 -04:00
Andi Kleen
962830df36 brlocks/lglocks: API cleanups
lglocks and brlocks are currently generated with some complicated macros
in lglock.h.  But there's no reason to not just use common utility
functions and put all the data into a common data structure.

In preparation, this patch changes the API to look more like normal
function calls with pointers, not magic macros.

The patch is rather large because I move over all users in one go to keep
it bisectable.  This impacts the VFS somewhat in terms of lines changed.
But no actual behaviour change.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:41 -04:00
Andi Kleen
eea62f831b brlocks/lglocks: turn into functions
lglocks and brlocks are currently generated with some complicated macros
in lglock.h.  But there's no reason to not just use common utility
functions and put all the data into a common data structure.

Since there are at least two users it makes sense to share this code in a
library.  This is also easier to maintain than a macro forest.

This will also make it later possible to dynamically allocate lglocks and
also use them in modules (this would both still need some additional, but
now straightforward, code)

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:41 -04:00
Al Viro
ea022dfb3c ocfs: simplify symlink handling
seeing that "fast" symlinks still get allocation + copy, we might as
well simply switch them to pagecache-based variant of ->follow_link();
just need an appropriate ->readpage() for them...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:40 -04:00
Al Viro
408bd629ba get rid of pointless allocations and copying in ecryptfs_follow_link()
switch to generic_readlink(), while we are at it

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:40 -04:00
Al Viro
28fe3c1963 hpfs: assorted endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:39 -04:00
Al Viro
77ee26e44c hpfs: annotate ea
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:39 -04:00
Al Viro
46287aa652 hpfs: annotate struct hpfs_dirent
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:39 -04:00
Al Viro
6ce2bbba52 hpfs: annotate struct anode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:38 -04:00
Al Viro
2b9f1cc29b hpfs: annotate struct fnode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:38 -04:00
Al Viro
ddc19e6e04 hpfs: annotate btree nodes, get rid of bitfields mess
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:38 -04:00
Al Viro
39413c6046 hpfs: annotate struct dnode
little-endians...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:37 -04:00
Al Viro
52576da354 hpfs: bitmaps are little-endian
annotate properly...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:37 -04:00
Al Viro
c4c995430a hpfs: get rid of bitfields in struct fnode
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:37 -04:00
Al Viro
4085e155b1 hpfs: get rid of bitfields endianness wanking in extended_attribute
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:36 -04:00
Randy Dunlap
185553b224 fs: fix inode.c kernel-doc warnings
Fix kernel-doc warnings in fs/inode.c:

Warning(fs/inode.c:1493): No description found for parameter 'path'
Warning(fs/inode.c:1493): Excess function parameter 'mnt' description in 'touch_atime'
Warning(fs/inode.c:1493): Excess function parameter 'dentry' description in 'touch_atime'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:36 -04:00
Al Viro
de5e2b3628 hpfs: endianness bugs
a couple of le32 and le16 used with wrong le..._to_cpu(), plus
idiotic use of le32_to_cpu() on 1-bit bitfield

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:36 -04:00
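
Illustration for de5e2b3628 above: a small userspace sketch of why the width of the conversion helper matters. On a big-endian host, le16_to_cpu() and le32_to_cpu() amount to 16-bit and 32-bit byte swaps, so applying the 32-bit one to a 16-bit field moves the data into the upper half of the word. The functions below are hypothetical stand-ins written for this example (`demo_swab16`, `demo_swab32`, `be_host_le16_ok`, `be_host_le16_wrong_width`); they simulate the big-endian behaviour explicitly so the result is the same on any machine, and they are not the kernel's helpers.

```c
#include <stdint.h>

/* The byte swaps a big-endian host would perform for little-endian data. */
uint16_t demo_swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

uint32_t demo_swab32(uint32_t v)
{
	return (v << 24) |
	       ((v << 8) & 0x00ff0000u) |
	       ((v >> 8) & 0x0000ff00u) |
	       (v >> 24);
}

/* Correct: a 16-bit little-endian field converted with a 16-bit swap. */
uint16_t be_host_le16_ok(uint16_t raw)
{
	return demo_swab16(raw);
}

/*
 * Wrong width: the 16-bit field is widened, swapped as 32 bits, and the
 * result is stored back into 16 bits. The payload ends up in the upper
 * half of the word and is lost on truncation.
 */
uint16_t be_host_le16_wrong_width(uint16_t raw)
{
	return (uint16_t)demo_swab32(raw);
}
```

On a little-endian host both kernel helpers are no-ops, which is one reason mistakes of this kind can go unnoticed until endianness annotations are checked.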
Al Viro
528c032764 btrfs: trivial endianness annotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:35 -04:00
Al Viro
1db5df98fa ocfs2: kill endianness abuses in blockcheck.c
ocfs2_block_check is for little-endian contents; if we just want
its fields converted to host-endian in a couple of functions, just
put those values into local u32 and u16...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:35 -04:00
Al Viro
f6a5690324 ocfs2: deal with __user misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:35 -04:00
Al Viro
8515841086 ocfs2: trivial endianness misannotations
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:34 -04:00
Al Viro
66f8f50920 affs: bury unused macros
... unused since 2.4.4.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:34 -04:00
Al Viro
af569596a9 kill v9fs_dentry_from_dir_inode()
In *all* callers we have a dentry of child of that directory.
Just use ->d_parent of that one, for fsck sake...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:34 -04:00
Sage Weil
c862868bb4 ceph: move encode_fh to new API
Use parent_inode as a flag for whether nfsd wants a connectable fh, but
generate one opportunistically so that we can take advantage of the
additional info in there.

Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:33 -04:00
Al Viro
b0b0382bb4 ->encode_fh() API change
pass inode + parent's inode or NULL instead of dentry + bool saying
whether we want the parent or not.

NOTE: that needs ceph fix folded in.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:33 -04:00
Al Viro
6d42e7e9f6 ubifs: use generic_fillattr()
don't open-code it...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:32 -04:00
Al Viro
77ba78776e xfs: switch to proper __bitwise type for KM_... flags
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:32 -04:00
Al Viro
c217a2a004 switch utimes() to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:32 -04:00
Al Viro
0aa2ee5f0a switch statfs to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:31 -04:00
Al Viro
bdc689594b switch flock to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:31 -04:00
Al Viro
20ba5d736f switch signalfd4() to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:30 -04:00
Al Viro
545ec2c794 switch fcntl to fget_raw_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:30 -04:00
Al Viro
7449af1e8b switch xattr syscalls to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:30 -04:00
Al Viro
863ced7fe7 switch readdir/getdents to fget_light/fput_light
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:29 -04:00
Al Viro
c2bd6c11cd switch do_fsync() to fget_light()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-05-29 23:28:29 -04:00
Linus Torvalds
7d36014b97 Merge branch 'akpm' (Andrew's patch-bomb)
Merge patches through Andrew Morton:
 "180 patches - err 181 - listed below:

   - most of MM.  I held back the (large) "memcg: add hugetlb extension"
     series because a bunfight has recently broken out.

   - leds.  After this, Bryan Wu will be handling drivers/leds/

   - backlight

   - lib/

   - rtc"

* emailed from Andrew Morton <akpm@linux-foundation.org>: (181 patches)
  drivers/rtc/rtc-s3c.c: fix compiler warning
  drivers/rtc/rtc-tegra.c: clean up probe/remove routines
  drivers/rtc/rtc-pl031.c: remove RTC timer interrupt handling
  drivers/rtc/rtc-lpc32xx.c: add device tree support
  drivers/rtc/rtc-m41t93.c: don't let get_time() reset M41T93_FLAG_OF
  rtc: ds1307: add trickle charger support
  rtc: ds1307: remove superfluous initialization
  rtc: rename CONFIG_RTC_MXC to CONFIG_RTC_DRV_MXC
  drivers/rtc/Kconfig: place RTC_DRV_IMXDI and RTC_MXC under "on-CPU RTC drivers"
  drivers/rtc/rtc-pcf8563.c: add RTC_VL_READ/RTC_VL_CLR ioctl feature
  rtc: add ioctl to get/clear battery low voltage status
  drivers/rtc/rtc-ep93xx.c: convert to use module_platform_driver()
  rtc/spear: add Device Tree probing capability
  lib/vsprintf.c: "%#o",0 becomes '0' instead of '00'
  radix-tree: fix preload vector size
  spinlock_debug: print kallsyms name for lock
  vsprintf: fix %ps on non symbols when using kallsyms
  lib/bitmap.c: fix documentation for scnprintf() functions
  lib/string_helpers.c: make arrays static
  lib/test-kstrtox.c: mark const init data with __initconst instead of __initdata
  ...
2012-05-29 18:05:31 -07:00
David Rientjes
a7f638f999 mm, oom: normalize oom scores to oom_score_adj scale only for userspace
The oom_score_adj scale ranges from -1000 to 1000 and represents the
proportion of memory available to the process at allocation time.  This
means an oom_score_adj value of 300, for example, will bias a process as
though it was using an extra 30.0% of available memory and a value of
-350 will discount 35.0% of available memory from its usage.

The oom killer badness heuristic also uses this scale to report the oom
score for each eligible process in determining the "best" process to
kill.  Thus, it can only differentiate each process's memory usage by
0.1% of system RAM.

On large systems, this can end up being a large amount of memory: 256MB
on 256GB systems, for example.

This can be fixed by having the badness heuristic use the actual
memory usage in scoring threads and then normalizing it to the
oom_score_adj scale for userspace.  This results in better comparison
between eligible threads for kill and no change from the userspace
perspective.

Suggested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:24 -07:00
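
Illustration for a7f638f999 above: a rough sketch of the arithmetic in that message, not the kernel's oom_badness() code. It assumes memory is counted in bytes and uses hypothetical helper names (`score_on_adj_scale`, `score_granularity`, `badness_raw`). Scoring directly on the 0..1000 scale collapses any difference smaller than 1/1000 of total memory, while comparing raw usage (with the oom_score_adj bias converted into the same units) keeps it.

```c
#include <stdint.h>

#define OOM_SCALE 1000	/* oom_score_adj ranges from -1000 to 1000 */

/* Usage expressed on the 0..1000 scale, as reported to userspace. */
uint64_t score_on_adj_scale(uint64_t usage_bytes, uint64_t total_bytes)
{
	return usage_bytes * OOM_SCALE / total_bytes;
}

/* Smallest usage difference the 0..1000 scale can tell apart. */
uint64_t score_granularity(uint64_t total_bytes)
{
	return total_bytes / OOM_SCALE;
}

/*
 * Raw comparison value: actual usage plus the oom_score_adj bias, where
 * each unit of adj stands for 1/1000 of total memory.
 */
int64_t badness_raw(uint64_t usage_bytes, int adj, uint64_t total_bytes)
{
	return (int64_t)usage_bytes +
	       (int64_t)adj * (int64_t)(total_bytes / OOM_SCALE);
}
```

With 256 GiB of memory the granularity comes out at roughly a quarter of a gigabyte, in line with the figure quoted in the message: two processes using 100 GiB and 100 GiB + 50 MiB get the same score on the 0..1000 scale, but differ when their raw values are compared.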
Hugh Dickins
17cf28afea mm/fs: remove truncate_range
Remove vmtruncate_range(), and remove the truncate_range method from
struct inode_operations: only tmpfs ever supported it, and tmpfs has now
converted over to using the fallocate method of file_operations.

Update Documentation accordingly, adding (setlease and) fallocate lines.
And while we're in mm.h, remove duplicate declarations of shmem_lock() and
shmem_file_setup(): everyone is now using the ones in shmem_fs.h.

Based-on-patch-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Cong Wang <amwang@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-05-29 16:22:23 -07:00