Commit graph

16167 commits

Author SHA1 Message Date
Hidetoshi Seto
d5b7c78e97 sched: Remove task_{u,s,g}time()
Now all task_{u,s}time() pairs are replaced by task_times().
And task_gtime() is too simple to be an inline function.

Cleanup them all.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
LKML-Reference: <4B0E16D1.70902@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 12:59:20 +01:00
Hidetoshi Seto
d180c5bcce sched: Introduce task_times() to replace task_{u,s}time() pair
Functions task_{u,s}time() are called in pair in almost all
cases.  However task_stime() is implemented to call task_utime()
from its inside, so such paired calls run task_utime() twice.

It means we do heavy divisions (div_u64 + do_div) twice to get
utime and stime which can be obtained at same time by one set
of divisions.

This patch introduces a function task_times(*tsk, *utime,
*stime) to retrieve utime and stime at once in better, optimized
way.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Spencer Candland <spencer@bluehost.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Balbir Singh <balbir@in.ibm.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
LKML-Reference: <4B0E16AE.906@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 12:59:19 +01:00
Ingo Molnar
16bc67edeb Merge branch 'sched/urgent' into sched/core
Merge reason: Pick up fixes that did not make it into .32.0

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-26 10:50:42 +01:00
Vivek Goyal
d9449ce35a Fix regression in direct writes performance due to WRITE_ODIRECT flag removal
There seems to be a regression in direct write path due to following
commit in for-2.6.33 branch of block tree.

commit 1af60fbd75
Author: Jeff Moyer <jmoyer@redhat.com>
Date:   Fri Oct 2 18:56:53 2009 -0400

    block: get rid of the WRITE_ODIRECT flag

Marking direct writes as WRITE_SYNC_PLUG instead of WRITE_ODIRECT, sets
the NOIDLE flag in bio and hence in request. This tells CFQ to not expect
more request from the queue and not idle on it (despite the fact that
queue's think time is less and it is not seeky).

So direct writers lose big time when competing with sequential readers.

Using fio, I have run one direct writer and two sequential readers and
following are the results with 2.6.32-rc7 kernel and with for-2.6.33
branch.

Test
====
1 direct writer and 2 sequential reader running simultaneously.

[global]
directory=/mnt/sdc/fio/
runtime=10

[seqwrite]
rw=write
size=4G
direct=1

[seqread]
rw=read
size=2G
numjobs=2

2.6.32-rc7
==========
direct writes: aggrb=2,968KB/s
readers	     : aggrb=101MB/s

for-2.6.33 branch
=================
direct write: aggrb=19KB/s
readers	      aggrb=137MB/s

This patch brings back the WRITE_ODIRECT flag, with the difference that we
don't set the BIO_RW_UNPLUG flag so that device is not unplugged after
submission of request and an explicit unplug from submitter is required.

That way we fix the jeff's issue of not enough merging taking place in aio
path as well as make sure direct writes get their fair share.

After the fix
=============
for-2.6.33 + fix
----------------
direct writes: aggrb=2,728KB/s
reads: aggrb=103MB/s

Thanks
Vivek

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-26 09:46:46 +01:00
Ilya Loginov
2d4dc890b5 block: add helpers to run flush_dcache_page() against a bio and a request's pages
Mtdblock driver doesn't call flush_dcache_page for pages in request.  So,
this causes problems on architectures where the icache doesn't fill from
the dcache or with dcache aliases.  The patch fixes this.

The ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE symbol was introduced to avoid
pointless empty cache-thrashing loops on architectures for which
flush_dcache_page() is a no-op.  Every architecture was provided with this
flush pages on architectires where ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE is
equal 1 or do nothing otherwise.

See "fix mtd_blkdevs problem with caches on some architectures" discussion
on LKML for more information.

Signed-off-by: Ilya Loginov <isloginov@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Peter Horton <phorton@bitbox.co.uk>
Cc: "Ed L. Cashin" <ecashin@coraid.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-26 09:16:19 +01:00
Steve French
2f81e752da [CIFS] Fix sparse warning
Also update CHANGES file

Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-25 00:11:31 +00:00
Steve French
cea6234395 [CIFS] Duplicate data on appending to some Samba servers
SMB writes are sent with a starting offset and length. When the server
supports the newer SMB trans2 posix open (rather than using the SMB
NTCreateX) a file can be opened with SMB_O_APPEND flag, and for that
case Samba server assumes that the offset sent in SMBWriteX is unneeded
since the write should go to the end of the file - which can cause
problems if the write was cached (since the beginning part of a
page could be written twice by the client mm).  Jeff suggested that
masking the flag on posix open on the client is easiest for the time
being. Note that recent Samba server also had an unrelated problem with
SMB NTCreateX and append (see samba bugzilla bug number 6898) which
should not affect current Linux clients (unless cifs Unix Extensions
are disabled).

The cifs client did not send the O_APPEND flag on posix open
before 2.6.29 so the fix is unneeded on early kernels.

In the future, for the non-cached case (O_DIRECT, and forcedirectio mounts)
it would be possible and useful to send O_APPEND on posix open (for Windows
case: FILE_APPEND_DATA but not FILE_WRITE_DATA on SMB NTCreateX) but for
cached writes although the vfs sets the offset to end of file it
may fragment a write across pages - so we can't send O_APPEND on
open (could result in sending part of a page twice).

CC: Stable <stable@kernel.org>
Reviewed-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-24 22:52:13 +00:00
Steve French
8e6c0332d5 [CIFS] fix oops in cifs_lookup during net boot
Fixes bugzilla.kernel.org bug number 14641

Lookup called during network boot (network root filesystem
for diskless workstation) has case where nd is null in
lookup.  This patch fixes that in cifs_lookup.

(Shirish noted that 2.6.30 and 2.6.31 stable need the same check)

Signed-off-by: Shirish Pargaonkar <shirishp@us.ibm.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Tested-by:  Vladimir Stavrinov <vs@inist.ru>
CC: Stable <stable@kernel.org>
Signed-off-by: Steve French <sfrench@us.ibm.com>
2009-11-24 22:17:59 +00:00
Wu Fengguang
3f0ca30985 ext4: remove unused parameter wbc from __ext4_journalled_writepage()
CC: Jan Kara <jack@suse.cz> 
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-24 11:15:44 -05:00
Akira Fujita
ac48b0a1d0 ext4: move_extent_per_page() cleanup
Integrate duplicate lines (acquire/release semaphore and invalidate
extent cache in move_extent_per_page()) into mext_replace_branches(),
to reduce source and object code size.

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-24 10:31:56 -05:00
Kazuya Mio
446aaa6e7e ext4: initialize moved_len before calling ext4_move_extents()
The move_extent.moved_len is used to pass back the number of exchanged
blocks count to user space.  Currently the caller must clear this
field; but we spend more code space checking for this requirement than
simply zeroing the field ourselves, so let's just make life easier for
everyone all around.

Signed-off-by: Kazuya Mio <k-mio@sx.jp.nec.com>
Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-24 10:28:48 -05:00
Akira Fujita
94d7c16cbb ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT
At the beginning of ext4_move_extent(), we call
ext4_discard_preallocations() to discard inode PAs of orig and donor
inodes.  But in the following case, blocks can be double freed, so
move ext4_discard_preallocations() to the end of ext4_move_extents().

1. Discard inode PAs of orig and donor inodes with
   ext4_discard_preallocations() in ext4_move_extents().

   orig : [ DATA1 ]
   donor: [ DATA2 ]

2. While data blocks are exchanging between orig and donor inodes, new
   inode PAs is created to orig by other process's block allocation.
   (Since there are semaphore gaps in ext4_move_extents().)  And new
   inode PAs is used partially (2-1).

   2-1 Create new inode PAs to orig inode
   orig : [ DATA1 | used PA1 | free PA1 ]
   donor: [ DATA2 ]

3. Donor inode which has old orig inode's blocks is deleted after
   EXT4_IOC_MOVE_EXT finished (3-1, 3-2).  So the block bitmap
   corresponds to old orig inode's blocks are freed.

   3-1 After EXT4_IOC_MOVE_EXT finished
   orig : [ DATA2 |  free PA1 ]
   donor: [ DATA1 |  used PA1 ]

   3-2 Delete donor inode
   orig : [ DATA2 |  free PA1 ]
   donor: [ FREE SPACE(DATA1) | FREE SPACE(used PA1) ]

4. The double-free of blocks is occurred, when close() is called to
   orig inode.  Because ext4_discard_preallocations() for orig inode
   frees used PA1 and free PA1, though used PA1 is already freed in 3.

   4-1 Double-free of blocks is occurred
   orig : [ DATA2 |  FREE SPACE(free PA1) ]
   donor: [ FREE SPACE(DATA1) | DOUBLE FREE(used PA1) ]

Signed-off-by: Akira Fujita <a-fujita@rs.jp.nec.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-24 10:19:57 -05:00
Christoph Hellwig
774888bcd6 UBIFS: remove manual O_SYNC handling
generic_file_aio_write already calls into ->fsync to handle O_SYNC/O_DSYNC.
Remove the duplicate call to ubifs_sync_wbufs_by_inode which is already
covered by ubifs_fsync.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-11-24 08:18:55 +02:00
Corentin Chary
9722324e65 UBIFS: support mounting of UBI volume character devices
This patch makes it possible to mount UBI character device
nodes, and use something like:

$ mount -t ubifs /dev/ubi_volume_name /mnt/ubifs

instead of the old restrictive 'nodev' semantics:

$ mount -t ubifs ubi0_0 /mnt/ubifs

[Comments and the patch were amended a bit by Artem]

Signed-off-by: Corentin Chary <corentincj@iksaif.net>
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
2009-11-24 08:18:54 +02:00
Benjamin Herrenschmidt
1e43bee9c7 Merge commit 'origin/master' into next 2009-11-24 17:16:30 +11:00
Tetsuo Handa
fe542cf59b LSM: Move security_path_chmod()/security_path_chown() to after mutex_lock().
We should call security_path_chmod()/security_path_chown() after mutex_lock()
in order to avoid races.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: James Morris <jmorris@namei.org>
2009-11-24 08:49:26 +11:00
Karel Zak
87038c2d5b partitions: read whole sector with EFI GPT header
The size of EFI GPT header is not static, but whole sector is
allocated for the header. The HeaderSize field must be greater
than 92 (= sizeof(struct gpt_header) and must be less than or
equal to the logical block size.

It means we have to read whole sector with the header, because the
header crc32 checksum is calculated according to HeaderSize.

For more details see UEFI standard (version 2.3, May 2009):
  - 5.3.1 GUID Format overview, page 93
  - Table 13. GUID Partition Table Header, page 96

Signed-off-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-23 09:29:58 +01:00
Karel Zak
7d13af3279 partitions: use sector size for EFI GPT
Currently, kernel uses strictly 512-byte sectors for EFI GPT parsing.
That's wrong.

UEFI standard (version 2.3, May 2009, 5.3.1 GUID Format overview, page
95) defines that LBA is always based on the logical block size. It
means bdev_logical_block_size() (aka BLKSSZGET) for Linux.

This patch removes static sector size from EFI GPT parser.

The problem is reproducible with the latest GNU Parted:

 # modprobe scsi_debug dev_size_mb=50 sector_size=4096

  # ./parted /dev/sdb print
  Model: Linux scsi_debug (scsi)
  Disk /dev/sdb: 52.4MB
  Sector size (logical/physical): 4096B/4096B
  Partition Table: gpt

  Number  Start   End     Size    File system  Name     Flags
   1      24.6kB  3002kB  2978kB               primary
   2      3002kB  6001kB  2998kB               primary
   3      6001kB  9003kB  3002kB               primary

  # blockdev --rereadpt /dev/sdb
  # dmesg | tail -1
   sdb: unknown partition table      <---- !!!

with this patch:

  # blockdev --rereadpt /dev/sdb
  # dmesg | tail -1
   sdb: sdb1 sdb2 sdb3

Signed-off-by: Karel Zak <kzak@redhat.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-11-23 09:29:13 +01:00
Theodore Ts'o
9084d47197 ext4: use ext4_data_block_valid() in ext4_free_blocks()
The block validity framework does a more comprehensive set of checks,
and it saves object code space to use the ext4_data_block_valid() than
the limited open-coded version that had been in ext4_free_blocks().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22 20:48:42 -05:00
Theodore Ts'o
1585d8d89a ext4: add check for wraparound in ext4_data_block_valid()
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22 20:48:34 -05:00
Theodore Ts'o
e6362609b6 ext4: call ext4_forget() from ext4_free_blocks()
Add the facility for ext4_forget() to be called from
ext4_free_blocks().  This simplifies the code in a large number of
places, and centralizes most of the work of calling ext4_forget() into
a single place.

Also fix a bug in the extents migration code; it wasn't calling
ext4_forget() when releasing the indirect blocks during the
conversion.  As a result, if the system cashed during or shortly after
the extents migration, and the released indirect blocks get reused as
data blocks, the journal replay would corrupt the data blocks.  With
this new patch, fixing this bug was as simple as adding the
EXT4_FREE_BLOCKS_FORGET flags to the call to ext4_free_blocks().

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
2009-11-23 07:17:05 -05:00
Theodore Ts'o
4433871130 ext4: fold ext4_free_blocks() and ext4_mb_free_blocks()
ext4_mb_free_blocks() is only called by ext4_free_blocks(), and the
latter function doesn't really do much.  So merge the two functions
together, such that ext4_free_blocks() is now found in
fs/ext4/mballoc.c.  This saves about 200 bytes of compiled text space.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22 07:44:56 -05:00
Theodore Ts'o
b7e57e7c2a ext4: fold ext4_journal_forget() into ext4_forget()
Convert the last two callers of ext4_journal_forget() to use
ext4_forget() instead, and then fold ext4_journal_forget() into
ext4_forget().  This reduces are code complexity and shortens our call
stack.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22 21:00:13 -05:00
Theodore Ts'o
e4684b3fbb ext4: fold ext4_journal_revoke() into ext4_forget()
The only caller of ext4_journal_revoke() is ext4_forget(), so we can
fold ext4_journal_revoke() into ext4_forget() to simplify the code and
shorten the call stack.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-24 11:05:59 -05:00
Theodore Ts'o
d6797d14b1 ext4: move ext4_forget() to ext4_jbd2.c
The ext4_forget() function better belongs in ext4_jbd2.c.  This will
allow us to do some cleanup of the ext4_journal_revoke() and
ext4_journal_forget() functions, as well as giving us better error
reporting since we can report the caller of ext4_forget() when things
go wrong.

Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
2009-11-22 20:52:12 -05:00
David Howells
4fa9f4ede8 FS-Cache: Provide nop fscache_stat_d() if CONFIG_FSCACHE_STATS=n
Provide nop fscache_stat_d() macro if CONFIG_FSCACHE_STATS=n lest errors like
the following occur:

	fs/fscache/cache.c: In function 'fscache_withdraw_cache':
	fs/fscache/cache.c:386: error: implicit declaration of function 'fscache_stat_d'
	fs/fscache/cache.c:386: error: 'fscache_n_cop_sync_cache' undeclared (first use in this function)
	fs/fscache/cache.c:386: error: (Each undeclared identifier is reported only once
	fs/fscache/cache.c:386: error: for each function it appears in.)
	fs/fscache/cache.c:392: error: 'fscache_n_cop_dissociate_pages' undeclared (first use in this function)

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-20 21:50:44 +00:00
David Howells
1c2ea8a2c0 SLOW_WORK: Fix GFS2 to #include <linux/module.h> before using THIS_MODULE
GFS2 has been altered to pass THIS_MODULE to slow_work_register_user(), but
hasn't been altered to #include <linux/module.h> to provide it, resulting in
the following error:

	fs/gfs2/recovery.c:596: error: 'THIS_MODULE' undeclared here (not in a function)

Add the missing #include.

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-20 21:50:40 +00:00
David Howells
0109d7e614 SLOW_WORK: Fix CIFS to pass THIS_MODULE to slow_work_register_user()
As of the patch:

	SLOW_WORK: Wait for outstanding work items belonging to a module to clear

	Wait for outstanding slow work items belonging to a module to clear
	when unregistering that module as a user of the facility.  This
	prevents the put_ref code of a work item from being taken away before
	it returns.

slow_work_register_user() takes a module pointer as an argument.  CIFS must now
pass THIS_MODULE as that argument, lest the following error be observed:

	fs/cifs/cifsfs.c: In function 'init_cifs':
	fs/cifs/cifsfs.c:1040: error: too few arguments to function 'slow_work_register_user'

Signed-off-by: David Howells <dhowells@redhat.com>
2009-11-20 21:50:36 +00:00
Frederic Weisbecker
1d2c6cfd40 kill-the-bkl/reiserfs: turn GFP_ATOMIC flag to GFP_NOFS in reiserfs_get_block()
GFP_ATOMIC was used in reiserfs_get_block to not lose the Bkl so that
nobody can modify the tree in the middle of its work. Now that we
kicked out the bkl, we can use a more friendly flag. We use GFP_NOFS
here because we already hold the reiserfs lock.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Alexander Beregalov <a.beregalov@gmail.com>
Cc: Laurent Riffard <laurent.riffard@free.fr>
Cc: Thomas Gleixner <tglx@linutronix.de>
2009-11-20 18:25:02 +01:00
Linus Torvalds
931ed94430 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
  ocfs2: Trivial cleanup of jbd compatibility layer removal
  ocfs2: Refresh documentation
  ocfs2: return f_fsid info in ocfs2_statfs()
  ocfs2: duplicate inline data properly during reflink.
  ocfs2: Move ocfs2_complete_reflink to the right place.
  ocfs2: Return -EINVAL when a device is not ocfs2.
2009-11-19 20:29:05 -08:00
Ryusuke Konishi
0234576d04 nilfs2: add norecovery mount option
This adds "norecovery" mount option which disables temporal write
access to read-only mounts or snapshots during mount/recovery.
Without this option, write access will be even performed for those
types of mounts; the temporal write access is needed to mount root
file system read-only after an unclean shutdown.

This option will be helpful when user wants to prevent any write
access to the device.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Eric Sandeen <sandeen@redhat.com>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi
a057d2c011 nilfs2: add helper to get if volume is in a valid state
This adds a helper function, nilfs_valid_fs() which returns if nilfs
is in a valid state or not.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi
f50a4c8149 nilfs2: move recovery completion into load_nilfs function
Although mount recovery of nilfs is integrated in load_nilfs()
procedure, the completion of recovery was isolated from the procedure
and performed at the end of the fill_super routine.

This was somewhat confusing since the recovery is needed for the nilfs
object, not for a super block instance.

To resolve the inconsistency, this will integrate the recovery
completion into load_nilfs().

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:52 +09:00
Ryusuke Konishi
050b4142c9 nilfs2: apply readahead for recovery on mount
This inserts readahead in the recovery code.  The readahead request is
issued per segment while searching the latest super root block.

This will shorten mount time after unclean unmount.  A measurement
shows the recovery time was reduced by more than 60 percent:

 e.g. real  0m11.586s -> 0m3.918s  (x 2.96)

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi
f021759d74 nilfs2: clean up get/put function of a segment usage
This eliminates obsolete nilfs_get_sufile_get_segment_usage() and
nilfs_set_sufile_segment_usage() from sufile.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi
071ec54dd7 nilfs2: move routine to set segment usage into sufile
This adds nilfs_sufile_set_segment_usage() function in sufile to
replace direct access to the sufile metadata in log writer code.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi
61a189e9c6 nilfs2: move routine marking segment usage dirty into sufile
This adds nilfs_sufile_mark_dirty() function in sufile to replace
nilfs_touch_segusage() function in log writer code.  This is a
preparation for the further cleanup which will move out low level
sufile operations in the log writer.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi
70622a2091 nilfs2: insert cache operation in palloc get block routines
This implements cache operation in get block routines of palloc code:
nilfs_palloc_get_desc_block(), nilfs_palloc_get_bitmap_block(), and
nilfs_palloc_get_entry_block().

This will complete the palloc cache.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:51 +09:00
Ryusuke Konishi
49fa7a5902 nilfs2: add palloc cache to ifile
This adds the palloc cache to ifile.  The palloc cache is allocated on
the extended region of nilfs_mdt_info struct.  The struct
nilfs_ifile_info defines the extended on memory structure of ifile.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
c3ea56c800 nilfs2: flush palloc cache before manipulating data pages of GC dat
Data pages in gcdat metadata file (i.e. the secondary DAT for GC), are
cleared or even moved back to the normal DAT when a shot of garbage
collection was done.

Buffer heads held by the palloc cache of gcdat must be cleared before
these page cache manipulation.  This adds nilfs_palloc_clear_cache()
to ensure this.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
8908b2f70b nilfs2: add palloc cache to dat
This adds the palloc cache to DAT file.  The palloc cache is allocated
on the extended region of nilfs_mdt_info struct.  The struct
nilfs_dat_info defines the extended on memory structure of DAT.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
db38d5ad32 nilfs2: add cache framework for persistent object allocator
This adds setup and cleanup routines of the persistent object
allocator cache.

According to ftrace analyses, accessing buffers of the DAT file
suffers indispensable overhead many times.  To mitigate the overhead,
This introduce cache framework for the persistent object allocator
(palloc) which the DAT file and ifile are using.

struct nilfs_palloc_cache represents the cache object per metadata
file using palloc.

The cache is initialized through nilfs_palloc_setup_cache() and
destroyed by nilfs_palloc_destroy_cache(); callers of the former
function will be added to individual allocators of DAT and ifile on
successive patches.

nilfs_palloc_destroy_cache() will be called from nilfs_mdt_destroy()
if the cache is attached to a metadata file.  A companion function
nilfs_palloc_clear_cache() is provided to allow releasing buffer head
references independently with the cleanup task.  This adjunctive
function will be used before invalidating pages of metadata file with
the cache.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
141bbdba9c nilfs2: unfold nilfs_palloc_block_get_bitmap function
This expands a trivial address calculation in the function into its
every callsite. This expansion improves readability of the callers.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
1376e931b7 nilfs2: eliminate nilfs_btnode_get function
This removes the obsolete nilfs_btnode_get() function and makes
nilfs_btree_get_block() directly call nilfs_btnode_submit_block().

This expansion will provide better opportunity for code optimization.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
75f65edfcc nilfs2: remove newblk argument from nilfs_btnode_submit_block
This removes the obsolete argument from nilfs_btnode_submit_block().
This will complete separating a create function of btree node.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:50 +09:00
Ryusuke Konishi
45f4910bc0 nilfs2: use nilfs_btnode_create_block function
This displaces nilfs_btnode_get() use to create new btree node block
with nilfs_btnode_create_block.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi
d501d73689 nilfs2: separate function for creating new btree node block
Adds a separate routine for creating a btree node block.  This is a
preparation to reduce the depth of function calls during submitting
btree node buffer.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi
b34a65069c nilfs2: avoid readahead on metadata file for create mode
This turns off readhead action of metadata file if nilfs_mdt_get_block
function was called with a create flag.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi
ef7d4757a5 nilfs2: simplify nilfs_sufile_get_ncleansegs function
Previously, this function took an status code to return possible error
codes.  The ("nilfs2: add local variable to cache the number of clean
segments") patch removed the possibility to return errors.

So, this simplifies the function definition to make it directly return
the number of clean segments.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00
Ryusuke Konishi
aa474a2201 nilfs2: add local variable to cache the number of clean segments
This makes it possible for sufile to get the number of clean segments
faster.

Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
2009-11-20 10:05:49 +09:00