* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
fs/dlm: Drop unnecessary null test
dlm: use genl_register_family_with_ops()
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-udf-2.6:
udf: super.c Fix warning: variable 'sbi' set but not used
udf: remove duplicated #include
* git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
[DNS RESOLVER] Minor typo correction
DNS: Fixes for the DNS query module
cifs: Include linux/err.h for IS_ERR and PTR_ERR
DNS: Make AFS go to the DNS for AFSDB records for unknown cells
DNS: Separate out CIFS DNS Resolver code
cifs: account for new creduid=0x%x parameter in spnego upcall string
cifs: reduce false positives with inode aliasing serverino autodisable
CIFS: Make cifs_convert_address() take a const src pointer and a length
cifs: show features compiled in as part of DebugData
cifs: update README
Fix up trivial conflicts in fs/cifs/cifsfs.c due to workqueue changes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
workqueue: mark init_workqueues() as early_initcall()
workqueue: explain for_each_*cwq_cpu() iterators
fscache: fix build on !CONFIG_SYSCTL
slow-work: kill it
gfs2: use workqueue instead of slow-work
drm: use workqueue instead of slow-work
cifs: use workqueue instead of slow-work
fscache: drop references to slow-work
fscache: convert operation to use workqueue instead of slow-work
fscache: convert object to use workqueue instead of slow-work
workqueue: fix how cpu number is stored in work->data
workqueue: fix mayday_mask handling on UP
workqueue: fix build problem on !CONFIG_SMP
workqueue: fix locking in retry path of maybe_create_worker()
async: use workqueue for worker pool
workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
workqueue: implement unbound workqueue
workqueue: prepare for WQ_UNBOUND implementation
libata: take advantage of cmwq and remove concurrency limitations
workqueue: fix worker management invocation without pending works
...
Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
Currently, o2net_accept_one() is allowed to accept a connection from
listening node itself, such a fake connection will not be successfully
established due to no handshake detected afterwards, and later end up
with triggering connecting worker in a loop.
We're going to fix this by treating such connection request as 'invalid',
since we've got no chance of requesting connection from a node to itself
in a OCFS2 cluster.
The fix doesn't hurt user's scan for o2net-listener, it always gets a
successful connection from userpace.
Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
When we need to take both dlm_domain_lock and dlm->spinlock, we should take
them in order of: dlm_domain_lock then dlm->spinlock.
There is pathes disobey this order. That is calling dlm_lockres_put() with
dlm->spinlock held in dlm_run_purge_list. dlm_lockres_put() calls dlm_put() at
the ref and dlm_put() locks on dlm_domain_lock.
Fix:
Don't grab/put the dlm when the initialising/releasing lockres.
That grab is not required because we don't call dlm_unregister_domain()
based on refcount.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
In the following situation, there remains an incorrect bit in refmap on the
recovery master. Finally the recovery master will fail at purging the lockres
due to the incorrect bit in refmap.
1) node A has no interest on lockres A any longer, so it is purging it.
2) the owner of lockres A is node B, so node A is sending de-ref message
to node B.
3) at this time, node B crashed. node C becomes the recovery master. it recovers
lockres A(because the master is the dead node B).
4) node A migrated lockres A to node C with a refbit there.
5) node A failed to send de-ref message to node B because it crashed. The failure
is ignored. no other action is done for lockres A any more.
For mormal, re-send the deref message to it to recovery master can fix it. Well,
ignoring the failure of deref to the original master and not recovering the lockres
to recovery master has the same effect. And the later is simpler.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Hi,
Thanks a lot for all the review and comments so far;) I'd like to send
the improved (V4) version of this patch.
This patch fixes a deadlock in OCFS2 ACL. We found this bug in OCFS2
and Samba integration using scenario, the symptom is several smbd
processes will be hung under heavy workload. Finally we found out it
is the nested PR lock calling that leads to this deadlock:
node1 node2
gr PR
|
V
PR(EX)---> BAST:OCFS2_LOCK_BLOCKED
|
V
rq PR
|
V
wait=1
After requesting the 2nd PR lock, the process "smbd" went into D
state. It can only be woken up when the 1st PR lock's RO holder equals
zero. There should be an ocfs2_inode_unlock in the calling path later
on, which can decrement the RO holder. But since it has been in
uninterruptible sleep, the unlock function has no chance to be called.
The related stack trace is:
smbd D ffff8800013d0600 0 9522 5608 0x00000000
ffff88002ca7fb18 0000000000000282 ffff88002f964500 ffff88002ca7fa98
ffff8800013d0600 ffff88002ca7fae0 ffff88002f964340 ffff88002f964340
ffff88002ca7ffd8 ffff88002ca7ffd8 ffff88002f964340 ffff88002f964340
Call Trace:
[<ffffffff80350425>] schedule_timeout+0x175/0x210
[<ffffffff8034f580>] wait_for_common+0xf0/0x210
[<ffffffffa03e12b9>] __ocfs2_cluster_lock+0x3b9/0xa90 [ocfs2]
[<ffffffffa03e7665>] ocfs2_inode_lock_full_nested+0x255/0xdb0 [ocfs2]
[<ffffffffa0446019>] ocfs2_get_acl+0x69/0x120 [ocfs2]
[<ffffffffa0446368>] ocfs2_check_acl+0x28/0x80 [ocfs2]
[<ffffffff800e3507>] acl_permission_check+0x57/0xb0
[<ffffffff800e357d>] generic_permission+0x1d/0xc0
[<ffffffffa03eecea>] ocfs2_permission+0x10a/0x1d0 [ocfs2]
[<ffffffff800e3f65>] inode_permission+0x45/0x100
[<ffffffff800d86b3>] sys_chdir+0x53/0x90
[<ffffffff80007458>] system_call_fastpath+0x16/0x1b
[<00007f34a4ef6927>] 0x7f34a4ef6927
For details, please see:
https://bugzilla.novell.com/show_bug.cgi?id=614332 and
http://oss.oracle.com/bugzilla/show_bug.cgi?id=1278
Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Acked-by: Mark Fasheh <mfasheh@suse.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
The refcount record calculation in ocfs2_calc_refcount_meta_credits
is too optimistic that we can always allocate contiguous clusters
and handle an already existed refcount rec as a whole. Actually
because of file system fragmentation, we may have the chance to split
a refcount record into 3 parts during the transaction. So consider
the worst case in record calculation.
Cc: stable@kernel.org
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
This patch fixes two problems in dlm_run_purgelist
1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge
the same lockres instead of trying the next lockres.
2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock
before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres.
spinlock is reacquired but in this window lockres can get reused. This leads
to BUG.
This patch modifies dlm_run_purgelist to skip lockres if it's in use and purge
next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the
lockres spinlock protecting it from getting reused.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
When we have to take both dlm->master_lock and lockres->spinlock,
take them in order
lockres->spinlock and then dlm->master_lock.
The patch fixes a violation of the rule.
We can simply move taking dlm->master_lock to where we have dropped res->spinlock
since when we access res->state and free mle memory we don't need master_lock's
protection.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Setting the acl while creating a new inode depends on
the error codes of posix_acl_create_masq. This patch fix
a issue of overwriting the error codes of it.
Reported-by: Pawel Zawora <pzawora@gmail.com>
Cc: <stable@kernel.org> [ .33, .34 ]
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Whe the first inode for a bdi is marked dirty, we wake up the bdi thread which
should take care of the periodic background write-out. However, the write-out
will actually start only 'dirty_writeback_interval' centisecs later, so we can
delay the wake-up.
This change was requested by Nick Piggin who pointed out that if we delay the
wake-up, we weed out 2 unnecessary contex switches, which matters because
'__mark_inode_dirty()' is a hot-path function.
This patch introduces a new function - 'bdi_wakeup_thread_delayed()', which
sets up a timer to wake-up the bdi thread and returns. So the wake-up is
delayed.
We also delete the timer in bdi threads just before writing-back. And
synchronously delete it when unregistering bdi. At the unregister point the bdi
does not have any users, so no one can arm it again.
Since now we take 'bdi->wb_lock' in the timer, which can execute in softirq
context, we have to use 'spin_lock_bh()' for 'bdi->wb_lock'. This patch makes
this change as well.
This patch also moves the 'bdi_wb_init()' function down in the file to avoid
forward-declaration of 'bdi_wakeup_thread_delayed()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Finally, we can get rid of unnecessary wake-ups in bdi threads, which are very
bad for battery-driven devices.
There are two types of activities bdi threads do:
1. process bdi works from the 'bdi->work_list'
2. periodic write-back
So there are 2 sources of wake-up events for bdi threads:
1. 'bdi_queue_work()' - submits bdi works
2. '__mark_inode_dirty()' - adds dirty I/O to bdi's
The former already has bdi wake-up code. The latter does not, and this patch
adds it.
'__mark_inode_dirty()' is hot-path function, but this patch adds another
'spin_lock(&bdi->wb_lock)' there. However, it is taken only in rare cases when
the bdi has no dirty inodes. So adding this spinlock should be fine and should
not affect performance.
This patch makes sure bdi threads and the forker thread do not wake-up if there
is nothing to do. The forker thread will nevertheless wake up at least every
5 min. to check whether it has to kill a bdi thread. This can also be optimized,
but is not worth it.
This patch also tidies up the warning about unregistered bid, and turns it from
an ugly crocodile to a simple 'WARN()' statement.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Currently, bdi threads can decide to exit if there were no useful activities
for 5 minutes. However, this causes nasty races: we can easily oops in the
'bdi_queue_work()' if the bdi thread decides to exit while we are waking it up.
And even if we do not oops, but the bdi tread exits immediately after we wake
it up, we'd lose the wake-up event and have an unnecessary delay (up to 5 secs)
in the bdi work processing.
This patch makes the forker thread to be the central place which not only
creates bdi threads, but also kills them if they were inactive long enough.
This better design-wise.
Another reason why this change was done is to prepare for the further changes
which will prevent the bdi threads from waking up every 5 sec and wasting
power. Indeed, when the task does not wake up periodically anymore, it won't be
able to exit either.
This patch also moves the the 'wake_up_bit()' call from the bdi thread to the
forker thread as well. So now the forker thread sets the BDI_pending bit, then
forks the task or kills it, then clears the bit and wakes up the waiting
process.
The only process which may wain on the bit is 'bdi_wb_shutdown()'. This
function was changed as well - now it first removes the bdi from the
'bdi_list', then waits on the 'BDI_pending' bit. Once it wakes up, it is
guaranteed that the forker thread won't race with it, because the bdi is not
visible. Note, the forker thread sets the 'BDI_pending' bit under the
'bdi->wb_lock' which is essential for proper serialization.
And additionally, when we change 'bdi->wb.task', we now take the
'bdi->work_lock', to make sure that we do not lose wake-ups which we otherwise
would when raced with, say, 'bdi_queue_work()'.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Currently bdi threads use local variable 'last_active' which stores last time
when the bdi thread did some useful work. Move this local variable to 'struct
bdi_writeback'. This is just a preparation for the further patches which will
make the forker thread decide when bdi threads should be killed.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The forker thread removes bdis from 'bdi_list' before forking the bdi thread.
But this is wrong for at least 2 reasons.
Reason #1: if we temporary remove a bdi from the list, we may miss works which
would otherwise be given to us.
Reason #2: this is racy; indeed, 'bdi_wb_shutdown()' expects that bdis are
always in the 'bdi_list' (see 'bdi_remove_from_list()'), and when
it races with the forker thread, it can shut down the bdi thread
at the same time as the forker creates it.
This patch makes sure the forker thread never removes bdis from 'bdi_list'
(which was suggested by Christoph Hellwig).
In order to make sure that we do not race with 'bdi_wb_shutdown()', we have to
hold the 'bdi_lock' while walking the 'bdi_list' and setting the 'BDI_pending'
flag.
NOTE! The error path is interesting. Currently, when we fail to create a bdi
thread, we move the bdi to the tail of 'bdi_list'. But if we never remove the
bdi from the list, we cannot move it to the tail either, because then we can
mess up the RCU readers which walk the list. And also, we'll have the race
described above in "Reason #2".
But I not think that adding to the tail is any important so I just do not do
that.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Currently, bdi threads ('bdi_writeback_thread()') can lose wake-ups. For
example, if 'bdi_queue_work()' is executed after the bdi thread have had
finished 'wb_do_writeback()' but before it called
'schedule_timeout_interruptible()'.
To fix this issue, we have to check whether we have works to process after we
have changed the task state to 'TASK_INTERRUPTIBLE'.
This patch also clean-ups handling of the cases when 'dirty_writeback_interval'
is zero or non-zero.
Additionally, this patch also removes unneeded 'list_empty_careful()' call.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The write-back code mixes words "thread" and "task" for the same things. This
is not a big deal, but still an inconsistency.
hch: a convention I tend to use and I've seen in various places
is to always use _task for the storage of the task_struct pointer,
and thread everywhere else. This especially helps with having
foo_thread for the actual thread and foo_task for a global
variable keeping the task_struct pointer
This patch renames:
* 'bdi_add_default_flusher_task()' -> 'bdi_add_default_flusher_thread()'
* 'bdi_forker_task()' -> 'bdi_forker_thread()'
because bdi threads are 'bdi_writeback_thread()', so these names are more
consistent.
This patch also amends commentaries and makes them refer the forker and bdi
threads as "thread", not "task".
Also, while on it, make 'bdi_add_default_flusher_thread()' declaration use
'static void' instead of 'void static' and make checkpatch.pl happy.
Signed-off-by: Artem Bityutskiy <Artem.Bityutskiy@nokia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
83ba7b07 cleans up the writeback.
So we don't use wb any more in get_next_work_item.
Let's remove unnecessary argument.
CC: Christoph Hellwig <hch@lst.de>
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
SPLICE_F_NONBLOCK is clearly documented to only affect blocking on the
pipe. In __generic_file_splice_read(), however, it causes an EAGAIN
if the page is currently being read.
This makes it impossible to write an application that only wants
failure if the pipe is full. For example if the same process is
handling both ends of a pipe and isn't otherwise able to determine
whether a splice to the pipe will fill it or not.
We could make the read non-blocking on O_NONBLOCK or some other splice
flag, but for now this is the simplest fix.
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The open and release block_device_operations are currently
called with the BKL held. In order to change that, we must
first make sure that all drivers that currently rely
on this have no regressions.
This blindly pushes the BKL into all .open and .release
operations for all block drivers to prepare for the
next step. The drivers can subsequently replace the BKL
with their own locks or remove it completely when it can
be shown that it is not needed.
The functions blkdev_get and blkdev_put are the only
remaining users of the big kernel lock in the block
layer, besides a few uses in the ioctl code, none
of which need to serialize with blkdev_{get,put}.
Most of these two functions is also under the protection
of bdev->bd_mutex, including the actual calls to
->open and ->release, and the common code does not
access any global data structures that need the BKL.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Tracing high level background writeback events is good, but it doesn't
give the entire picture. Add visibility into write throttling to catch IO
dispatched by foreground throttling of processing dirtying lots of pages.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Trace queue/sched/exec parts of the writeback loop. This provides
insight into when and why flusher threads are scheduled to run. e.g
a sync invocation leaves traces like:
sync-[...]: writeback_queue: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
flush-8:0-[...]: writeback_exec: bdi 8:0: sb_dev 8:1 nr_pages=7712 sync_mode=0 kupdate=0 range_cyclic=0 background=0
This also lays the foundation for adding more writeback tracing to
provide deeper insight into the whole writeback path.
The original tracing code is from Jens Axboe, though this version is
a rewrite as a result of the code being traced changing
significantly.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
No real bugs I believe, just some dead code, and some
shut up code.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Eric Paris <eparis@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Move all code for the writeback thread into fs/fs-writeback.c instead of
splitting it over two functions in two files.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
The wb_list member of struct backing_device_info always has exactly one
element. Just use the direct bdi->wb pointer instead and simplify some
code.
Also remove bdi_task_init which is now trivial to prepare for the next
patch.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.
Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
A barrier request should by defintion have priority in get_request
and let the queue be unplugged immediately as it's blocking all forward
progress due to the queue draining.
Most filesystems already get this implicitly by the way how submit_bh
treats the buffer_ordered flag, and gfs2 sets it explicitly. But btrfs
and XFS are still forgetting to set the flag, as is blkdev_issue_flush
and some places in DM/MD.
For XFS on metadata heavy workloads this gives a consistent speedup
in the 2-3% range.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Commit f9d7562fdb "nfsd4: share file
descriptors between stateid's" didn't correctly account for O_RDWR opens.
Symptoms include leaked files, resulting in failures to unmount and/or
warnings about orphaned inodes on reboot.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
It's harmless to set this after the server is created, but also
ineffective, since the value is only used at the time of
svc_create_pooled(). So fail the attempt, in keeping with the pattern
set by write_versions, write_{lease,grace}time and write_recoverydir.
(This could break userspace that tried to write to nfsd/max_block_size
between setting up sockets and starting the server. However, such code
wouldn't have worked anyway, and I don't know of any examples--rpc.nfsd
in nfs-utils, probably the only user of the interface, doesn't do that.)
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Commit 59db4a0c10 "nfsd: move more into
nfsd_startup()" inadvertently moved nfsd_versions after
nfsd_create_svc(). On older distributions using an rpc.nfsd that does
not explicitly set the list of nfsd versions, this results in
svc-create_pooled() being called with an empty versions array. The
resulting incomplete initialization leads to a NULL dereference in
svc_process_common() the first time a client accesses the server.
Move nfsd_reset_versions() back before the svc_create_pooled(); this
time, put it closer to the svc_create_pooled() call, to make this
mistake more difficult in the future.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
If a callback is retried at nfsd4_cb_recall_done() due to
some error, the returned rpc reply crashes here:
@@ -514,6 +514,7 @@ decode_cb_sequence(struct xdr_stream *xdr, struct nfsd4_cb_sequence *res,
u32 dummy;
__be32 *p;
+ BUG_ON(!res);
if (res->cbs_minorversion == 0)
return 0;
[BUG_ON added for demonstration]
This is because the nfsd4_cb_done_sequence() has NULLed out
the task->tk_msg.rpc_resp pointer.
Also eventually the rpc would use the new slot without making
sure it is free by calling nfsd41_cb_setup_sequence().
This problem was introduced by a 4.1 protocol addition patch:
[0421b5c5] nfsd41: Backchannel: Implement cb_recall over NFSv4.1
Which was overlooking the possibility of an RPC callback retries.
For not-4.1 case redoing the _prepare is harmless.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
We must create the server before we can call init_socks or check the
number of threads.
Symptoms were a NULL pointer dereference in nfsd_svc(). Problem
identified by Jeff Layton.
Also fix a minor cleanup-on-error case in nfsd_startup().
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (28 commits)
driver core: device_rename's new_name can be const
sysfs: Remove owner field from sysfs struct attribute
powerpc/pci: Remove owner field from attribute initialization in PCI bridge init
regulator: Remove owner field from attribute initialization in regulator core driver
leds: Remove owner field from attribute initialization in bd2802 driver
scsi: Remove owner field from attribute initialization in ARCMSR driver
scsi: Remove owner field from attribute initialization in LPFC driver
cgroupfs: create /sys/fs/cgroup to mount cgroupfs on
Driver core: Add BUS_NOTIFY_BIND_DRIVER
driver core: fix memory leak on one error path in bus_register()
debugfs: no longer needs to depend on SYSFS
sysfs: Fix one more signature discrepancy between sysfs implementation and docs.
sysfs: fix discrepancies between implementation and documentation
dcdbas: remove a redundant smi_data_buf_free in dcdbas_exit
dmi-id: fix a memory leak in dmi_id_init error path
sysfs: sysfs_chmod_file's attr can be const
firmware: Update hotplug script
Driver core: move platform device creation helpers to .init.text (if MODULE=n)
Driver core: reduce duplicated code for platform_device creation
Driver core: use kmemdup in platform_device_add_resources
...
Mark it as 'experimental' instead, since in practice, NFSv4.1 should now be
relatively stable.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Add a flag so we know if we mounted the NFS server using the legacy
binary interface. If we used the legacy interface, then we should not
show the mountd options.
Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
The delegation is protected by RCU now, so we need to replace the
nfsi->rwsem protection with an rcu protected section.
Reported-by: Fred Isaman <iisaman@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
tracing/kprobes: unregister_trace_probe needs to be called under mutex
perf: expose event__process function
perf events: Fix mmap offset determination
perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
perf, powerpc: Convert the FSL driver to use local64_t
perf tools: Don't keep unreferenced maps when unmaps are detected
perf session: Invalidate last_match when removing threads from rb_tree
perf session: Free the ref_reloc_sym memory at the right place
x86,mmiotrace: Add support for tracing STOS instruction
perf, sched migration: Librarize task states and event headers helpers
perf, sched migration: Librarize the GUI class
perf, sched migration: Make the GUI class client agnostic
perf, sched migration: Make it vertically scrollable
perf, sched migration: Parameterize cpu height and spacing
perf, sched migration: Fix key bindings
perf, sched migration: Ignore unhandled task states
perf, sched migration: Handle ignored migrate out events
perf: New migration tool overview
tracing: Drop cpparg() macro
perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
...
Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
Revert "net: Make accesses to ->br_port safe for sparse RCU"
mce: convert to rcu_dereference_index_check()
net: Make accesses to ->br_port safe for sparse RCU
vfs: add fs.h to define struct file
lockdep: Add an in_workqueue_context() lockdep-based test function
rcu: add __rcu API for later sparse checking
rcu: add an rcu_dereference_index_check()
tree/tiny rcu: Add debug RCU head objects
mm: remove all rcu head initializations
fs: remove all rcu head initializations, except on_stack initializations
powerpc: remove all rcu head initializations
Make /dev/console get initialised before any initialisation routine that
invokes modprobe because if modprobe fails, it's going to want to open
/dev/console, presumably to write an error message to.
The problem with that is that if the /dev/console driver is not yet
initialised, the chardev handler will call request_module() to invoke
modprobe, which will fail, because we never compile /dev/console as a
module.
This will lead to a modprobe loop, showing the following in the kernel
log:
request_module: runaway loop modprobe char-major-5-1
request_module: runaway loop modprobe char-major-5-1
request_module: runaway loop modprobe char-major-5-1
request_module: runaway loop modprobe char-major-5-1
request_module: runaway loop modprobe char-major-5-1
This can happen, for example, when the built in md5 module can't find
the built in cryptomgr module (because the latter fails to initialise).
The md5 module comes before the call to tty_init(), presumably because
'crypto' comes before 'drivers' alphabetically.
Fix this by calling tty_init() from chrdev_init().
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update compression types supported and add some help text for
the LZO Kconfig option.
Also add missing "default n" line and make some trivial whitespace
cleanups too.
Signed-off-by: Phillip Lougher <phillip@lougher.demon.co.uk>
Normally, if the Fb cap bit is being revoked, we queue an async writeback.
If there is no dirty data but we still hold the cap, this leaves the
client sitting around doing nothing until the cap timeouts expire and the
cap is released on its own (as it would have been without the revocation).
Instead, only queue writeback if the bit is actually used (i.e., we have
dirty data). If not, we can reply to the revocation immediately.
Signed-off-by: Sage Weil <sage@newdream.net>
sysfs_chmod_file doesn't change the attribute it operates on, so this
attribute can be marked const.
Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
If the bitmap block on disk is bad, ext4_mb_load_buddy() returns an
error. This error is returned to the caller,
ext4_mb_regular_allocator() and then to ext4_mb_new_blocks(). But
ext4_mb_new_blocks() did not check for the return value of
ext4_mb_regular_allocator() and would repeatedly try to load the
bitmap block. The fix simply catches the return value and exits out of
the 'repeat' loop after cleanup.
We also take the opportunity to clean up the error handling in
ext4_mb_new_blocks().
Google-Bug-Id: 2853530
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
- sys_io_destroy(): acutually return -EINVAL if the context pointed to
is invalidIndex: linux-2.6.33-rc4/fs/aio.c
- sys_io_getevents(): An argument specifying timeout is not `when',
but `timeout'.
- sys_io_getevents(): Should describe what is returned if this syscall
succeeds.
Signed-off-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
In data=journal mode, we still use block_write_begin() to prepare page for
writing. This function can occasionally mark buffer dirty which violates
journalling assumptions - when a buffer is part of a transaction, it should be
dirty and a buffer can be already part of a forget list of some transaction
when block_write_begin() gets called. This violation of journalling assumptions
then results in "JBD: Spotted dirty metadata buffer..." warnings.
In fact, temporary dirtying the buffer while the page is still locked does not
really cause problems to the journalling because we won't write the buffer
until the page gets unlocked. So we just have to make sure to clear dirty bits
before unlocking the page.
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>