This patch removes try_module_get race in elevator_find.
try_module_get should always be called with the spinlock protecting
what the module init/cleanup routines register/unregister to held. In
the case of elevators, we should be holding elv_list to avoid it going
away between spin_unlock_irq and try_module_get.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
disk stat when "now" is different from disk->stamp. Otherwise, we
are again needlessly adding zero to the stats.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
struct gendisk has these two fields: stamp, stamp_idle. Update to
stamp_idle is always in sync with stamp and they are always the same.
Therefore, it does not add any value in having two fields tracking
same timestamp. Suggest to remove it.
Also, we should only update gendisk stats with non-zero value.
Advantage is that we don't have to needlessly calculate memory address,
and then add zero to the content.
Signed-off-by: Ken Chen <kenneth.w.chen@intel.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Simplify user_mad.c code in a few places, and convert from kmalloc() +
memset() to kzalloc(). This also fixes a theoretical race window by
not accessing packet->length after posting the send buffer (the send
could complete and packet could be freed before we get to the return
statement at the end of ib_umad_write()).
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The conversion of user_mad.c to the new MAD send API was slightly off:
in a few places, we used packet->msg instead of packet->msg->mad when
referring to the actual data buffer, which ended up corrupting the
underlying data structure and crashing when we free an invalid pointer.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Optimise attribute revalidation when hardlinking. Add post-op attributes
for the directory and the original inode.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
"Optional" means that the close call will not fail if the getattr
at the end of the compound fails.
If it does succeed, try to refresh inode attributes.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since the directory attributes change every time we CREATE a file,
we might as well pick up the new directory attributes in the same
compound.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
nfs_lookup() used to consult a lookup cache before trying an actual wire
lookup operation. The lookup cache would be invalid, of course, if the
parent directory's mtime had changed, so nfs_lookup performed an inode
revalidation on the parent.
Since nfs_lookup() doesn't use a cache anymore, the revalidation is no
longer necessary. There are cases where it will generate a lot of
unnecessary GETATTR traffic.
See http://bugzilla.linux-nfs.org/show_bug.cgi?id=9
Test-plan:
Use lndir and "rm -rf" and watch for excess GETATTR traffic or application
level errors.
Signed-off-by: Chuck Lever <cel@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Since we almost always call nfs_end_data_update() after we called
nfs_refresh_inode(), we now end up marking the inode metadata
as needing revalidation immediately after having updated it.
This patch rearranges things so that we mark the inode as needing
revalidation _before_ we call nfs_refresh_inode() on those operations
that need it.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Allow nfs_refresh_inode() also to update attributes on the inode if the
RPC call was sent after the last call to nfs_update_inode().
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Don't try to access not-present CPUs. Conservative governor will always
oops on SMP without this fix.
Fixes http://bugzilla.kernel.org/show_bug.cgi?id=4781
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Commit id 6142891a0c
Andi Kleen reports that it seems to break things for some people,
and since it's purely a small optimization, revert it for now.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
In arch/ia64/kernel/ptrace.c there is a test for a peek or poke of a
register image (in register backing storage).
The test can be unnecessarily long (and occurs while holding the tasklist_lock).
Especially long on a large system with thousands of active tasks.
The ptrace caller (presumably a debugger) specifies the pid of
its target and an address to peek or poke. But the debugger could be
attached to several tasks.
The idea of find_thread_for_addr() is to find whether the target address
is in the RBS for any of those tasks.
Currently it searches the thread-list of the target pid. If that search
does not find a match, and the shared mm-struct's user count indicates
that there are other tasks sharing this address space (a rare occurrence),
a search is made of all the tasks in the system.
Another approach can drastically shorten this procedure.
It depends upon the fact that in order to peek or poke from/to any task,
the debugger must first attach to that task. And when it does, the
attached task is made a child of the debugger (is chained to its children list).
Therefore we can search just the debugger's children list.
Signed-off-by: Cliff Wickman <cpw@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
flush_tlb_all() can be a scaling issue on large SGI Altix systems
since it uses the global call_lock and always executes on all cpus.
When a process enters flush_tlb_range() to purge TLBs for another
process, it is possible to avoid flush_tlb_all() and instead allow
sn2_global_tlb_purge() to purge TLBs only where necessary.
This patch modifies flush_tlb_range() so that this case can be handled
by platform TLB purge functions and updates ia64_global_tlb_purge()
accordingly. sn2_global_tlb_purge() now calculates the region register
value from the mm argument introduced with this patch.
Signed-off-by: Dean Roe <roe@sgi.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Add some initial support for detecting and reporting catastrophic
errors reported by Mellanox HCAs. We start a periodic timer which
polls the catastrophic error reporting buffer in device memory. If an
error is detected, we dump the contents of the buffer for port-mortem
debugging, and report a fatal asynchronous error to higher levels.
In the future we can try to recover from these errors by resetting the
device, but this will require some work in higher-level code as well.
Let's get this in now, so that we at least get catastrophic errors
reported in logs.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This bug is responsible for causing the infamous "Treason uncloaked"
messages that's been popping up everywhere since the printk was added.
It has usually been blamed on foreign operating systems. However,
some of those reports implicate Linux as both systems are running
Linux or the TCP connection is going across the loopback interface.
In fact, there really is a bug in the Linux TCP header prediction code
that's been there since at least 2.1.8. This bug was tracked down with
help from Dale Blount.
The effect of this bug ranges from harmless "Treason uncloaked"
messages to hung/aborted TCP connections. The details of the bug
and fix is as follows.
When snd_wnd is updated, we only update pred_flags if
tcp_fast_path_check succeeds. When it fails (for example,
when our rcvbuf is used up), we will leave pred_flags with
an out-of-date snd_wnd value.
When the out-of-date pred_flags happens to match the next incoming
packet we will again hit the fast path and use the current snd_wnd
which will be wrong.
In the case of the treason messages, it just happens that the snd_wnd
cached in pred_flags is zero while tp->snd_wnd is non-zero. Therefore
when a zero-window packet comes in we incorrectly conclude that the
window is non-zero.
In fact if the peer continues to send us zero-window pure ACKs we
will continue making the same mistake. It's only when the peer
transmits a zero-window packet with data attached that we get a
chance to snap out of it. This is what triggers the treason
message at the next retransmit timeout.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>
This just makes sure that a thread's expiry times can't get reset after
it clears them in do_exit.
This is what allowed us to re-introduce the stricter BUG_ON() check in
a362f463a6.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
This reverts commit 3de463c7d9.
Roland has another patch that allows us to leave the BUG_ON() in place
by just making sure that the condition it tests for really is always
true.
That goes in next.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
There's a silly off-by-one error in the code that updates the expiration
of posix CPU timers, causing them to not be properly updated when they
hit exactly on their expiration time (which should be the normal case).
This causes them to then fire immediately again, and only _then_ get
properly updated.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
I've seen similar failure on alpha.
Obviously, someone forgot to convert sg->handle stuff for
PCI gart case.
Signed-off-by: Dave Airlie <airlied@linux.ie>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Convert nanoseconds to microseconds correctly.
Spotted by Steve Dickson <SteveD@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fsck_hfs reveals lots of temporary files accumulating in the hidden
directory "\000\000\000HFS+ Private Data". According to the HFS+
documentation these are files which are unlinked while in use. However,
there may be a bug in the Linux hfsplus implementation which causes this to
happen even when the files are not in use. It looks like the "opencnt"
field is never initialized as (I think) it should be in hfsplus_read_inode.
This means that a file can appear to be still in use when in fact it has
been closed. This patch seems to fix it for me.
Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Although this message is having the intended effect of causing wireless
driver maintainers to upgrade their code, I never should have merged this
patch in its present form. Leading to tons of bug reports and unhappy
users.
Some wireless apps poll for statistics regularly, which leads to a printk()
every single time they ask for stats. That's a little bit _too_ much of a
reminder that the driver is using an old API.
Change this to printing out the message once, per kernel boot.
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
With CONFIG_SMP=n:
*** Warning: "cpu_online_map" [drivers/firmware/dcdbas.ko] undefined!
due to set_cpus_allowed().
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The mpic interrupt controller driver (used on G5 and early pSeries among
others) has a bug where it doesn't get the right virtual address for the
timer registers. It causes the driver to poke at the MMIO space of
whatever has been mapped just next to it (ouch !) when initializing and
causes boot failures on some IBM machines.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
The NUMA counters in struct per_cpu_pageset (linux/mmzone.h) are never
cleared today. This works ok for CPU 0 on NUMA machines because
boot_pageset[] is already zero, but for other CPU:s this results in
uninitialized counters.
Signed-off-by: Magnus Damm <magnus@valinux.co.jp>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>