The function tracing recursion self test should not crash
the machine if the resursion test fails. If it detects that
the function tracing is recursing when it should not be, then
bail, don't go into an infinite recursive loop.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If one of the function tracers set by the global ops is not recursion
safe, it can still be called directly without the added recursion
supplied by the ftrace infrastructure.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The test that checks function recursion does things differently
if the arch does not support all ftrace features. But that really
doesn't make a difference with how the test runs, and either way
the count variable should be 2 at the end.
Currently the test wrongly fails for archs that don't support all
the ftrace features.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There's a race condition between the setting of a new tracer and
the update of the max trace buffers (the swap). When a new tracer
is added, it sets current_trace to nop_trace before disabling
the old tracer. At this moment, if the old tracer uses update_max_tr(),
the update may trigger the warning against !current_trace->use_max-tr,
as nop_trace doesn't have that set.
As update_max_tr() requires that interrupts be disabled, we can
add a check to see if current_trace == nop_trace and bail if it
does. Then when disabling the current_trace, set it to nop_trace
and run synchronize_sched(). This will make sure all calls to
update_max_tr() have completed (it was called with interrupts disabled).
As a clean up, this commit also removes shrinking and recreating
the max_tr buffer if the old and new tracers both have use_max_tr set.
The old way use to always shrink the buffer, and then expand it
for the next tracer. This is a waste of time.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Synchronous requet_module() from an async worker can lead to deadlock
because module init path may invoke async_synchronize_full(). The
async worker waits for request_module() to complete and the module
loading waits for the async task to finish. This bug happened in the
block layer because of default elevator auto-loading.
Block layer has been updated not to do default elevator auto-loading
and it has been decided to disallow synchronous request_module() from
async workers.
Trigger WARN_ON_ONCE() on synchronous request_module() from async
workers.
For more details, please refer to the following thread.
http://thread.gmane.org/gmane.linux.kernel/1420814
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
If cgroup_create() failed and cgroup_destroy_locked() is called to
do cleanup, we'll see a bunch of warnings:
cgroup_addrm_files: failed to remove 2MB.limit_in_bytes, err=-2
cgroup_addrm_files: failed to remove 2MB.usage_in_bytes, err=-2
cgroup_addrm_files: failed to remove 2MB.max_usage_in_bytes, err=-2
cgroup_addrm_files: failed to remove 2MB.failcnt, err=-2
cgroup_addrm_files: failed to remove prioidx, err=-2
cgroup_addrm_files: failed to remove ifpriomap, err=-2
...
We failed to remove those files, because cgroup_create() has failed
before creating those cgroup files.
To fix this, we simply don't warn if cgroup_rm_file() can't find the
cft entry.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Commit 083b804c4d ("async: use workqueue for worker pool") made it
possible that async jobs are moved from pending to running out-of-order.
While pending async jobs will be queued and dispatched for execution in
the same order, nothing guarantees they'll enter "1) move self to the
running queue" of async_run_entry_fn() in the same order.
Before the conversion, async implemented its own worker pool. An async
worker, upon being woken up, fetches the first item from the pending
list, which kept the executing lists sorted. The conversion to
workqueue was done by adding work_struct to each async_entry and async
just schedules the work item. The queueing and dispatching of such work
items are still in order but now each worker thread is associated with a
specific async_entry and moves that specific async_entry to the
executing list. So, depending on which worker reaches that point
earlier, which is non-deterministic, we may end up moving an async_entry
with larger cookie before one with smaller one.
This broke __lowest_in_progress(). running->domain may not be properly
sorted and is not guaranteed to contain lower cookies than pending list
when not empty. Fix it by ensuring sort-inserting to the running list
and always looking at both pending and running when trying to determine
the lowest cookie.
Over time, the async synchronization implementation became quite messy.
We better restructure it such that each async_entry is linked to two
lists - one global and one per domain - and not move it when execution
starts. There's no reason to distinguish pending and running. They
behave the same for synchronization purposes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
is placed on a function mcount/nop location, and the arch supports it,
instead of adding a breakpoint, kprobes will register a function callback
as that is much more efficient.
The function tracer requires to update modules before they run, and
uses the module notifier to do so. But if something else in the module
notifiers registers a kprobe at one of these locations, before ftrace
can get to it, then the system could fail.
The function tracer must be initialized early, otherwise module notifiers
that probe will only work by chance.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJQ/ZaaAAoJEOdOSU1xswtMQsoH/02S5ngidMgIXwYCGzwxR6Ym
f2krSz76rBkFn9ivFPJFfq2Wi05UhhVfFr4gpKzoK+/rBqwVkV5tRIQ4AhJ1/Q3d
wCXB2mcRI6ky/kB/ts8q6gjj+rlJ18NgZf79TA5y5q7FEYc6DvB2dZ+vE+rvCnXk
q/Bx6q4phEdLbVqet+Ga36qzi9u+pW0P+ntZmi0EQLVnz9p+mtdVaRz32qKp0FYi
XhHtPMHQDZoOu8utPNtPcfSOZp1sNN8tXqjEAJ4/Ba54badUfg4WJ+RXGPsYH+4T
lOwpVv7Xp0Sp2b1HivEVfCs1bx2RNzluZisPahBEwdBv+XssvqMbAfG+X3EVk3w=
=6EYI
-----END PGP SIGNATURE-----
Merge tag 'trace-3.8-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fix from Steven Rostedt:
"Kprobes now uses the function tracer if it can. That is, if a probe
is placed on a function mcount/nop location, and the arch supports it,
instead of adding a breakpoint, kprobes will register a function
callback as that is much more efficient.
The function tracer requires to update modules before they run, and
uses the module notifier to do so. But if something else in the
module notifiers registers a kprobe at one of these locations, before
ftrace can get to it, then the system could fail.
The function tracer must be initialized early, otherwise module
notifiers that probe will only work by chance."
* tag 'trace-3.8-rc4-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Be first to run code modification on modules
wake_up_process() should never wakeup a TASK_STOPPED/TRACED task.
Change it to use TASK_NORMAL and add the WARN_ON().
TASK_ALL has no other users, probably can be killed.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
putreg() assumes that the tracee is not running and pt_regs_access() can
safely play with its stack. However a killed tracee can return from
ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
that debugger can actually read/modify the kernel stack until the tracee
does SAVE_REST again.
set_task_blockstep() can race with SIGKILL too and in some sense this
race is even worse, the very fact the tracee can be woken up breaks the
logic.
As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
call, this ensures that nobody can ever wakeup the tracee while the
debugger looks at it. Not only this fixes the mentioned problems, we
can do some cleanups/simplifications in arch_ptrace() paths.
Probably ptrace_unfreeze_traced() needs more callers, for example it
makes sense to make the tracee killable for oom-killer before
access_process_vm().
While at it, add the comment into may_ptrace_stop() to explain why
ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
Reported-by: Salman Qazi <sqazi@google.com>
Reported-by: Suleiman Souhlal <suleiman@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
As trace_clock is used by other things besides tracing, and it
does not require anything from trace.h, it is best not to include
the header file in trace_clock.c.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cleanup and preparation for the next change.
signal_wake_up(resume => true) is overused. None of ptrace/jctl callers
actually want to wakeup a TASK_WAKEKILL task, but they can't specify the
necessary mask.
Turn signal_wake_up() into signal_wake_up_state(state), reintroduce
signal_wake_up() as a trivial helper, and add ptrace_signal_wake_up()
which adds __TASK_TRACED.
This way ptrace_signal_wake_up() can work "inside" ptrace_request()
even if the tracee doesn't have the TASK_WAKEKILL bit set.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Due to a userspace issue with PowerTop v2beta, which hardcoded
the offset of event fields that it was using, it broke when
we removed the Big Kernel Lock counter from the event header.
(commit e6e1e2593 "tracing: Remove lock_depth from event entry")
Because this broke userspace, it was determined that we must
keep those 4 bytes around.
(commit a3a4a5acd "Regression: partial revert "tracing: Remove lock_depth from event entry"")
This unfortunately wastes space in the ring buffer. 4 bytes per
event, where a lot of events are just 24 bytes. That's 16% of the
buffer wasted. A million events will add 4 megs of white space
into the buffer.
It was later noticed that PowerTop v2beta could not work on systems
where the kernel was 64 bit but the userspace was 32 bits.
The reason was because the offsets are different between the
two and the hard coded offset of one would not work with the other.
With PowerTop v2 final, it implemented the same interface that both
perf and trace-cmd use. That is, it reads the format file of
the event to find the offsets of the fields it needs. This fixes
the problem with running powertop on a 32 bit userspace running
on a 64 bit kernel. It also no longer requires the 4 byte padding.
As PowerTop v2 has been out for a while, and is included in all
major distributions, it is time that we can safely remove the
4 bytes of padding. Users of PowerTop v2beta should upgrade to
PowerTop v2 final.
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Move SAVE_REGS support flag into Kconfig and rename
it to CONFIG_DYNAMIC_FTRACE_WITH_REGS. This also introduces
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS which indicates
the architecture depending part of ftrace has a code
that saves full registers.
On the other hand, CONFIG_DYNAMIC_FTRACE_WITH_REGS indicates
the code is enabled.
Link: http://lkml.kernel.org/r/20120928081516.3560.72534.stgit@ltc138.sdl.hitachi.co.jp
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add the file max_graph_depth to the debug tracing directory that lets
the user define the depth of the function graph.
A very useful operation is to set the depth to 1. Then it traces only
the first function that is called when entering the kernel. This can
be used to determine what system operations interrupt a process.
For example, to work on NOHZ processes (single tasks running without
a timer tick), if any interrupt goes off and preempts that task, this
code will show it happening.
# cd /sys/kernel/debug/tracing
# echo 1 > max_graph_depth
# echo function_graph > current_tracer
# cat per_cpu/cpu/<cpu-of-process>/trace
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
There's now a check in tracing_reset_online_cpus() if the buffer is
allocated or NULL. No need to do a check before calling it with max_tr.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
max_tr->buffer could be NULL in the tracing_reset{_online_cpus}. In this
case, a NULL pointer dereference happens, so we should return immediately
from these functions.
Note, the current code does not call tracing_reset*() with max_tr when
its buffer is NULL, but future code will. This patch is needed to prevent
the future code from crashing.
Link: http://lkml.kernel.org/r/20121219070234.31200.93863.stgit@liselsia
Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Some functions in the syscall tracing is used only locally to
the file, but they are labeled global. Convert them to static functions.
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Without this patch, we can register a uprobe event for a directory.
Enabling such a uprobe event would anyway fail.
Example:
$ echo 'p /bin:0x4245c0' > /sys/kernel/debug/tracing/uprobe_events
However dirctories cannot be valid targets for uprobe.
Hence verify if the target is a regular file during the probe
registration.
Link: http://lkml.kernel.org/r/20130103004212.690763002@goodmis.org
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
[ cleaned up whitespace and removed redundant IS_DIR() check ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
typeof(&buffer) is a pointer to array of 1024 char, or char (*)[1024].
But, typeof(&buffer[0]) is a pointer to char which match the return type of get_trace_buf().
As well-known, the value of &buffer is equal to &buffer[0].
so return this_cpu_ptr(&percpu_buffer->buffer[0]) can avoid type cast.
Link: http://lkml.kernel.org/r/50A1A800.3020102@gmail.com
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The original ring-buffer code had special checks at the start
of rb_advance_iter() and instead of repeating them again at the
end of the function if a certain condition existed, I just did
a recursive call to rb_advance_iter() because the special condition
would cause rb_advance_iter() to return early (after the checks).
But as things have changed, the special checks no longer exist
and the only thing done for the special_condition is to call
rb_inc_iter() and return. Instead of doing a confusing recursive call,
just call rb_inc_iter instead.
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
If some other kernel subsystem has a module notifier, and adds a kprobe
to a ftrace mcount point (now that kprobes work on ftrace points),
when the ftrace notifier runs it will fail and disable ftrace, as well
as kprobes that are attached to ftrace points.
Here's the error:
WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280()
Hardware name: Bochs
Modules linked in: fat(+) stap_56d28a51b3fe546293ca0700b10bcb29__8059(F) nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: bid_shared]
Pid: 8068, comm: modprobe Tainted: GF 3.7.0-0.rc8.git0.1.fc19.x86_64 #1
Call Trace:
[<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff81134106>] ? __probe_kernel_read+0x46/0x70
[<ffffffffa0180000>] ? 0xffffffffa017ffff
[<ffffffffa0180000>] ? 0xffffffffa017ffff
[<ffffffff8105e76a>] warn_slowpath_null+0x1a/0x20
[<ffffffff810fd189>] ftrace_bug+0x239/0x280
[<ffffffff810fd626>] ftrace_process_locs+0x376/0x520
[<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50
[<ffffffff8163912d>] notifier_call_chain+0x4d/0x70
[<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80
[<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20
[<ffffffff810c2a23>] sys_init_module+0x73/0x220
[<ffffffff8163d719>] system_call_fastpath+0x16/0x1b
---[ end trace 9ef46351e53bbf80 ]---
ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat]
actual: cc:bb:d2:4b:e1
A kprobe was added to the init_once() function in the fat module on load.
But this happened before ftrace could have touched the code. As ftrace
didn't run yet, the kprobe system had no idea it was a ftrace point and
simply added a breakpoint to the code (0xcc in the cc:bb:d2:4b:e1).
Then when ftrace went to modify the location from a call to mcount/fentry
into a nop, it didn't see a call op, but instead it saw the breakpoint op
and not knowing what to do with it, ftrace shut itself down.
The solution is to simply give the ftrace module notifier the max priority.
This should have been done regardless, as the core code ftrace modification
also happens very early on in boot up. This makes the module modification
closer to core modification.
Link: http://lkml.kernel.org/r/20130107140333.593683061@goodmis.org
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Reported-by: Frank Ch. Eigler <fche@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1fb9341ac3 made our locking in
load_module more complicated: we grab the mutex once to insert the
module in the list, then again to upgrade it once it's formed.
Since the locking is self-contained, it's neater to do this in
separate functions.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Fix up all callers as they were before, with make one change: an
unsigned module taints the kernel, but doesn't turn off lockdep.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Commit 1fb9341ac3 ("module: put modules in list much earlier") moved
some of the module initialization code around, and in the process
changed the exit paths too. But for the duplicate export symbol error
case the change made the ddebug_cleanup path jump to after the module
mutex unlock, even though it happens with the mutex held.
Rusty has some patches to split this function up into some helper
functions, hopefully the mess of complex goto targets will go away
eventually.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
problem introduced recently by kvm id changes.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQIcBAABAgAGBQJQ/IaJAAoJENkgDmzRrbjxOjAQAIrI9+Jo3Lsxk1v9gXeo9xn2
ST4LNv7/oW2+3NFBOkKsGVpcXe1JtGySIXyx9k+dELPa5xe4Rs4HE3pHQj/VoEx8
FKz3oUXSHkuh+paKuFXvZ2u/z0/FI99GmqHPObvGQ4iS3hTXAibzO83yYYPxwApq
Zq4kof/dAcVVPLm8fGVAMPA2Rbh/WmjDfrIv8gv71QkDjtRLzcr40VIgky5cvu7V
FWcBl4/DVoKkGnDPsLDhLK9QGqgBGhFIlNIcVX4Jv50DiCibOyzdjeUXYxMftoGr
Rw56hHwGpPdqbRIjBkR071vIl/mlXTmxIv+d77vZNBin2MIBwAzCQXo8I1/HojCK
/wKhI+RFj0J5DaDo/BTB80cmI3X2oah5sRUebW6vd9HjunhFFndg4mVeDNPa0E0+
F72xWlj79BjdIOuD06TLg6Tg2klL49nC8bUc0wrsh6onEjhd9v7Cp/X/rxi5cKYW
eEv3oLkKwUHoheF9gBlpnT0Yyl/HpFe+nemblzj/ybRKnk4A5vtJqV9eZnqoOS16
lgIkKOpgXT9dzSom2EL/f4sMCeLLYC44DQwOvxNKt/BdMY0r5y8OLaJORXQGfEDF
Ztvu2G8PmELxV0B3JZcGR/zOcKxpOBsrGoVn0/EQIul3A/0C0ID7i5zwJAyX6LP7
V+6vyF2eHMf10tB0rbfB
=SpOo
-----END PGP SIGNATURE-----
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module fixes and a virtio block fix from Rusty Russell:
"Various minor fixes, but a slightly more complex one to fix the
per-cpu overload problem introduced recently by kvm id changes."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
module: put modules in list much earlier.
module: add new state MODULE_STATE_UNFORMED.
module: prevent warning when finit_module a 0 sized file
virtio-blk: Don't free ida when disk is in use
Pull misc syscall fixes from Al Viro:
- compat syscall fixes (discussed back in December)
- a couple of "make life easier for sigaltstack stuff by reducing
inter-tree dependencies"
- fix up compiler/asmlinkage calling convention disagreement of
sys_clone()
- misc
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
sys_clone() needs asmlinkage_protect
make sure that /linuxrc has std{in,out,err}
x32: fix sigtimedwait
x32: fix waitid()
switch compat_sys_wait4() and compat_sys_waitid() to COMPAT_SYSCALL_DEFINE
switch compat_sys_sigaltstack() to COMPAT_SYSCALL_DEFINE
CONFIG_GENERIC_SIGALTSTACK build breakage with asm-generic/syscalls.h
Ensure that kernel_init_freeable() is not inlined into non __init code
The ia64 function "thread_matches()" has no users since commit
e868a55c2a ("[IA64] remove find_thread_for_addr()"). Remove it.
This allows us to make ptrace_check_attach() static to kernel/ptrace.c,
which is good since we'll need to change the semantics of it and fix up
all the callers.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This function queries whether %current is an async worker executing an
async item. This will be used to implement warning on synchronous
request_module() from async workers.
Signed-off-by: Tejun Heo <tj@kernel.org>
This will be used to implement an inline function to query whether
%current is a workqueue worker and, if so, allow determining which
work item it's executing.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Workqueue wants to expose more interface internal to kernel/. Instead
of adding a new header file, repurpose kernel/workqueue_sched.h.
Rename it to workqueue_internal.h and add include protector.
This patch doesn't introduce any functional changes.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
This is to fix up a build problem with a wireless driver due to the
dynamic-debug patches in this branch.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
PF_WQ_WORKER is used to tell scheduler that the task is a workqueue
worker and needs wq_worker_sleeping/waking_up() invoked on it for
concurrency management. As rescuers never participate in concurrency
management, PF_WQ_WORKER wasn't set on them.
There's a need for an interface which can query whether %current is
executing a work item and if so which. Such interface requires a way
to identify all tasks which may execute work items and PF_WQ_WORKER
will be used for that. As all normal workers always have PF_WQ_WORKER
set, we only need to add it to rescuers.
As rescuers start with WORKER_PREP but never clear it, it's always
NOT_RUNNING and there's no need to worry about it interfering with
concurrency management even if PF_WQ_WORKER is set; however, unlike
normal workers, rescuers currently don't have its worker struct as
kthread_data(). It uses the associated workqueue_struct instead.
This is problematic as wq_worker_sleeping/waking_up() expect struct
worker at kthread_data().
This patch adds worker->rescue_wq and start rescuer kthreads with
worker struct as kthread_data and sets PF_WQ_WORKER on rescuers.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Allow drivers such as intel_powerclamp to use these apis for
turning on/off ticks during idle.
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
If the default iosched is built as module, the kernel may deadlock
while trying to load the iosched module on device probe if the probing
was running off async. This is because async_synchronize_full() at
the end of module init ends up waiting for the async job which
initiated the module loading.
async A modprobe
1. finds a device
2. registers the block device
3. request_module(default iosched)
4. modprobe in userland
5. load and init module
6. async_synchronize_full()
Async A waits for modprobe to finish in request_module() and modprobe
waits for async A to finish in async_synchronize_full().
Because there's no easy to track dependency once control goes out to
userland, implementing properly nested flushing is difficult. For
now, make module init perform async_synchronize_full() iff module init
has queued async jobs as suggested by Linus.
This avoids the described deadlock because iosched module doesn't use
async and thus wouldn't invoke async_synchronize_full(). This is
hacky and incomplete. It will deadlock if async module loading nests;
however, this works around the known problem case and seems to be the
best of bad options.
For more details, please refer to the following thread.
http://thread.gmane.org/gmane.linux.kernel/1420814
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Alex Riesen <raa.lkml@gmail.com>
Tested-by: Ming Lei <ming.lei@canonical.com>
Tested-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Make the persistent clock check a kernel config option, so that some
platform can explicitely select it, also make CONFIG_RTC_HCTOSYS and
RTC_SYSTOHC depend on its non-existence, which could prevent the
persistent clock and RTC code from doing similar thing twice during
system's init/suspend/resume phases.
If the CONFIG_HAS_PERSISTENT_CLOCK=n, then no change happens for kernel
which still does the persistent clock check in timekeeping_init().
Cc: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
[jstultz: Added dependency for RTC_SYSTOHC as well]
Signed-off-by: John Stultz <john.stultz@linaro.org>
In current kernel, there are several places which need to check
whether there is a persistent clock for the platform. Current check
is done by calling the read_persistent_clock() and validating its
return value.
So one optimization is to do the check only once in timekeeping_init(),
and use a flag persistent_clock_exist to record it.
v2: Add a has_persistent_clock() helper function, as suggested by John.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Stultz <john.stultz@linaro.org>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The clock_adj call returns the clock state on success, which may be a
non-zero value (e.g. TIME_INS), but the modified timex data is copied
back to the user only when zero value (TIME_OK) was returned. Fix the
condition to copy the data also with positive return values.
Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The purpose of this option is to allow ARM/etc systems that rely on the
class RTC subsystem to have the same kind of automatic NTP based
synchronization that we have on PC platforms. Today ARM does not
implement update_persistent_clock and makes extensive use of the class
RTC system.
When enabled CONFIG_RTC_SYSTOHC will provide a generic
rtc_update_persistent_clock that stores the current time in the RTC and
is intended complement the existing CONFIG_RTC_HCTOSYS option that loads
the RTC at boot.
Like with RTC_HCTOSYS the platform's update_persistent_clock is used
first, if it works. Platforms with mixed class RTC and non-RTC drivers
need to return ENODEV when class RTC should be used. Such an update for
PPC is included in this patch.
Long term, implementations of update_persistent_clock should migrate to
proper class RTC drivers and use CONFIG_RTC_SYSTOHC instead.
Tested on ARM kirkwood and PPC405
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
The pstore RAM backend can get called during resume, and must be defensive
against a suspended time source. Expose getnstimeofday logic that returns
an error instead of a WARN. This can be detected and the timestamp can
be zeroed out.
Reported-by: Doug Anderson <dianders@chromium.org>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Anton Vorontsov <anton.vorontsov@linaro.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Conflicts:
Documentation/networking/ip-sysctl.txt
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c
Both conflicts were simply overlapping context.
A build fix for qlcnic is in here too, simply removing the added
devinit annotations which no longer exist.
Signed-off-by: David S. Miller <davem@davemloft.net>
proc_cpuset_show() has a spurious -EINVAL assignment which does
nothing. Remove it.
This patch doesn't make any functional difference.
tj: Rewrote patch description.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
to tracing_on" caused two regressions.
1) The irqs off latency tracer no longer starts if tracing_on is off
when the tracer is set, and then tracing_on is enabled. The tracing_on
file needs the hook that tracing_enabled had to enable tracers if they
request it (call the tracer's start() method).
2) That commit had a separate change that really should have been a
separate patch, but it must have been added accidently with the -a
option of git commit. But as the change is still related to the commit
it wasn't noticed in review. That change, changed the way blocking is
done by the trace_pipe file with respect to the tracing_on settings.
I've been told that this change breaks current userspace, and this
specific change is being reverted.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.12 (GNU/Linux)
iQEcBAABAgAGBQJQ9MZ/AAoJEOdOSU1xswtMtVcH/00HZv5RqIyMvy+3xhqkQuT7
eqP7VpW1nqrpvzYqZz2G/x0CNtCa+ufpzYrcGJWoiNe7cOP8hYWuCR+rLzhHev+a
7x1jZgVGWNCnLvC339PRu+65QpLt0qmWUR0w/F+93Acrdx9LrFtnpH9OgjbgM8m2
5BJVHVBE3vuGdGFwRWPJuEOy62RFxsqlD2MhgXlXyCTUJPQso/3Ef+ft4inJKQ2r
Ffi3PlD3j3TPtSaPPCit72zYqmstvrUsgl0PWjVCsWhhTOA/ZQzlKak0S/uLqT9x
tCqJYFER2SaYx77klRMN0lbXXt6teue0WZnmGZuUQUANGpbalVTQQ4xlxAr34Uc=
=ZBYA
-----END PGP SIGNATURE-----
Merge tag 'trace-3.8-rc3-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing regression fixes from Steven Rostedt:
"The clean up patch commit 0fb9656d95 "tracing: Make tracing_enabled
be equal to tracing_on" caused two regressions.
1) The irqs off latency tracer no longer starts if tracing_on is off
when the tracer is set, and then tracing_on is enabled. The
tracing_on file needs the hook that tracing_enabled had to enable
tracers if they request it (call the tracer's start() method).
2) That commit had a separate change that really should have been a
separate patch, but it must have been added accidently with the -a
option of git commit. But as the change is still related to the
commit it wasn't noticed in review. That change, changed the way
blocking is done by the trace_pipe file with respect to the
tracing_on settings. I've been told that this change breaks
current userspace, and this specific change is being reverted."
* tag 'trace-3.8-rc3-regression-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix regression of trace_pipe
tracing: Fix regression with irqsoff tracer and tracing_on file
Nothing's protected by RCU in rebind_subsystems(), and I can't think
of a reason why it is needed.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
These 2 syncronize_rcu()s make attaching a task to a cgroup
quite slow, and it can't be ignored in some situations.
A real case from Colin Cross: Android uses cgroups heavily to
manage thread priorities, putting threads in a background group
with reduced cpu.shares when they are not visible to the user,
and in a foreground group when they are. Some RPCs from foreground
threads to background threads will temporarily move the background
thread into the foreground group for the duration of the RPC.
This results in many calls to cgroup_attach_task.
In cgroup_attach_task() it's task->cgroups that is protected by RCU,
and put_css_set() calls kfree_rcu() to free it.
If we remove this synchronize_rcu(), there can be threads in RCU-read
sections accessing their old cgroup via current->cgroups with
concurrent rmdir operation, but this is safe.
# time for ((i=0; i<50; i++)) { echo $$ > /mnt/sub/tasks; echo $$ > /mnt/tasks; }
real 0m2.524s
user 0m0.008s
sys 0m0.004s
With this patch:
real 0m0.004s
user 0m0.004s
sys 0m0.000s
tj: These synchronize_rcu()s are utterly confused. synchornize_rcu()
necessarily has to come between two operations to guarantee that
the changes made by the former operation are visible to all rcu
readers before proceeding to the latter operation. Here,
synchornize_rcu() are at the end of attach operations with nothing
beyond it. Its only effect would be delaying completion of
write(2) to sysfs tasks/procs files until all rcu readers see the
change, which doesn't mean anything.
Signed-off-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Colin Cross <ccross@google.com>
Clockevent cleanup series from Shawn Guo.
Resolved move/change conflict in mach-pxa/time.c due to the sys_timer
cleanup.
* clocksource/cleanup:
clocksource: use clockevents_config_and_register() where possible
ARM: use clockevents_config_and_register() where possible
clockevents: export clockevents_config_and_register for module use
+ sync to Linux 3.8-rc3
Signed-off-by: Olof Johansson <olof@lixom.net>
Conflicts:
arch/arm/mach-pxa/time.c