Impact: Restructure WARN_ONs in ring_buffer.c
The current WARN_ON macros in ring_buffer.c are quite ugly.
This patch cleans them up and uses a single RB_WARN_ON that returns
the value of the condition. This allows the caller to abort the
function if the condition is true.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: enable/disable ring buffer recording API added
Several kernel developers have requested that there be a way to stop
recording into the ring buffers with a simple switch that can also
be enabled from userspace. This patch addes a new kernel API to the
ring buffers called:
tracing_on()
tracing_off()
When tracing_off() is called, all ring buffers will not be able to record
into their buffers.
tracing_on() will enable the ring buffers again.
These two act like an on/off switch. That is, there is no counting of the
number of times tracing_off or tracing_on has been called.
A new file is added to the debugfs/tracing directory called
tracing_on
This allows for userspace applications to also flip the switch.
echo 0 > debugfs/tracing/tracing_on
disables the tracing.
echo 1 > /debugfs/tracing/tracing_on
enables it.
Note, this does not disable or enable any tracers. It only sets or clears
a flag that needs to be set in order for the ring buffers to write to
their buffers. It is a global flag, and affects all ring buffers.
The buffers start out with tracing_on enabled.
There are now three flags that control recording into the buffers:
tracing_on: which affects all ring buffer tracers.
buffer->record_disabled: which affects an allocated buffer, which may be set
if an anomaly is detected, and tracing is disabled.
cpu_buffer->record_disabled: which is set by tracing_stop() or if an
anomaly is detected. tracing_start can not reenable this if
an anomaly occurred.
The userspace debugfs/tracing/tracing_enabled is implemented with
tracing_stop() but the user space code can not enable it if the kernel
called tracing_stop().
Userspace can enable the tracing_on even if the kernel disabled it.
It is just a switch used to stop tracing if a condition was hit.
tracing_on is not for protecting critical areas in the kernel nor is
it for stopping tracing if an anomaly occurred. This is because userspace
can reenable it at any time.
Side effect: With this patch, I discovered a dead variable in ftrace.c
called tracing_on. This patch removes it.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
* 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
timers: handle HRTIMER_CB_IRQSAFE_UNLOCKED correctly from softirq context
nohz: disable tick_nohz_kick_tick() for now
irq: call __irq_enter() before calling the tick_idle_check
x86: HPET: enter hpet_interrupt_handler with interrupts disabled
x86: HPET: read from HPET_Tn_CMP() not HPET_T0_CMP
x86: HPET: convert WARN_ON to WARN_ON_ONCE
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: release buddies on yield
fix for account_group_exec_runtime(), make sure ->signal can't be freed under rq->lock
sched: clean up debug info
Impact: serialize reader accesses to individual CPU ring buffers
The code in the ring buffer expects only one reader at a time, but currently
it puts that requirement on the caller. This is not strong enough, and this
patch adds a "reader_lock" that serializes the access to the reader API
of the ring buffer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: improve CPU time accounting of tasks under the cpu accounting controller
Add hierarchical accounting to cpu accounting controller and include
cpuacct documentation.
Currently, while charging the task's cputime to its accounting group,
the accounting group hierarchy isn't updated. This patch charges the cputime
of a task to its accounting group and all its parent accounting groups.
Reported-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Reviewed-by: Paul Menage <menage@google.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If an invalid (large) capability is requested the capabilities system
may panic as it is dereferencing an array of fixed (short) length. Its
possible (and actually often happens) that the capability system
accidentally stumbled into a valid memory region but it also regularly
happens that it hits invalid memory and BUGs. If such an operation does
get past cap_capable then the selinux system is sure to have problems as
it already does a (simple) validity check and BUG. This is known to
happen by the broken and buggy firegl driver.
This patch cleanly checks all capable calls and BUG if a call is for an
invalid capability. This will likely break the firegl driver for some
situations, but it is the right thing to do. Garbage into a security
system gets you killed/bugged
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Andrew G. Morgan <morgan@kernel.org>
Signed-off-by: James Morris <jmorris@namei.org>
Clear buddies on yield, so that the buddy rules don't schedule them
despite them being placed right-most.
This fixed a performance regression with yield-happy binary JVMs.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Lin Ming <ming.m.lin@intel.com>
actual capbilities being added/removed. This patch adds a new record type
which emits the target pid and the eff, inh, and perm cap sets.
example output if you audit capset syscalls would be:
type=SYSCALL msg=audit(1225743140.465:76): arch=c000003e syscall=126 success=yes exit=0 a0=17f2014 a1=17f201c a2=80000000 a3=7fff2ab7f060 items=0 ppid=2160 pid=2223 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=1 comm="setcap" exe="/usr/sbin/setcap" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=UNKNOWN[1322] msg=audit(1225743140.465:76): pid=0 cap_pi=ffffffffffffffff cap_pp=ffffffffffffffff cap_pe=ffffffffffffffff
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
non-zero pE we will crate a new audit record which contains the entire set
of known information about the executable in question, fP, fI, fE, fversion
and includes the process's pE, pI, pP. Before and after the bprm capability
are applied. This record type will only be emitted from execve syscalls.
an example of making ping use fcaps instead of setuid:
setcap "cat_net_raw+pe" /bin/ping
type=SYSCALL msg=audit(1225742021.015:236): arch=c000003e syscall=59 success=yes exit=0 a0=1457f30 a1=14606b0 a2=1463940 a3=321b770a70 items=2 ppid=2929 pid=2963 auid=0 uid=500 gid=500 euid=500 suid=500 fsuid=500 egid=500 sgid=500 fsgid=500 tty=pts0 ses=3 comm="ping" exe="/bin/ping" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=UNKNOWN[1321] msg=audit(1225742021.015:236): fver=2 fp=0000000000002000 fi=0000000000000000 fe=1 old_pp=0000000000000000 old_pi=0000000000000000 old_pe=0000000000000000 new_pp=0000000000002000 new_pi=0000000000000000 new_pe=0000000000002000
type=EXECVE msg=audit(1225742021.015:236): argc=2 a0="ping" a1="127.0.0.1"
type=CWD msg=audit(1225742021.015:236): cwd="/home/test"
type=PATH msg=audit(1225742021.015:236): item=0 name="/bin/ping" inode=49256 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ping_exec_t:s0 cap_fp=0000000000002000 cap_fe=1 cap_fver=2
type=PATH msg=audit(1225742021.015:236): item=1 name=(null) inode=507915 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
records of any file that has file capabilities set. Files which do not
have fcaps set will not have different PATH records.
An example audit record if you run:
setcap "cap_net_admin+pie" /bin/bash
/bin/bash
type=SYSCALL msg=audit(1225741937.363:230): arch=c000003e syscall=59 success=yes exit=0 a0=2119230 a1=210da30 a2=20ee290 a3=8 items=2 ppid=2149 pid=2923 auid=0 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=3 comm="ping" exe="/bin/ping" subj=unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023 key=(null)
type=EXECVE msg=audit(1225741937.363:230): argc=2 a0="ping" a1="www.google.com"
type=CWD msg=audit(1225741937.363:230): cwd="/root"
type=PATH msg=audit(1225741937.363:230): item=0 name="/bin/ping" inode=49256 dev=fd:00 mode=0104755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ping_exec_t:s0 cap_fp=0000000000002000 cap_fi=0000000000002000 cap_fe=1 cap_fver=2
type=PATH msg=audit(1225741937.363:230): item=1 name=(null) inode=507915 dev=fd:00 mode=0100755 ouid=0 ogid=0 rdev=00:00 obj=system_u:object_r:ld_so_t:s0
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
Impact: extend /proc/sched_debug info
Since the statistics of a group entity isn't exported directly from the
kernel, it becomes difficult to obtain some of the group statistics.
For example, the current method to obtain exec time of a group entity
is not always accurate. One has to read the exec times of all
the tasks(/proc/<pid>/sched) in the group and add them. This method
fails (or becomes difficult) if we want to collect stats of a group over
a duration where tasks get created and terminated.
This patch makes it easier to obtain group stats by directly including
them in /proc/sched_debug. Stats like group exec time would help user
programs (like LTP) to accurately measure the group fairness.
An example output of group stats from /proc/sched_debug:
cfs_rq[3]:/3/a/1
.exec_clock : 89.598007
.MIN_vruntime : 0.000001
.min_vruntime : 256300.970506
.max_vruntime : 0.000001
.spread : 0.000000
.spread0 : -25373.372248
.nr_running : 0
.load : 0
.yld_exp_empty : 0
.yld_act_empty : 0
.yld_both_empty : 0
.yld_count : 4474
.sched_switch : 0
.sched_count : 40507
.sched_goidle : 12686
.ttwu_count : 15114
.ttwu_local : 11950
.bkl_count : 67
.nr_spread_over : 0
.shares : 0
.se->exec_start : 113676.727170
.se->vruntime : 1592.612714
.se->sum_exec_runtime : 89.598007
.se->wait_start : 0.000000
.se->sleep_start : 0.000000
.se->block_start : 0.000000
.se->sleep_max : 0.000000
.se->block_max : 0.000000
.se->exec_max : 1.000282
.se->slice_max : 1.999750
.se->wait_max : 54.981093
.se->wait_sum : 217.610521
.se->wait_count : 50
.se->load.weight : 2
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Acked-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Acked-by: Dhaval Giani <dhaval@linux.vnet.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix incorrect locking triggered during hotplug-intense stress-tests
While migrating the the CB_IRQSAFE_UNLOCKED timers during a cpu-offline,
we queue them on the cb_pending list, so that they won't go
stale.
Thus, when the callbacks of the timers run from the softirq context,
they could run into potential deadlocks, since these callbacks
assume that they're running with irq's disabled, thereby annoying
lockdep!
Fix this by emulating hardirq context while running these callbacks from
the hrtimer softirq.
=================================
[ INFO: inconsistent lock state ]
2.6.27 #2
--------------------------------
inconsistent {in-hardirq-W} -> {hardirq-on-W} usage.
ksoftirqd/0/4 [HC0[0]:SC1[1]:HE1:SE0] takes:
(&rq->lock){++..}, at: [<c011db84>] sched_rt_period_timer+0x9e/0x1fc
{in-hardirq-W} state was registered at:
[<c014103c>] __lock_acquire+0x549/0x121e
[<c0107890>] native_sched_clock+0x88/0x99
[<c013aa12>] clocksource_get_next+0x39/0x3f
[<c0139abc>] update_wall_time+0x616/0x7df
[<c0141d6b>] lock_acquire+0x5a/0x74
[<c0121724>] scheduler_tick+0x3a/0x18d
[<c047ed45>] _spin_lock+0x1c/0x45
[<c0121724>] scheduler_tick+0x3a/0x18d
[<c0121724>] scheduler_tick+0x3a/0x18d
[<c012c436>] update_process_times+0x3a/0x44
[<c013c044>] tick_periodic+0x63/0x6d
[<c013c062>] tick_handle_periodic+0x14/0x5e
[<c010568c>] timer_interrupt+0x44/0x4a
[<c0150c9f>] handle_IRQ_event+0x13/0x3d
[<c0151c14>] handle_level_irq+0x79/0xbd
[<c0105634>] do_IRQ+0x69/0x7d
[<c01041e4>] common_interrupt+0x28/0x30
[<c047007b>] aac_probe_one+0x1a3/0x3f3
[<c047ec2d>] _spin_unlock_irqrestore+0x36/0x39
[<c01512b4>] setup_irq+0x1be/0x1f9
[<c065d70b>] start_kernel+0x259/0x2c5
[<ffffffff>] 0xffffffff
irq event stamp: 50102
hardirqs last enabled at (50102): [<c047ebf4>] _spin_unlock_irq+0x20/0x23
hardirqs last disabled at (50101): [<c047edc2>] _spin_lock_irq+0xa/0x4b
softirqs last enabled at (50088): [<c0128ba6>] do_softirq+0x37/0x4d
softirqs last disabled at (50099): [<c0128ba6>] do_softirq+0x37/0x4d
other info that might help us debug this:
no locks held by ksoftirqd/0/4.
stack backtrace:
Pid: 4, comm: ksoftirqd/0 Not tainted 2.6.27 #2
[<c013f6cb>] print_usage_bug+0x13e/0x147
[<c013fef5>] mark_lock+0x493/0x797
[<c01410b1>] __lock_acquire+0x5be/0x121e
[<c0141d6b>] lock_acquire+0x5a/0x74
[<c011db84>] sched_rt_period_timer+0x9e/0x1fc
[<c047ed45>] _spin_lock+0x1c/0x45
[<c011db84>] sched_rt_period_timer+0x9e/0x1fc
[<c011db84>] sched_rt_period_timer+0x9e/0x1fc
[<c01210fd>] finish_task_switch+0x41/0xbd
[<c0107890>] native_sched_clock+0x88/0x99
[<c011dae6>] sched_rt_period_timer+0x0/0x1fc
[<c0136dda>] run_hrtimer_pending+0x54/0xe5
[<c011dae6>] sched_rt_period_timer+0x0/0x1fc
[<c0128afb>] __do_softirq+0x7b/0xef
[<c0128ba6>] do_softirq+0x37/0x4d
[<c0128c12>] ksoftirqd+0x56/0xc5
[<c0128bbc>] ksoftirqd+0x0/0xc5
[<c0134649>] kthread+0x38/0x5d
[<c0134611>] kthread+0x0/0x5d
[<c0104477>] kernel_thread_helper+0x7/0x10
=======================
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add new tracing plugin which can trace full (entry+exit) function calls
This tracer uses the low level function return ftrace plugin to
measure the execution time of the kernel functions.
The first field is the caller of the function, the second is the
measured function, and the last one is the execution time in
nanoseconds.
- v3:
- HAVE_FUNCTION_RET_TRACER have been added. Each arch that support ftrace return
should enable it.
- ftrace_return_stub becomes ftrace_stub.
- CONFIG_FUNCTION_RET_TRACER depends now on CONFIG_FUNCTION_TRACER
- Return traces printing can be used for other tracers on trace.c
- Adapt to the new tracing API (no more ctrl_update callback)
- Correct the check of "disabled" during insertion.
- Minor changes...
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: add infrastructure for function-return tracing
Add low level support for ftrace return tracing.
This plug-in stores return addresses on the thread_info structure of
the current task.
The index of the current return address is initialized when the task
is the first one (init) and when a process forks (the child). It is
not needed when a task does a sys_execve because after this syscall,
it still needs to return on the kernel functions it called.
Note that the code of return_to_handler has been suggested by Steven
Rostedt as almost all of the ideas of improvements in this V3.
For purpose of security, arch/x86/kernel/process_32.c is not traced
because __switch_to() changes the current task during its execution.
That could cause inconsistency in the stored return address of this
function even if I didn't have any crash after testing with tracing on
this function enabled.
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
This patch replaces most of the BUG_ONs in the ring_buffer code with
RB_WARN_ON variants. It adds some more variants as needed for the
replacement. This lets the buffer die nicely and still warn the user.
One BUG_ON remains in the code, and that is because it detects a
bad pointer passed in by the calling function, and not a bug by
the ring buffer code itself.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: stop ftrace_special from recursion
The ftrace_special is used to help debug areas of the kernel.
Because of this, if it is put in certain locations, the fact that
it allows recursion can become a problem if the kernel developer
using does not realize that.
This patch changes ftrace_special to not allow recursion into itself
to make it more robust.
It also changes from preempt disable interrupts disable to prevent
any loss of trace entries.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup, change .config option name
We had this ugly config name for a long time for hysteric raisons.
Rename it to a saner name.
We still cannot get rid of it completely, until /proc/<pid>/stack
usage replaces WCHAN usage for good.
We'll be able to do that in the v2.6.29/v2.6.30 timeframe.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix hang/crash on ia64 under high load
This is ugly, but the simplest patch by far.
Unlike other similar routines, account_group_exec_runtime() could be
called "implicitly" from within scheduler after exit_notify(). This
means we can race with the parent doing release_task(), we can't just
check ->signal != NULL.
Change __exit_signal() to do spin_unlock_wait(&task_rq(tsk)->lock)
before __cleanup_signal() to make sure ->signal can't be freed under
task_rq(tsk)->lock. Note that task_rq_unlock_wait() doesn't care
about the case when tsk changes cpu/rq under us, this should be OK.
Thanks to Ingo who nacked my previous buggy patch.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Reported-by: Doug Chapman <doug.chapman@hp.com>
Impact: removal of unnecessary looping
The lockless part of the ring buffer allows for reentry into the code
from interrupts. A timestamp is taken, a test is preformed and if it
detects that an interrupt occurred that did tracing, it tries again.
The problem arises if the timestamp code itself causes a trace.
The detection will detect this and loop again. The difference between
this and an interrupt doing tracing, is that this will fail every time,
and cause an infinite loop.
Currently, we test if the loop happens 1000 times, and if so, it will
produce a warning and disable the ring buffer.
The problem with this approach is that it makes it difficult to perform
some types of tracing (tracing the timestamp code itself).
Each trace entry has a delta timestamp from the previous entry.
If a trace entry is reserved but and interrupt occurs and traces before
the previous entry is commited, the delta timestamp for that entry will
be zero. This actually makes sense in terms of tracing, because the
interrupt entry happened before the preempted entry was commited, so
one may consider the two happening at the same time. The order is
still preserved in the buffer.
With this idea, instead of trying to get a new timestamp if an interrupt
made it in between the timestamp and the test, the entry could simply
make the delta zero and continue. This will prevent interrupts or
tracers in the timer code from causing the above loop.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: fix for bug on resize
This patch addresses the bug found here:
http://bugzilla.kernel.org/show_bug.cgi?id=11996
When ftrace converted to the new unified trace buffer, the resizing of
the buffer was not protected as much as it was originally. If tracing
is performed while the resize occurs, then the buffer can be corrupted.
This patch disables all ftrace buffer modifications before a resize
takes place.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Impact: nohz powersavings and wakeup regression
commit fb02fbc14d (NOHZ: restart tick
device from irq_enter()) causes a serious wakeup regression.
While the patch is correct it does not take into account that spurious
wakeups happen on x86. A fix for this issue is available, but we just
revert to the .27 behaviour and let long running softirqs screw
themself.
Disable it for now.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Impact: avoid spurious ksoftirqd wakeups
The tick idle check which is called from irq_enter() was run before
the call to __irq_enter() which did not set the in_interrupt() bits in
preempt_count. That way the raise of a softirq woke up softirqd for
nothing as the softirq was handled on return from interrupt.
Call __irq_enter() before calling into the tick idle check code.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: clean up and fix debug info printout
While looking over the sched_debug code I noticed that we printed the rq
schedstats for every cfs_rq, ammend this.
Also change nr_spead_over into an int, and fix a little buglet in
min_vruntime printing.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The affinity setting in setup irq is called before the NO_BALANCING
flag is checked and might therefore override affinity settings from the
calling code with the default setting.
Move the NO_BALANCING flag check before the call to the affinity
setting.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: preserve user-modified affinities on interrupts
Kumar Galak noticed that commit
1840475676 (genirq: Expose default irq
affinity mask (take 3))
overrides an already set affinity setting across a free /
request_irq(). Happens e.g. with ifdown/ifup of a network device.
Change the logic to mark the affinities as set and keep them
intact. This also fixes the unlocked access to irq_desc in
irq_select_affinity() when called from irq_affinity_proc_write()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'cpus4096' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
cpumask: introduce new API, without changing anything, v3
cpumask: new API, v2
cpumask: introduce new API, without changing anything
Impact: change in trace output
Because the trace buffers are per cpu ring buffers, the start of
the trace can be confusing. If one CPU is very active at the
end of the trace, its history will not go as far back as the
other CPU traces. This means that output for a particular CPU
may not appear for the first part of a trace.
To help annotate what is happening, and to prevent any more
confusion, this patch adds a line that annotates the start of
a CPU buffer output.
For example:
automount-3495 [001] 184.596443: dnotify_parent <-vfs_write
[...]
automount-3495 [001] 184.596449: dput <-path_put
automount-3496 [002] 184.596450: down_read_trylock <-do_page_fault
[...]
sshd-3497 [001] 184.597069: up_read <-do_page_fault
<idle>-0 [000] 184.597074: __exit_idle <-exit_idle
[...]
automount-3496 [002] 184.597257: filemap_fault <-__do_fault
<idle>-0 [003] 184.597261: exit_idle <-smp_apic_timer_interrupt
Note, parsers of a trace output should always ignore any lines that
start with a '#'.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: preemptoff not tested in selftest
Due to the BKL not being preemptable anymore, the selftest of the
preemptoff code can not be tested. It requires that it is called
with preemption enabled, but since the BKL is held, that is no
longer the case.
This patch simply skips those tests if it detects that the context
is not preemptable. The following will now show up in the tests:
Testing tracer preemptoff: can not test ... force PASSED
Testing tracer preemptirqsoff: can not test ... force PASSED
When the BKL is removed, or it becomes preemptable once again, then
the tests will be performed.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: remove obsolete variable in trace_array structure
With the new start / stop method of ftrace, the ctrl variable
in the trace_array structure is now obsolete. Remove it.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: Remove the ctrl_update tracer method
With the new quick start/stop method of tracing, the ctrl_update
method is out of date.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: have the ftrace_printk enabled on startup
It is confusing to have to "echo trace_printk > /debug/tracing/iter_ctrl"
after adding ftrace_printk in the kernel.
Currently the trace_printk is set to off by default. ftrace_printk
should only be in open kernel code when used for debugging, and thus
it should be enabled by default.
It may also be used to record data within a tracer, but those ftrace_printks
should be within wrappers that are either enabled by trace_points or
have a variable protecting the code path from being entered when the
tracer is disabled.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix to irqsoff tracer output
In converting to the new start / stop ftrace handling, the irqsoff
tracer start called the irqsoff reset function. irqsoff tracer is
not the same as the other traces, and it resets the buffers while
searching for the longest latency.
The reset that the irqsoff stop method calls disables the function
tracing. That means that, by starting the tracer, the function
tracer is disabled incorrectly.
This patch simply removes the call to reset which keeps the function
tracing enabled. Reset is not needed for the irqsoff stop method.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix for sched_switch that broke dynamic ftrace startup
The commit: tracing/fastboot: use sched switch tracer from boot tracer
broke the API of the sched_switch trace. The use of the
tracing_start/stop_cmdline record is for only recording the cmdline,
NOT recording the schedule switches themselves.
Seeing that the boot tracer broke the API to do something that it
wanted, this patch adds a new interface for the API while
puting back the original interface of the old API.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: boot tracer startup modified
The boot tracer calls into some of the schedule tracing private functions
that should not be exported. This patch cleans it up, and makes
way for further changes in the ftrace infrastructure.
This patch adds a api to assign a tracer array to the schedule
context switch tracer.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix of output of set_ftrace_filter
Commit ftrace: do not show freed records in available_filter_functions
Removed a bit too much from the set_ftrace_filter code, where we now see
all functions in the set_ftrace_filter file even when we set a filter.
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: cleanup
The #if/#endif is ugly. Change SCHED_CPUMASK_ALLOC and
SCHED_CPUMASK_FREE to static inline functions.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Impact: fix rare memory leak in the sched-domains manual reconfiguration code
In the failure path, rd is not attached to a sched domain,
so it causes a leak.
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
Block: use round_jiffies_up()
Add round_jiffies_up and related routines
block: fix __blkdev_get() for removable devices
generic-ipi: fix the smp_mb() placement
blk: move blk_delete_timer call in end_that_request_last
block: add timer on blkdev_dequeue_request() not elv_next_request()
bio: define __BIOVEC_PHYS_MERGEABLE
block: remove unused ll_new_mergeable()