Commit graph

21812 commits

Author SHA1 Message Date
Steven Rostedt (Red Hat)
4ee4301c4b tracing: Only create branch tracer options when compiled in
When the branch tracer is not compiled in, do not create the option files
associated to it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-29 13:23:59 -04:00
Steven Rostedt (Red Hat)
729358da95 tracing: Only create function graph options when it is compiled in
Do not create fuction graph tracer options when function graph tracer is not
even compiled in.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-29 13:23:58 -04:00
Steven Rostedt (Red Hat)
a3418a364e tracing: Use TRACE_FLAGS macro to keep enums and strings matched
Use a cute little macro trick to keep the names of the trace flags file
guaranteed to match the corresponding masks.

The macro TRACE_FLAGS is defined as a serious of enum names followed by
the string name of the file that matches it. For example:

 #define TRACE_FLAGS						\
		C(PRINT_PARENT,		"print-parent"),	\
		C(SYM_OFFSET,		"sym-offset"),		\
		C(SYM_ADDR,		"sym-addr"),		\
		C(VERBOSE,		"verbose"),

Now we can define the following:

 #undef C
 #define C(a, b) TRACE_ITER_##a##_BIT
 enum trace_iterator_bits { TRACE_FLAGS };

The above creates:

 enum trace_iterator_bits {
	TRACE_ITER_PRINT_PARENT_BIT,
	TRACE_ITER_SYM_OFFSET_BIT,
	TRACE_ITER_SYM_ADDR_BIT,
	TRACE_ITER_VERBOSE_BIT,
 };

Then we can redefine C as:

 #undef C
 #define C(a, b) TRACE_ITER_##a = (1 << TRACE_ITER_##a##_BIT)
 enum trace_iterator_flags { TRACE_FLAGS };

Which creates:

 enum trace_iterator_flags {
	TRACE_ITER_PRINT_PARENT	= (1 << TRACE_ITER_PRINT_PARENT_BIT),
	TRACE_ITER_SYM_OFFSET	= (1 << TRACE_ITER_SYM_OFFSET_BIT),
	TRACE_ITER_SYM_ADDR	= (1 << TRACE_ITER_SYM_ADDR_BIT),
	TRACE_ITER_VERBOSE	= (1 << TRACE_ITER_VERBOSE_BIT),
 };

Then finally we can create the list of file names:

 #undef C
 #define C(a, b) b
 static const char *trace_options[] = {
	TRACE_FLAGS
	NULL
 };

Which creates:
 static const char *trace_options[] = {
	"print-parent",
	"sym-offset",
	"sym-addr",
	"verbose",
	NULL
 };

The importance of this is that the strings match the bit index.

	trace_options[TRACE_ITER_SYM_ADDR_BIT] == "sym-addr"

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-29 13:23:57 -04:00
Steven Rostedt (Red Hat)
ce3fed628e tracing: Use enums instead of hard coded bitmasks for TRACE_ITER flags
Using enums with FLAG_BIT and then defining a FLAG = (1 << FLAG_BIT), is a
bit more robust as we require that there are no bits out of order or skipped
to match the file names that represent the bits.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-29 13:23:56 -04:00
Steven Rostedt (Red Hat)
938db5f569 tracing: Remove unused tracing option "ftrace_preempt"
There was a time where the function tracing would disable interrupts unless
specifically told not to, where it would only disable preemption. With the
new lockless code, the function tracing never disalbes interrupts and just
uses disabling of preemption. Remove the option "ftrace_preempt" as it does
nothing anyway.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-29 13:23:54 -04:00
Steven Rostedt (Red Hat)
03905582fd tracing: Move "display-graph" option to main options
In order to facilitate making all tracer options visible even when the
tracer is not active, we need to get rid of duplicate options. Any option
that is shared between multiple tracers really should be a main option.

As the wakeup and irqsoff tracers both use the "display-graph" option, and
use it exactly the same way, move that option from the tracer options to the
main options and consolidate them.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-29 12:56:40 -04:00
Steven Rostedt (Red Hat)
ef92480a58 tracing: Turn seq_print_user_ip() into a static function
seq_print_user_ip() is used in only one location in one file. Turn it into a
static function. We could inject its code into the caller, but that would
make the code a bit too complex. Keep the code separate.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-28 10:16:12 -04:00
Steven Rostedt (Red Hat)
6b1032d53c tracing: Inject seq_print_userip_objs() into its only user
seq_print_userip_objs() is used only in one location, in one file. Instead
of having it as an external function, go one further than making it static,
but inject is code into its only user. It doesn't make the calling function
much more complex.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-28 10:11:44 -04:00
Steven Rostedt (Red Hat)
ca475e831f tracing: Make ftrace_trace_stack() static
ftrace_trace_stack() is not called outside of trace.c. Make it a static
function.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-28 09:41:11 -04:00
Geliang Tang
18ab2cd3ee perf/core, perf/x86: Change needlessly global functions and a variable to static
Fixes various sparse warnings.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/70c14234da1bed6e3e67b9c419e2d5e376ab4f32.1443367286.git.geliangtang@163.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-28 08:09:52 +02:00
Ingo Molnar
6afc0c269c Merge branch 'linus' into perf/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-28 08:06:57 +02:00
Ingo Molnar
7c4f1c694b Merge branch 'for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/urgent
Pull RCU fixes from Paul E. McKenney, for two regressions
introduced in this merge window:

  - Fix bug with recent GCCs.
  - Fix false positive lockdep splat.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-28 08:03:52 +02:00
Linus Torvalds
e3be4266d3 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
 "Another pile of fixes for perf:

   - Plug overflows and races in the core code

   - Sanitize the flow of the perf syscall so we error out before
     handling the more complex and hard to undo setups

   - Improve and fix Broadwell and Skylake hardware support

   - Revert a fix which broke what it tried to fix in perf tools

   - A couple of smaller fixes in various places of perf tools"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fix copying of /proc/kcore
  perf intel-pt: Remove no_force_psb from documentation
  perf probe: Use existing routine to look for a kernel module by dso->short_name
  perf/x86: Change test_aperfmperf() and test_intel() to static
  tools lib traceevent: Fix string handling in heterogeneous arch environments
  perf record: Avoid infinite loop at buildid processing with no samples
  perf: Fix races in computing the header sizes
  perf: Fix u16 overflows
  perf: Restructure perf syscall point of no return
  perf/x86/intel: Fix Skylake FRONTEND MSR extrareg mask
  perf/x86/intel/pebs: Add PEBS frontend profiling for Skylake
  perf/x86/intel: Make the CYCLE_ACTIVITY.* constraint on Broadwell more specific
  perf tools: Bool functions shouldn't return -1
  tools build: Add test for presence of __get_cpuid() gcc builtin
  tools build: Add test for presence of numa_num_possible_cpus() in libnuma
  Revert "perf symbols: Fix mismatched declarations for elf_getphdrnum"
  perf stat: Fix per-pkg event reporting bug
2015-09-27 12:51:39 -04:00
Linus Torvalds
73f479b243 Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
 "A single bug fix for the scheduler to prevent dequeueing of the idle
  task when setting the cpus allowed mask"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched: Fix crash trying to dequeue/enqueue the idle thread
2015-09-27 12:50:27 -04:00
Linus Torvalds
fc11a9c5da Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
 "A single bugfix for lockdep to preserve the pinning counter when
  rebuilding the lock stack"

* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  locking/lockdep: Fix hlock->pin_count reset on lock stack rebuilds
2015-09-27 12:47:20 -04:00
Steven Rostedt (Red Hat)
b7f0c959ed tracing: Pass trace_array into trace_buffer_unlock_commit()
In preparation for having trace options be per instance, the trace_array
needs to be passed to the trace_buffer_unlock_commit(). The
trace_event_buffer_lock_reserve() already passes in the trace_event_file
where the trace_array can be derived from.

Also added a "__init" to the boot up test event plus function tracing
function function_test_events_call().

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-25 17:38:44 -04:00
Tejun Heo
a3e72739b7 cgroup: fix too early usage of static_branch_disable()
49d1dc4b81 ("cgroup: implement static_key based
cgroup_subsys_enabled() and cgroup_subsys_on_dfl()") converted cgroup
enabled test to use static_key; however, cgroup_disable() is called
before static_key subsystem itself is initialized and thus leads to
the following warning when "cgroup_disable=" parameter is specified.

 WARNING: CPU: 0 PID: 0 at kernel/jump_label.c:99 static_key_slow_dec+0x44/0x60()
 static_key_slow_dec used before call to jump_label_init
 ...
 Call Trace:
  [<ffffffff813b18c2>] dump_stack+0x44/0x62
  [<ffffffff8108dd52>] warn_slowpath_common+0x82/0xc0
  [<ffffffff8108ddec>] warn_slowpath_fmt+0x5c/0x80
  [<ffffffff8119c054>] static_key_slow_dec+0x44/0x60
  [<ffffffff81d826b6>] cgroup_disable+0xaf/0xd6
  [<ffffffff81d5f9de>] unknown_bootoption+0x8c/0x194
  [<ffffffff810b0c03>] parse_args+0x273/0x4a0
  [<ffffffff81d5fd67>] start_kernel+0x205/0x4b8
 ...

Fix it by making cgroup_disable() to record the subsystems to disable
in cgroup_disable_mask and moving the actual application to
cgroup_init() which is late enough and where the enabled state is
first used.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Andrey Wagin <avagin@gmail.com>
Link: http://lkml.kernel.org/g/CANaxB-yFuS4SA2znSvcKrO9L_CbHciHYW+o9bN8sZJ8eR9FxYA@mail.gmail.com
Fixes: 49d1dc4b81
2015-09-25 16:25:07 -04:00
Steven Rostedt (Red Hat)
41907416bc tracing: Remove unused function trace_current_buffer_lock_reserve()
trace_current_buffer_lock_reserve() is not used by anything. Might as well
get rid of it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-25 15:37:31 -04:00
Steven Rostedt (Red Hat)
d78a461427 tracing: Remove ftrace_trace_stack_regs()
ftrace_trace_stack_regs() is used in only one place, and because that is
such a simple function, just move its code into the location that it was
used in (trace_buffer_unlock_commit_regs()).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-25 15:37:23 -04:00
Juergen Gross
c6e1e7b5b7 sched/core: Make 'sched_domain_topology' declaration static
The 'sched_domain_topology' variable is only used within kernel/sched/core.c.
Make it static.

Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1442918939-9907-1-git-send-email-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23 10:19:12 +02:00
Ingo Molnar
4bbffe718f Merge branch 'locking/urgent' into locking/core, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23 09:52:03 +02:00
Juri Lelli
269b26a5ef sched/rt: Make (do_)balance_runtime() return void
The return value of (do_)balance_runtime() is not consumed by anybody.
Make them return void.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441188096-23021-5-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23 09:51:26 +02:00
Juri Lelli
f52405757e sched/deadline, locking/rtmutex: Fix open coded check in rt_mutex_waiter_less()
rt_mutex_waiter_less() check of task deadlines is open coded. Since this
is subject to wraparound bugs, make it use the correct helper.

Reported-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441188096-23021-4-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23 09:51:25 +02:00
Juri Lelli
2726d6ce38 sched/deadline: Unify dl_time_before() usage
Move dl_time_before() static definition in include/linux/sched/deadline.h
so that it can be used by different parties without being re-defined.

Reported-by: Luca Abeni <luca.abeni@unitn.it>
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1441188096-23021-3-git-send-email-juri.lelli@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23 09:51:25 +02:00
Peter Zijlstra
21199f27b4 locking/lockdep: Fix hlock->pin_count reset on lock stack rebuilds
Various people reported hitting the "unpinning an unpinned lock"
warning. As it turns out there are 2 places where we take a lock out
of the middle of a stack, and in those cases it would fail to preserve
the pin_count when rebuilding the lock stack.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reported-by: Tim Spriggs <tspriggs@apple.com>
Tested-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davej@codemonkey.org.uk
Link: http://lkml.kernel.org/r/20150916141040.GA11639@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-23 09:48:53 +02:00
Andrea Arcangeli
ac5be6b47e userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key"
This reverts commit 51360155ec and adapts
fs/userfaultfd.c to use the old version of that function.

It didn't look robust to call __wake_up_common with "nr == 1" when we
absolutely require wakeall semantics, but we've full control of what we
insert in the two waitqueue heads of the blocked userfaults.  No
exclusive waitqueue risks to be inserted into those two waitqueue heads
so we can as well stick to "nr == 1" of the old code and we can rely
purely on the fact no waitqueue inserted in one of the two waitqueue
heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-09-22 15:09:53 -07:00
Yaowei Bai
f0132c4e0d kernel/trace_probe: is_good_name can be boolean
This patch makes is_good_name return bool to improve readability
due to this particular function only using either one or zero as its
return value.

No functional change.

Link: http://lkml.kernel.org/r/1442929393-4753-2-git-send-email-bywxiaobai@163.com

Signed-off-by: Yaowei Bai <bywxiaobai@163.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2015-09-22 13:11:30 -04:00
Tejun Heo
10265075aa cgroup: make cgroup_update_dfl_csses() migrate all target processes atomically
cgroup_update_dfl_csses() is responsible for migrating processes when
controllers are enabled or disabled on the default hierarchy.  As the
css association changes for all the processes in the affected cgroups,
this involves migrating multiple processes.

Up until now, it was implemented by migrating process-by-process until
the source css_sets are empty; however, this means that if a process
fails to migrate after some succeed before it, the recovery is very
tricky.  This was considered okay as subsystems weren't allowed to
reject process migration on the default hierarchy; unfortunately,
enforcing this policy turned out to be problematic for certain types
of resources - realtime slices for now.

As such, the default hierarchy is gonna allow restricted failures
during migration and to support that this patch makes
cgroup_update_dfl_csses() migrate all target processes atomically
rather than one-by-one.  The preceding patches made subsystems ready
for multi-process migration and factored out taskset operations making
this almost trivial.  All tasks of the target processes are put in the
same taskset and the migration operations are performed once which
either fails or succeeds for all.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2015-09-22 12:46:53 -04:00
Tejun Heo
adaae5dcf8 cgroup: separate out taskset operations from cgroup_migrate()
Currently, cgroup_migreate() implements large part of the migration
logic inline including building the target taskset and actually
migrating them.  This patch separates out the following taskset
operations.

 CGROUP_TASKSET_INIT()		: taskset initializer
 cgroup_taskset_add()		: add a task to a taskset
 cgroup_taskset_migrate()	: migrate a taskset to the destination cgroup

This will be used to implement atomic multi-process migration in
cgroup_update_dfl_csses().  This is pure reorganization which doesn't
introduce any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2015-09-22 12:46:53 -04:00
Tejun Heo
9af2ec45c2 cgroup: reorder cgroup_migrate()'s parameters
cgroup_migrate() has the destination cgroup as the first parameter
while cgroup_task_migrate() has the destination cset as the last.
Another migration function is scheduled to be added which can make the
discrepancy further stand out.  Let's reorder cgroup_migrate()'s
parameters so that the destination cgroup is the last.

This doesn't cause any functional difference.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2015-09-22 12:46:53 -04:00
Tejun Heo
4530eddb59 cgroup, memcg, cpuset: implement cgroup_taskset_for_each_leader()
It wasn't explicitly documented but, when a process is being migrated,
cpuset and memcg depend on cgroup_taskset_first() returning the
threadgroup leader; however, this approach is somewhat ghetto and
would no longer work for the planned multi-process migration.

This patch introduces explicit cgroup_taskset_for_each_leader() which
iterates over only the threadgroup leaders and replaces
cgroup_taskset_first() usages for accessing the leader with it.

This prepares both memcg and cpuset for multi-process migration.  This
patch also updates the documentation for cgroup_taskset_for_each() to
clarify the iteration rules and removes comments mentioning task
ordering in tasksets.

v2: A previous patch which added threadgroup leader test was dropped.
    Patch updated accordingly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
2015-09-22 12:46:53 -04:00
Tejun Heo
3df9ca0a2b cpuset: migrate memory only for threadgroup leaders
If memory_migrate flag is set, cpuset migrates memory according to the
destnation css's nodemask.  The current implementation migrates memory
whenever any thread of a process is migrated making the behavior
somewhat arbitrary.  Let's tie memory operations to the threadgroup
leader so that memory is migrated only when the leader is migrated.

While this is a behavior change, given the inherent fuziness, this
change is not too likely to be noticed and allows us to clearly define
who owns the memory (always the leader) and helps the planned atomic
multi-process migration.

Note that we're currently migrating memory in migration path proper
while holding all the locks.  In the long term, this should be moved
out to an async work item.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Zefan Li <lizefan@huawei.com>
2015-09-22 12:46:53 -04:00
Rasmus Villemoes
ac742d3718 futex: Force hot variables into a single cache line
futex_hash() references two global variables: the base pointer
futex_queues and the size of the array futex_hashsize. The latter is
marked __read_mostly, while the former is not, so they are likely to
end up very far from each other. This means that futex_hash() is
likely to encounter two cache misses.

We could mark futex_queues as __read_mostly as well, but that doesn't
guarantee they'll end up next to each other (and even if they do, they
may still end up in different cache lines). So put the two variables
in a small singleton struct with sufficient alignment and mark that as
__read_mostly.

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: kbuild test robot <fengguang.wu@intel.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/1441834601-13633-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 16:23:15 +02:00
Huang Shijie
71f64340fc genirq: Remove the second parameter from handle_irq_event_percpu()
Actually, we always use the first irq action of the @desc->action
chain, so remove the second parameter from handle_irq_event_percpu()
which makes the code more tidy.

Signed-off-by: Huang Shijie <shijie.huang@arm.com>
Reviewed-by: Jiang Liu <jiang.liu@linux.intel.com>
Cc: peterz@infradead.org
Cc: marc.zyngier@arm.com
Link: http://lkml.kernel.org/r/1441160695-19809-1-git-send-email-shijie.huang@arm.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 16:14:55 +02:00
Dmitry Vyukov
3ed769bdb2 timers: Fix data race in timer_stats_account_timer()
timer_stats_account_timer() reads timer->start_site, then checks it
for NULL and then re-reads it again, while
timer_stats_timer_clear_start_info() can concurrently reset
timer->start_site to NULL. This should not lead to crashes, but can
double number of entries in timer stats as start_site is used during
comparison, the doubled entries will have unuseful NULL start_site.

Read timer->start_site only once in timer_stats_account_timer().

The data race was found with KernelThreadSanitizer (KTSAN).

Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Cc: andreyknvl@google.com
Cc: glider@google.com
Cc: kcc@google.com
Cc: ktsan@googlegroups.com
Cc: john.stultz@linaro.org
Link: http://lkml.kernel.org/r/1442584463-69553-1-git-send-email-dvyukov@google.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 15:43:18 +02:00
Zhen Lei
571af55a31 time: Fix spelling in comments
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tianhong Ding <dingtianhong@huawei.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Xinwei Hu <huxinwei@huawei.com>
Cc: Xunlei Pang <pang.xunlei@linaro.org>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1440484973-13892-1-git-send-email-thunder.leizhen@huawei.com
[ Fixed yet another typo in one of the sentences fixed. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-09-22 12:54:23 +02:00
Thomas Gleixner
2a1d3ab898 genirq: Handle force threading of irqs with primary and thread handler
Force threading of interrupts does not really deal with interrupts
which are requested with a primary and a threaded handler. The current
policy is to leave them alone and let the primary handler run in
interrupt context, but we set the ONESHOT flag for those interrupts as
well.

Kohji Okuno debugged a problem with the SDHCI driver where the
interrupt thread waits for a hardware interrupt to trigger, which can't
work well because the hardware interrupt is masked due to the ONESHOT
flag being set. He proposed to set the ONESHOT flag only if the
interrupt does not provide a thread handler.

Though that does not work either because these interrupts can be
shared. So the other interrupt would rightfully get the ONESHOT flag
set and therefor the same situation would happen again.

To deal with this proper, we need to force thread the primary handler
of such interrupts as well. That means that the primary interrupt
handler is treated as any other primary interrupt handler which is not
marked IRQF_NO_THREAD. The threaded handler becomes a separate thread
so the SDHCI flow logic can be handled gracefully.

The same issue was reported against 4.1-rt.

Reported-and-tested-by: Kohji Okuno <okuno.kohji@jp.panasonic.com>
Reported-By: Michal Smucr <msmucr@gmail.com>
Reported-and-tested-by: Nathan Sullivan <nathan.sullivan@ni.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1509211058080.5606@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2015-09-22 12:39:57 +02:00
Linus Torvalds
bcee19f424 Merge branch 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
 "The threadgroup locking changes which went in during 4.2 devel cycle
  added write locking of a percpu_rwsem in cgroup task migration path;
  unfortunately, that involved expedited rcu syncing which turned out to
  be too slow and heavy for certain workloads.  The patchset which is
  dependent on this one didn't get committed during that devel cycle, so
  these two patches can be reverted safely.

  Oleg reworked percpu_rwsem for 4.4 so that the writer path is a lot
  lighter.  The reported issue goes away with Oleg's reworked
  percpu_rwsem and I'll reapply these patches on the for-4.4 branch so
  that they can land together with Oleg's changes"

* 'for-4.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  Revert "sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem"
  Revert "cgroup: simplify threadgroup locking"
2015-09-21 18:26:54 -07:00
Paul E. McKenney
5b74c45890 rcu: Make ->cpu_no_qs be a union for aggregate OR
This commit converts the rcu_data structure's ->cpu_no_qs field
to a union.  The bytewise side of this union allows individual access
to indications as to whether this CPU needs to find a quiescent state
for a normal (.norm) and/or expedited (.exp) grace period.  The setwise
side of the union allows testing whether or not a quiescent state is
needed at all, for either type of grace period.

For now, only .norm is used.  A later commit will introduce the expedited
usage.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:21 -07:00
Paul E. McKenney
0d43eb34f9 rcu: Invert passed_quiesce and rename to cpu_no_qs
This commit inverts the sense of the rcu_data structure's ->passed_quiesce
field and renames it to ->cpu_no_qs.  This will allow a later commit to
use an "aggregate OR" operation to test expedited as well as normal grace
periods without added overhead.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:21 -07:00
Paul E. McKenney
97c668b8e9 rcu: Rename qs_pending to core_needs_qs
An upcoming commit needs to invert the sense of the ->passed_quiesce
rcu_data structure field, so this commit is taking this opportunity
to clarify things a bit by renaming ->qs_pending to ->core_needs_qs.

So if !rdp->core_needs_qs, then this CPU need not concern itself with
quiescent states, in particular, it need not acquire its leaf rcu_node
structure's ->lock to check.  Otherwise, it needs to report the next
quiescent state.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:20 -07:00
Paul E. McKenney
bce5fa12aa rcu: Move synchronize_sched_expedited() to combining tree
Currently, synchronize_sched_expedited() uses a single global counter
to track the number of remaining context switches that the current
expedited grace period must wait on.  This is problematic on large
systems, where the resulting memory contention can be pathological.
This commit therefore makes synchronize_sched_expedited() instead use
the combining tree in the same manner as synchronize_rcu_expedited(),
keeping memory contention down to a dull roar.

This commit creates a temporary function sync_sched_exp_select_cpus()
that is very similar to sync_rcu_exp_select_cpus().  A later commit
will consolidate these two functions, which becomes possible when
synchronize_sched_expedited() switches from stop_one_cpu_nowait() to
smp_call_function_single().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:20 -07:00
Paul E. McKenney
8203d6d0ee rcu: Use single-stage IPI algorithm for RCU expedited grace period
The current preemptible-RCU expedited grace-period algorithm invokes
synchronize_sched_expedited() to enqueue all tasks currently running
in a preemptible-RCU read-side critical section, then waits for all the
->blkd_tasks lists to drain.  This works, but results in both an IPI and
a double context switch even on CPUs that do not happen to be running
in a preemptible RCU read-side critical section.

This commit implements a new algorithm that causes less OS jitter.
This new algorithm IPIs all online CPUs that are not idle (from an
RCU perspective), but refrains from self-IPIs.  If a CPU receiving
this IPI is not in a preemptible RCU read-side critical section (or
is just now exiting one), it pushes quiescence up the rcu_node tree,
otherwise, it sets a flag that will be handled by the upcoming outermost
rcu_read_unlock(), which will then push quiescence up the tree.

The expedited grace period must of course wait on any pre-existing blocked
readers, and newly blocked readers must be queued carefully based on
the state of both the normal and the expedited grace periods.  This
new queueing approach also avoids the need to update boost state,
courtesy of the fact that blocked tasks are no longer ever migrated to
the root rcu_node structure.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:19 -07:00
Paul E. McKenney
b9585e940a rcu: Consolidate tree setup for synchronize_rcu_expedited()
This commit replaces sync_rcu_preempt_exp_init1(() and
sync_rcu_preempt_exp_init2() with sync_exp_reset_tree_hotplug()
and sync_exp_reset_tree(), which will also be used by
synchronize_sched_expedited(), and sync_rcu_exp_select_nodes(), which
contains code specific to synchronize_rcu_expedited().

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:18 -07:00
Paul E. McKenney
7922cd0e56 rcu: Move rcu_report_exp_rnp() to allow consolidation
This is a nearly pure code-movement commit, moving rcu_report_exp_rnp(),
sync_rcu_preempt_exp_done(), and rcu_preempted_readers_exp() so
that later commits can make synchronize_sched_expedited() use them.
The non-code-movement portion of this commit tags rcu_report_exp_rnp()
as __maybe_unused to avoid build errors when CONFIG_PREEMPT=n.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:18 -07:00
Paul E. McKenney
f4ecea309d rcu: Use rsp->expedited_wq instead of sync_rcu_preempt_exp_wq
Now that there is an ->expedited_wq waitqueue in each rcu_state structure,
there is no need for the sync_rcu_preempt_exp_wq global variable.  This
commit therefore substitutes ->expedited_wq for sync_rcu_preempt_exp_wq.
It also initializes ->expedited_wq only once at boot instead of at the
start of each expedited grace period.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:16:17 -07:00
Paul E. McKenney
19a5ecde08 rcu: Suppress lockdep false positive for rcp->exp_funnel_mutex
In kernels built with CONFIG_PREEMPT=y, synchronize_rcu_expedited()
invokes synchronize_sched_expedited() while holding RCU-preempt's
root rcu_node structure's ->exp_funnel_mutex, which is acquired after
the rcu_data structure's ->exp_funnel_mutex.  The first thing that
synchronize_sched_expedited() will do is acquire RCU-sched's rcu_data
structure's ->exp_funnel_mutex.   There is no danger of an actual deadlock
because the locking order is always from RCU-preempt's expedited mutexes
to those of RCU-sched.  Unfortunately, lockdep considers both rcu_data
structures' ->exp_funnel_mutex to be in the same lock class and therefore
reports a deadlock cycle.

This commit silences this false positive by placing RCU-sched's rcu_data
structures' ->exp_funnel_mutex locks into their own lock class.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
2015-09-20 21:01:22 -07:00
Tejun Heo
6f60eade24 cgroup: generalize obtaining the handles of and notifying cgroup files
cgroup core handles creations and removals of cgroup interface files
as described by cftypes.  There are cases where the handle for a given
file instance is necessary, for example, to generate a file modified
event.  Currently, this is handled by explicitly matching the callback
method pointer and storing the file handle manually in
cgroup_add_file().  While this simple approach works for cgroup core
files, it can't for controller interface files.

This patch generalizes cgroup interface file handle handling.  struct
cgroup_file is defined and each cftype can optionally tell cgroup core
to store the file handle by setting ->file_offset.  A file handle
remains accessible as long as the containing css is accessible.

Both "cgroup.procs" and "cgroup.events" are converted to use the new
generic mechanism instead of hooking directly into cgroup_add_file().
Also, cgroup_file_notify() which takes a struct cgroup_file and
generates a file modified event on it is added and replaces explicit
kernfs_notify() invocations.

This generalizes cgroup file handle handling and allows controllers to
generate file modified notifications.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
2015-09-18 17:54:23 -04:00
Tejun Heo
4df8dc9031 cgroup: restructure file creation / removal handling
The file creation / removal path has always been a bit icky and the
planned notification update requires css during file creation.
Restructure as follows.

* cgroup_addrm_files() now takes both @css and @cgrp and is only
  called directly by other file handling functions.

* cgroup_populate/clear_dir() are replaced with
  css_populate/clear_dir() taking @css and @cgrp_override.
  @cgrp_override is used only when files needs to be created on /
  removed from a cgroup which isn't attached to @css which happens
  during subsystem rebinds.  Subsystem loops are moved to the callers.

* cgroup_add_file() now takes both @css and @cgrp.  @css isn't used
  yet but will be used by the planned notification update.

This patch doens't cause any behavior changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
2015-09-18 17:54:23 -04:00
Tejun Heo
1ada48387a cgroup: cosmetic updates to rebind_subsystems()
* Use local variables @scgrp and @dcgrp for @src_root->cgrp and
  @dst_root->cgrp respectively.

* Use initializers to set @src_root and @css in the inner bind loop.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
2015-09-18 17:54:23 -04:00