Commit graph

22582 commits

Author SHA1 Message Date
John Stultz
78c7b55b36 timekeeping: Fix __ktime_get_fast_ns() regression
commit 58bfea9532552d422bde7afa207e1a0f08dffa7d upstream.

In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.

This results in bogus perf timestamps like:
 swapper     0 [000]   253.427536:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426573:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426687:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426800:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426905:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427022:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427127:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427239:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427346:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427463:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   255.426572:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:
 swapper     0 [000]    39.953768:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.064839:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.175956:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.287103:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.398217:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.509324:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.620437:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.731546:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.842654:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.953772:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    41.064881:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Add the proper use of timekeeping_delta_to_ns() to convert
the cycle delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.

Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.

Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg <bgregg@netflix.com>
Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Tested-and-reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-16 17:36:14 +02:00
Christopher S. Hall
9b57d91c03 time: Add cycles to nanoseconds translation
commit 6bd58f09e1d8cc6c50a824c00bf0d617919986a1 upstream.

The timekeeping code does not currently provide a way to translate
externally provided clocksource cycles to system time. The cycle count
is always provided by the result clocksource read() method internal to
the timekeeping code. The added function timekeeping_cycles_to_ns()
calculated a nanosecond value from a cycle count that can be added to
tk_read_base.base value yielding the current system time. This allows
clocksource cycle values external to the timekeeping code to provide a
cycle count that can be transformed to system time.

Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: kevin.b.stanton@intel.com
Cc: kevin.j.clarke@intel.com
Cc: hpa@zytor.com
Cc: jeffrey.t.kirsher@intel.com
Cc: netdev@vger.kernel.org
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christopher S. Hall <christopher.s.hall@intel.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-16 17:36:14 +02:00
Linux Build Service Account
4de07155f9 Merge "sched/cgroup: Fix/cleanup cgroup teardown/init" 2016-10-13 19:11:24 -07:00
Linux Build Service Account
a6138ccde2 Merge "sched: bucketize CPU c-state levels" 2016-10-13 12:29:04 -07:00
Linux Build Service Account
daa54e82ab Merge "sched: use wakeup latency as c-state determinant" 2016-10-13 12:29:04 -07:00
Joonwoo Park
825b7ef93a sched: bucketize CPU c-state levels
C-state aware scheduler takes note of wakeup latency of each c-state
level to determine whether to pack or wake up LPM CPU.  But it doesn't
distinguish small and large delta as it's inefficient for scheduler to
do so on its critical path.

Disregard wakeup latencies less than 64 us between different c-state
levels.  This reduces unnecessary task packing.

CRs-fixed: 1074879
Change-Id: Ib0cadbd390d1a0b6da3e39c98010cedb43e5bf60
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-10-12 14:14:16 -07:00
Joonwoo Park
15d2c97d2a sched: use wakeup latency as c-state determinant
C-state aware scheduler at present, uses a raw c-state index number as
its determinant and avoids task placement on deeper c-state CPUs at
cost of latency.  However there are CPUs offering comparable wake-up
latency at different c-state levels and the wake-up latency at each
c-state levels are already have being fed to scheduler.

Hence use the wakeup_latency as c-state determinant instead of raw
c-state index to avoid unnecessary task packing where it's doable.

CRs-fixed: 1074879
Change-Id: If927f84f6c8ba719716d99669e5d1f1b19aaacbe
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-10-12 14:14:06 -07:00
Syed Rameez Mustafa
2a5b04bf9b sched/tune: Remove redundant checks for NULL css
The check for NULL css is redundant as upper layers are already
making sure that css cannot be NULL. Remove this check. It helps
to silence static analysis errors as well.

Change-Id: I64585ff8cceb307904e20ff788e52eb05c000e1f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-12 13:28:03 -07:00
Paul Moore
f885566c5e BACKPORT: audit: consistently record PIDs with task_tgid_nr()
Unfortunately we record PIDs in audit records using a variety of
methods despite the correct way being the use of task_tgid_nr().
This patch converts all of these callers, except for the case of
AUDIT_SET in audit_receive_msg() (see the comment in the code).

Reported-by: Jeff Vander Stoep <jeffv@google.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

Bug: 28952093

(cherry picked from commit fa2bea2f5cca5b8d4a3e5520d2e8c0ede67ac108)
Signed-off-by: Jeff Vander Stoep <jeffv@google.com>
Change-Id: If6645f9de8bc58ed9755f28dc6af5fbf08d72a00
2016-10-12 17:34:22 +05:30
John Stultz
fc1d6c8c6a sched: Add Kconfig option DEFAULT_USE_ENERGY_AWARE to set ENERGY_AWARE feature flag
The ENERGY_AWARE sched feature flag cannot be set unless
CONFIG_SCHED_DEBUG is enabled.

So this patch allows the flag to default to true at build time
if the config is set.

Change-Id: I8835a571fdb7a8f8ee6a54af1e11a69f3b5ce8e6
Signed-off-by: John Stultz <john.stultz@linaro.org>
2016-10-12 17:34:22 +05:30
Caesar Wang
ca4b83c547 sched/fair: remove printk while schedule is in progress
It will cause deadlock and while(1) if call printk while schedule is in
progress. The block state like as below:

cpu0(hold the console sem):
printk->console_unlock->up_sem->spin_lock(&sem->lock)->wake_up_process(cpu1)
->try_to_wake_up(cpu1)->while(p->on_cpu).

cpu1(request console sem):
console_lock->down_sem->schedule->idle_banlance->update_cpu_capacity->
printk->console_trylock->spin_lock(&sem->lock).

p->on_cpu will be 1 forever, because the task is still running on cpu1,
so cpu0 is blocked in while(p->on_cpu), but cpu1 could not get
spin_lock(&sem->lock), it is blocked too, it means the task will running
on cpu1 forever.

Signed-off-by: Caesar Wang <wxt@rock-chips.com>
2016-10-12 17:34:22 +05:30
Chris Redpath
27f0430c61 sched/walt: Drop arch-specific timer access
On at least one platform, occasionally the timer providing the wallclock
was able to be reset/go backwards for at least some time after wakeup.

Accept that this might happen and warn the first time, but otherwise just
carry on.

Change-Id: Id3164477ba79049561af7f0889cbeebc199ead4e
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2016-10-12 17:34:22 +05:30
Srinath Sridharan
09c5f91afe eas/sched/fair: Fixing comments in find_best_target.
Change-Id: I83f5b9887e98f9fdb81318cde45408e7ebfc4b13
Signed-off-by: Srinath Sridharan <srinathsr@google.com>
2016-10-12 17:34:22 +05:30
Alex Shi
16d185eee4 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android
Conflicts:
	kernel/cpuset.c
2016-10-11 23:33:37 +02:00
Syed Rameez Mustafa
2640728359 sched: Add cgroup attach functionality to the tune controller
This is required to allow tasks to freely move between cgroups associated
with the tune controller.

Change-Id: I1f39b957462034586edc2fdc0a35488b314e9c8c
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-10 11:10:42 -07:00
Syed Rameez Mustafa
14b52227eb sched: Update the number of tune groups to 5
The schedtune controller will mimic the cpusets controller configuration
for now. For that we need to make 4 groups in addition to the root
group present by default.

Change-Id: I082f1e4e4ebf863e623cf66ee127eac70a3e2716
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-10 11:10:41 -07:00
Patrick Bellasi
e89595cd93 sched/tune: add initial support for CGroups based boosting
To support task performance boosting, the usage of a single knob has the
advantage to be a simple solution, both from the implementation and the
usability standpoint.  However, on a real system it can be difficult to
identify a single value for the knob which fits the needs of multiple
different tasks. For example, some kernel threads and/or user-space
background services should be better managed the "standard" way while we
still want to be able to boost the performance of specific workloads.

In order to improve the flexibility of the task boosting mechanism this
patch is the first of a small series which extends the previous
implementation to introduce a "per task group" support.
This first patch introduces just the basic CGroups support, a new
"schedtune" CGroups controller is added which allows to configure
different boost value for different groups of tasks.
To keep the implementation simple but still effective for a boosting
strategy, the new controller:
  1. allows only a two layer hierarchy
  2. supports only a limited number of boost groups

A two layer hierarchy allows to place each task either:
  a) in the root control group
     thus being subject to a system-wide boosting value
  b) in a child of the root group
     thus being subject to the specific boost value defined by that
     "boost group"

The limited number of "boost groups" supported is mainly motivated by
the observation that in a real system it could be useful to have only
few classes of tasks which deserve different treatment.
For example, background vs foreground or interactive vs low-priority.
As an additional benefit, a limited number of boost groups allows also
to have a simpler implementation especially for the code required to
compute the boost value for CPUs which have runnable tasks belonging to
different boost groups.

Change-Id: I1304e33a8440bfdad9c8bcf8129ff390216f2e32
cc: Tejun Heo <tj@kernel.org>
cc: Li Zefan <lizefan@huawei.com>
cc: Johannes Weiner <hannes@cmpxchg.org>
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Git-commit: 13001f47c9
Git-repo: https://android.googlesource.com/kernel/common
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2016-10-10 11:09:53 -07:00
Prasad Sodagudi
4e38f843b6 genirq: Avoid race between cpu hot plug and irq_desc() allocation paths
One of the core might have just allocated irq_desc() and other core
might be doing irq migration in the hot plug path. In the hot plug path
during the IRQ migration, for_each_active_irq macro is trying to get
irqs whose bits are set in allocated_irqs bit map but there is no return
value check after irq_to_desc for desc validity.

[   24.566381] msm_thermal:do_core_control Set Offline: CPU4 Temp: 73
[   24.568821] Unable to handle kernel NULL pointer dereference at virtual address 000000a4
[   24.568931] pgd = ffffffc002184000
[   24.568995] [000000a4] *pgd=0000000178df5003, *pud=0000000178df5003, *pmd=0000000178df6003, *pte=0060000017a00707
[   24.569153] ------------[ cut here ]------------
[   24.569228] Kernel BUG at ffffffc0000f3060 [verbose debug info unavailable]
[   24.569334] Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
[   24.569422] Modules linked in:
[   24.569486] CPU: 4 PID: 28 Comm: migration/4 Tainted: G        W       4.4.8-perf-9407222-eng #1
[   24.569684] task: ffffffc0f28f0e80 ti: ffffffc0f2a84000 task.ti: ffffffc0f2a84000
[   24.569785] PC is at do_raw_spin_lock+0x20/0x160
[   24.569859] LR is at _raw_spin_lock+0x34/0x40
[   24.569931] pc : [<ffffffc0000f3060>] lr : [<ffffffc001023dec>] pstate: 200001c5
[   24.570029] sp : ffffffc0f2a87bc0
[   24.570091] x29: ffffffc0f2a87bc0 x28: ffffffc001033988
[   24.570174] x27: ffffffc001adebb0 x26: 0000000000000000
[   24.570257] x25: 00000000000000a0 x24: 0000000000000020
[   24.570339] x23: ffffffc001033978 x22: ffffffc001adeb80
[   24.570421] x21: 000000000000027e x20: 0000000000000000
[   24.570502] x19: 00000000000000a0 x18: 000000000000000d
[   24.570584] x17: 0000000000005f00 x16: 0000000000000003
[   24.570666] x15: 000000000000bd39 x14: 0ffffffffffffffe
[   24.570748] x13: 0000000000000000 x12: 0000000000000018
[   24.570829] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[   24.570911] x9 : fefefefeff332e6d x8 : 7f7f7f7f7f7f7f7f
[   24.570993] x7 : ffffffc00344f6a0 x6 : 0000000000000000
[   24.571075] x5 : 0000000000000001 x4 : ffffffc00344f488
[   24.571157] x3 : 0000000000000000 x2 : 0000000000000000
[   24.571238] x1 : ffffffc0f2a84000 x0 : 0000000000004ead
...
...
...
[   24.581324] Call trace:
[   24.581379] [<ffffffc0000f3060>] do_raw_spin_lock+0x20/0x160
[   24.581463] [<ffffffc001023dec>] _raw_spin_lock+0x34/0x40
[   24.581546] [<ffffffc000103f10>] irq_migrate_all_off_this_cpu+0x84/0x1c4
[   24.581641] [<ffffffc00008ec84>] __cpu_disable+0x54/0x74
[   24.581722] [<ffffffc0000a3368>] take_cpu_down+0x1c/0x58
[   24.581803] [<ffffffc00013ac08>] multi_cpu_stop+0xb0/0x10c
[   24.581885] [<ffffffc00013ad60>] cpu_stopper_thread+0xb8/0x184
[   24.581972] [<ffffffc0000c4920>] smpboot_thread_fn+0x26c/0x290
[   24.582057] [<ffffffc0000c0f84>] kthread+0x100/0x108
[   24.582135] Code: aa0003f3 aa1e03e0 d503201f 5289d5a0 (b9400661)
[   24.582224] ---[ end trace 609f38584306f5d9 ]---

CRs-Fixed: 1074611
Change-Id: I6cc5399e511b6d62ec7fbc4cac21f4f41023520e
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
2016-10-07 13:07:10 -07:00
Michal Hocko
82b7839a40 kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd
commit 735f2770a770156100f534646158cb58cb8b2939 upstream.

Commit fec1d01152 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
exit") has caused a subtle regression in nscd which uses
CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
shared databases, so that the clients are notified when nscd is
restarted.  Now, when nscd uses a non-persistent database, clients that
have it mapped keep thinking the database is being updated by nscd, when
in fact nscd has created a new (anonymous) one (for non-persistent
databases it uses an unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):

: The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
: on behalf of pthread_create() library calls.  This feature is used to
: request that the kernel clear the thread-id in user space (at an address
: provided in the syscall) when the thread disassociates itself from the
: address space, which is done in mm_release().
:
: Unfortunately, when a multi-threaded process incurs a core dump (such as
: from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
: the other threads, which then proceed to clear their user-space tids
: before synchronizing in exit_mm() with the start of core dumping.  This
: misrepresents the state of process's address space at the time of the
: SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
: problems (misleading him/her to conclude that the threads had gone away
: before the fault).
:
: The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
: core dump has been initiated.

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for.  It
seems that limitting the scope of the check to core dumping should work
for SIGSEGV issue describe above.

[Changelog partly based on Andreas' description]
Fixes: fec1d01152 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Tested-by: William Preston <wpreston@suse.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Roland McGrath <roland@hack.frob.com>
Cc: Andreas Schwab <schwab@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:46 +02:00
Subash Abhinov Kasiviswanathan
70cd763eb1 sysctl: handle error writing UINT_MAX to u32 fields
commit e7d316a02f683864a12389f8808570e37fb90aa3 upstream.

We have scripts which write to certain fields on 3.18 kernels but this
seems to be failing on 4.4 kernels.  An entry which we write to here is
xfrm_aevent_rseqth which is u32.

  echo 4294967295  > /proc/sys/net/core/xfrm_aevent_rseqth

Commit 230633d109 ("kernel/sysctl.c: detect overflows when converting
to int") prevented writing to sysctl entries when integer overflow
occurs.  However, this does not apply to unsigned integers.

Heinrich suggested that we introduce a new option to handle 64 bit
limits and set min as 0 and max as UINT_MAX.  This might not work as it
leads to issues similar to __do_proc_doulongvec_minmax.  Alternatively,
we would need to change the datatype of the entry to 64 bit.

  static int __do_proc_doulongvec_minmax(void *data, struct ctl_table
  {
      i = (unsigned long *) data;   //This cast is causing to read beyond the size of data (u32)
      vleft = table->maxlen / sizeof(unsigned long); //vleft is 0 because maxlen is sizeof(u32) which is lesser than sizeof(unsigned long) on x86_64.

Introduce a new proc handler proc_douintvec.  Individual proc entries
will need to be updated to use the new handler.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 230633d109 ("kernel/sysctl.c:detect overflows when converting to int")
Link: http://lkml.kernel.org/r/1471479806-5252-1-git-send-email-subashab@codeaurora.org
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:46 +02:00
Nicolas Iooss
6b502c1d73 printk: fix parsing of "brl=" option
commit ae6c33ba6e37eea3012fe2640b22400ef3f2d0f3 upstream.

Commit bbeddf52ad ("printk: move braille console support into separate
braille.[ch] files") moved the parsing of braille-related options into
_braille_console_setup(), changing the type of variable str from char*
to char**.  In this commit, memcmp(str, "brl,", 4) was correctly updated
to memcmp(*str, "brl,", 4) but not memcmp(str, "brl=", 4).

Update the code to make "brl=" option work again and replace memcmp()
with strncmp() to make the compiler able to detect such an issue.

Fixes: bbeddf52ad ("printk: move braille console support into separate braille.[ch] files")
Link: http://lkml.kernel.org/r/20160823165700.28952-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:43 +02:00
Mark Rutland
711a78a792 perf/core: Fix pmu::filter_match for SW-led groups
commit 2c81a6477081966fe80b8c6daa68459bca896774 upstream.

The following commit:

  66eb579e66 ("perf: allow for PMU-specific event filtering")

added the pmu::filter_match() callback. This was intended to
avoid HW constraints on events from resulting in extremely
pessimistic scheduling.

However, pmu::filter_match() is only called for the leader of each event
group. When the leader is a SW event, we do not filter the groups, and
may fail at pmu::add() time, and when this happens we'll give up on
scheduling any event groups later in the list until they are rotated
ahead of the failing group.

This can result in extremely sub-optimal event scheduling behaviour,
e.g. if running the following on a big.LITTLE platform:

$ taskset -c 0 ./perf stat \
 -e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' \
 -e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' \
 ls

     <not counted>      context-switches                                              (0.00%)
     <not counted>      armv8_cortex_a57/config=0x11/                                 (0.00%)
                24      context-switches                                              (37.36%)
          57589154      armv8_cortex_a53/config=0x11/                                 (37.36%)

Here the 'a53' event group was always eligible to be scheduled, but
the 'a57' group never eligible to be scheduled, as the task was always
affine to a Cortex-A53 CPU. The SW (group leader) event in the 'a57'
group was eligible, but the HW event failed at pmu::add() time,
resulting in ctx_flexible_sched_in giving up on scheduling further
groups with HW events.

One way of avoiding this is to check pmu::filter_match() on siblings
as well as the group leader. If any of these fail their
pmu::filter_match() call, we must skip the entire group before
attempting to add any events.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Fixes: 66eb579e66 ("perf: allow for PMU-specific event filtering")
Link: http://lkml.kernel.org/r/1465917041-15339-1-git-send-email-mark.rutland@arm.com
[ Small readability edits. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:41 +02:00
Joonwoo Park
8132ffc977 cpuset: handle race between CPU hotplug and cpuset_hotplug_work
commit 28b89b9e6f7b6c8fef7b3af39828722bca20cfee upstream.

A discrepancy between cpu_online_mask and cpuset's effective_cpus
mask is inevitable during hotplug since cpuset defers updating of
effective_cpus mask using a workqueue, during which time nothing
prevents the system from more hotplug operations.  For that reason
guarantee_online_cpus() walks up the cpuset hierarchy until it finds
an intersection under the assumption that top cpuset's effective_cpus
mask intersects with cpu_online_mask even with such a race occurring.

However a sequence of CPU hotplugs can open a time window, during which
none of the effective CPUs in the top cpuset intersect with
cpu_online_mask.

For example when there are 4 possible CPUs 0-3 and only CPU0 is online:

  ========================  ===========================
   cpu_online_mask           top_cpuset.effective_cpus
  ========================  ===========================
   echo 1 > cpu2/online.
   CPU hotplug notifier woke up hotplug work but not yet scheduled.
      [0,2]                     [0]

   echo 0 > cpu0/online.
   The workqueue is still runnable.
      [2]                       [0]
  ========================  ===========================

  Now there is no intersection between cpu_online_mask and
  top_cpuset.effective_cpus.  Thus invoking sys_sched_setaffinity() at
  this moment can cause following:

   Unable to handle kernel NULL pointer dereference at virtual address 000000d0
   ------------[ cut here ]------------
   Kernel BUG at ffffffc0001389b0 [verbose debug info unavailable]
   Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
   Modules linked in:
   CPU: 2 PID: 1420 Comm: taskset Tainted: G        W       4.4.8+ #98
   task: ffffffc06a5c4880 ti: ffffffc06e124000 task.ti: ffffffc06e124000
   PC is at guarantee_online_cpus+0x2c/0x58
   LR is at cpuset_cpus_allowed+0x4c/0x6c
   <snip>
   Process taskset (pid: 1420, stack limit = 0xffffffc06e124020)
   Call trace:
   [<ffffffc0001389b0>] guarantee_online_cpus+0x2c/0x58
   [<ffffffc00013b208>] cpuset_cpus_allowed+0x4c/0x6c
   [<ffffffc0000d61f0>] sched_setaffinity+0xc0/0x1ac
   [<ffffffc0000d6374>] SyS_sched_setaffinity+0x98/0xac
   [<ffffffc000085cb0>] el0_svc_naked+0x24/0x28

The top cpuset's effective_cpus are guaranteed to be identical to
cpu_online_mask eventually.  Hence fall back to cpu_online_mask when
there is no intersection between top cpuset's effective_cpus and
cpu_online_mask.

Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-10-07 15:23:40 +02:00
Linux Build Service Account
9a87b7449d Merge "sched/tune: add sysctl interface to define a boost value" 2016-10-06 19:45:49 -07:00
Linux Build Service Account
6312438224 Merge "sched: Fix a division by zero bug in scale_exec_time()" 2016-10-06 01:07:16 -07:00
Linux Build Service Account
9d4ed2cb20 Merge "sched: Fix integer overflow in sched_update_nr_prod()" 2016-10-05 19:29:24 -07:00
Linux Build Service Account
fb89803f09 Merge "sched: Add a device tree property to specify the sched boost type" 2016-10-05 19:29:18 -07:00
Linux Build Service Account
3ff37b4bac Merge "RFC: FROMLIST: cgroup: reduce read locked section of cgroup_threadgroup_rwsem during fork" 2016-10-05 19:29:13 -07:00
Patrick Bellasi
754a122792 sched/tune: add sysctl interface to define a boost value
The current (CFS) scheduler implementation does not allow "to boost"
tasks performance by running them at a higher OPP compared to the
minimum required to meet their workload demands.

To support tasks performance boosting the scheduler should provide a
"knob" which allows to tune how much the system is going to be optimised
for energy efficiency vs performance.

This patch is the first of a series which provides a simple interface to
define a tuning knob. One system-wide "boost" tunable is exposed via:
  /proc/sys/kernel/sched_cfs_boost
which can be configured in the range [0..100], to define a percentage
where:
  - 0%   boost requires to operate in "standard" mode by scheduling
         tasks at the minimum capacities required by the workload demand
  - 100% boost requires to push at maximum the task performances,
         "regardless" of the incurred energy consumption

A boost value in between these two boundaries is used to bias the
power/performance trade-off, the higher the boost value the more the
scheduler is biased toward performance boosting instead of energy
efficiency.

Change-Id: I59a41725e2d8f9238a61dfb0c909071b53560fc0
cc: Ingo Molnar <mingo@redhat.com>
cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Git-commit: 63c8fad2b06805ef88f1220551289f0a3c3529f1
Git-repo: https://source.codeaurora.org/quic/la/kernel/msm-4.4
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2016-10-05 17:24:23 -07:00
Syed Rameez Mustafa
475125d9f9 sched: Initialize HMP stats inside init_sd_lb_stats()
This ensures that the load balancer always works correctly even
without compiler optimizations.

Change-Id: I36408ae65833b624401e60edfb50c19cc061d7bf
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-05 17:24:22 -07:00
Peter Zijlstra
678e5640b9 sched/cgroup: Fix/cleanup cgroup teardown/init
The CPU controller hasn't kept up with the various changes in the whole
cgroup initialization / destruction sequence, and commit:

  2e91fa7f6d ("cgroup: keep zombies associated with their original cgroups")

caused it to explode.

The reason for this is that zombies do not inhibit css_offline() from
being called, but do stall css_released(). Now we tear down the cfs_rq
structures on css_offline() but zombies can run after that, leading to
use-after-free issues.

The solution is to move the tear-down to css_released(), which
guarantees nobody (including no zombies) is still using our cgroup.

Furthermore, a few simple cleanups are possible too. There doesn't
appear to be any point to us using css_online() (anymore?) so fold that
in css_alloc().

And since cgroup code guarantees an RCU grace period between
css_released() and css_free() we can forgo using call_rcu() and free the
stuff immediately.

Change-Id: I51af3d4f0e5dd1c9df6375cce4bb933f67f1022e
Suggested-by: Tejun Heo <tj@kernel.org>
Reported-by: Kazuki Yamaguchi <k@rhe.jp>
Reported-by: Niklas Cassel <niklas.cassel@axis.com>
Tested-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2e91fa7f6d ("cgroup: keep zombies associated with their original cgroups")
Link: http://lkml.kernel.org/r/20160316152245.GY6344@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 2f5177f0fd7e531b26d54633be62d1d4cb94621c
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2016-10-05 14:10:10 -07:00
Peter Zijlstra
2eab0c7176 sched/cgroup: Fix cgroup entity load tracking tear-down
When a cgroup's CPU runqueue is destroyed, it should remove its
remaining load accounting from its parent cgroup.

The current site for doing so it unsuited because its far too late and
unordered against other cgroup removal (->css_free() will be, but we're also
in an RCU callback).

Put it in the ->css_offline() callback, which is the start of cgroup
destruction, right after the group has been made unavailable to
userspace. The ->css_offline() callbacks are called in hierarchical order
after the following v4.4 commit:

  aa226ff4a1ce ("cgroup: make sure a parent css isn't offlined before its children")

Change-Id: Ice7cbd71d9e545da84d61686aa46c7213607bb9d
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160121212416.GL6357@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Git-commit: 6fe1f348b3dd1f700f9630562b7d38afd6949568
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
2016-10-05 14:09:53 -07:00
Alex Shi
be2a76aecf Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4 2016-10-05 14:53:08 +02:00
Alex Shi
10fd238c91 Merge remote-tracking branch 'lts/linux-4.4.y' into linux-linaro-lsk-v4.4
Conflicts:
	resovle the conflict on pax_copy for
	arch/ia64/include/asm/uaccess.h
	arch/powerpc/include/asm/uaccess.h
	arch/sparc/include/asm/uaccess_32.h
2016-10-05 13:21:50 +02:00
Pavankumar Kondeti
f20772adf3 sched: Fix integer overflow in sched_update_nr_prod()
"int" type is used to hold the time difference between the successive
updates to nr_run in sched_update_nr_prod(). This can result in
overflow, if the function is called ~2.15 sec after it was called
before. The most probable scenarios are when CPU is idle and
hotplugged. But as we update the last_time of all possible CPUs in
sched_get_nr_running_avg() periodically from a deferrable timer context
(core_ctl module), this overflow is observed only when the system is
completely idle for long time. When this overflow happens we hit
a BUG_ON() in sched_get_nr_running_avg().

Use "u64" type instead of "int" for holding the time difference and
add additional BUG_ON() to catch the instances where sched_clock()
returns a backward value.

Change-Id: I284abb5889ceb8cf9cc689c79ed69422a0e74986
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-10-04 08:47:48 +05:30
Linux Build Service Account
f02700dfc6 Merge "sched: Fix CPU selection when all online CPUs are isolated" 2016-10-03 10:35:02 -07:00
Linux Build Service Account
ae0165688c Merge "sched: Add a stub function for init_clusters()" 2016-10-03 10:34:59 -07:00
Linux Build Service Account
a6e4924acb Merge "sched: add a knob to prefer the waker CPU for sync wakeups" 2016-10-03 10:34:58 -07:00
Linux Build Service Account
08d58a723b Merge "hrtimer: Ensure timer is not running before migrating" 2016-10-03 10:34:56 -07:00
Pavankumar Kondeti
a86b380f35 sched: Add a device tree property to specify the sched boost type
The HMP scheduler has two types of task placement boost policies.

(1) boost-on-big policy make use of all big CPUs up to their full capacity
before using the little CPUs. This improves performance on true b.L systems
where the big CPUs have higher efficiency compared to the little CPUs.

(2) boost-on-all policy place the tasks on the CPU having the highest
spare capacity. This policy is optimal for SMP like systems.

The scheduler sets the boost policy to boost-on-big on systems which has
CPUs of different efficiencies. However it is possible that CPUs of the
same micro architecture to have slight difference in efficiency due to
other factors like cache size. Selecting the boost-on-big policy based
on relative difference in efficiency is not optimal on such systems.
The boost-policy device tree property is introduced to specify the
required boost type and it overrides the default selection of boost
type in the scheduler. The possible values for this property are
"boost-on-big" and "boost-on-all".

Change-Id: Iac19183fa7d4bfd9e5746b02a02b2b19cf64b78d
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-10-02 10:54:45 +05:30
Pavankumar Kondeti
c7e3dde08c sched: Add a stub function for init_clusters()
Add a stub function for init_cluster() and remove a ifdefry
for SCHED_HMP in sched_init()

Change-Id: I6745485152d735436d8398818f7fb5e70ce5ee65
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-10-02 10:52:15 +05:30
Pavankumar Kondeti
f1e9995fe4 sched: add a knob to prefer the waker CPU for sync wakeups
The current policy has a preference to select an idle CPU in the waker
cluster compared to the waker CPU running only 1 task. By selecting
an idle CPU, it eliminates the chance of waker migrating to a
different CPU after the wakee preempts it. This policy is also not
susceptible to the incorrect "sync" usage i.e the waker does not
goto sleep after waking up the wakee.

However LPM exit latency associated with an idle CPU outweigh the
above benefits on some targets. So add a knob to prefer the waker
CPU having only 1 runnable task over idle CPUs in the waker cluster.

Change-Id: Id974748c07625c1b19112235f426a5d204dfdb33
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-10-02 10:52:06 +05:30
Pavankumar Kondeti
7c3461a6ac sched: Fix a division by zero bug in scale_exec_time()
When cycle_counter is used to estimate the frequency, calling
update_task_ravg() twice on the same task without refreshing
the wallclock results in a division by zero bug. Add a safety
check in update_task_ravg() to prevent this.

The above bug is hit from __schedule() when next == prev. There
is no need to call update_task_ravg() twice for PUT_PREV_TASK
and PICK_NEXT_TASK events for the same task. Calling
update_task_ravg() with TASK_UPDATE event is sufficient.

Change-Id: Ib3af9004f2462618c535b8195377bedb584d0261
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-10-01 10:05:08 +05:30
Linux Build Service Account
818156d2ac Merge "sched: don't assume higher capacity means higher power in lb" 2016-09-30 18:23:48 -07:00
Linux Build Service Account
4b2199d820 Merge "sched: panic on corrupted stack end" 2016-09-30 18:23:31 -07:00
Syed Rameez Mustafa
84589336cd sched: Fix CPU selection when all online CPUs are isolated
After the introduction of "33c24b sched: add cpu isolation support"
select_fallback_rq() might sometimes be unable find any CPU to place
a task on. This happens when the all online CPUs are isolated and
the allow isolated flag is set to false. In such cases, we have
little choice but to use an isolated CPU and wait for core control
to eventually un-isolate one or more online CPUs.

Change-Id: Id8738bd8493c11731c5491efcc99eb90f051233e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-09-30 17:25:51 -07:00
Olav Haugan
62abf225df hrtimer: Ensure timer is not running before migrating
A timer might be running when we are trying to move the timer to another
CPU so ensure that we wait for the timer to finish before migrating.

Change-Id: I4c9ee39c715baebfbdb8a50476a475e38b092f70
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-09-30 17:04:11 -07:00
James Morse
116fcd882e PM / hibernate: Fix rtree_next_node() to avoid walking off list ends
commit 924d8696751c4b9e58263bc82efdafcf875596a6 upstream.

rtree_next_node() walks the linked list of leaf nodes to find the next
block of pages in the struct memory_bitmap. If it walks off the end of
the list of nodes, it walks the list of memory zones to find the next
region of memory. If it walks off the end of the list of zones, it
returns false.

This leaves the struct bm_position's node and zone pointers pointing
at their respective struct list_heads in struct mem_zone_bm_rtree.

memory_bm_find_bit() uses struct bm_position's node and zone pointers
to avoid walking lists and trees if the next bit appears in the same
node/zone. It handles these values being stale.

Swap rtree_next_node()s 'step then test' to 'test-next then step',
this means if we reach the end of memory we return false and leave
the node and zone pointers as they were.

This fixes a panic on resume using AMD Seattle with 64K pages:
[    6.868732] Freezing user space processes ... (elapsed 0.000 seconds) done.
[    6.875753] Double checking all user space processes after OOM killer disable... (elapsed 0.000 seconds)
[    6.896453] PM: Using 3 thread(s) for decompression.
[    6.896453] PM: Loading and decompressing image data (5339 pages)...
[    7.318890] PM: Image loading progress:   0%
[    7.323395] Unable to handle kernel paging request at virtual address 00800040
[    7.330611] pgd = ffff000008df0000
[    7.334003] [00800040] *pgd=00000083fffe0003, *pud=00000083fffe0003, *pmd=00000083fffd0003, *pte=0000000000000000
[    7.344266] Internal error: Oops: 96000005 [#1] PREEMPT SMP
[    7.349825] Modules linked in:
[    7.352871] CPU: 2 PID: 1 Comm: swapper/0 Tainted: G        W I     4.8.0-rc1 #4737
[    7.360512] Hardware name: AMD Overdrive/Supercharger/Default string, BIOS ROD1002C 04/08/2016
[    7.369109] task: ffff8003c0220000 task.stack: ffff8003c0280000
[    7.375020] PC is at set_bit+0x18/0x30
[    7.378758] LR is at memory_bm_set_bit+0x24/0x30
[    7.383362] pc : [<ffff00000835bbc8>] lr : [<ffff0000080faf18>] pstate: 60000045
[    7.390743] sp : ffff8003c0283b00
[    7.473551]
[    7.475031] Process swapper/0 (pid: 1, stack limit = 0xffff8003c0280020)
[    7.481718] Stack: (0xffff8003c0283b00 to 0xffff8003c0284000)
[    7.800075] Call trace:
[    7.887097] [<ffff00000835bbc8>] set_bit+0x18/0x30
[    7.891876] [<ffff0000080fb038>] duplicate_memory_bitmap.constprop.38+0x54/0x70
[    7.899172] [<ffff0000080fcc40>] snapshot_write_next+0x22c/0x47c
[    7.905166] [<ffff0000080fe1b4>] load_image_lzo+0x754/0xa88
[    7.910725] [<ffff0000080ff0a8>] swsusp_read+0x144/0x230
[    7.916025] [<ffff0000080fa338>] load_image_and_restore+0x58/0x90
[    7.922105] [<ffff0000080fa660>] software_resume+0x2f0/0x338
[    7.927752] [<ffff000008083350>] do_one_initcall+0x38/0x11c
[    7.933314] [<ffff000008b40cc0>] kernel_init_freeable+0x14c/0x1ec
[    7.939395] [<ffff0000087ce564>] kernel_init+0x10/0xfc
[    7.944520] [<ffff000008082e90>] ret_from_fork+0x10/0x40
[    7.949820] Code: d2800022 8b400c21 f9800031 9ac32043 (c85f7c22)
[    7.955909] ---[ end trace 0024a5986e6ff323 ]---
[    7.960529] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

Here struct mem_zone_bm_rtree's start_pfn has been returned instead of
struct rtree_node's addr as the node/zone pointers are corrupt after
we walked off the end of the lists during mark_unsafe_pages().

This behaviour was exposed by commit 6dbecfd345a6 ("PM / hibernate:
Simplify mark_unsafe_pages()"), which caused mark_unsafe_pages() to call
duplicate_memory_bitmap(), which uses memory_bm_find_bit() after walking
off the end of the memory bitmap.

Fixes: 3a20cb1779 (PM / Hibernate: Implement position keeping in radix tree)
Signed-off-by: James Morse <james.morse@arm.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30 10:18:39 +02:00
Thomas Garnier
119c348a8e PM / hibernate: Restore processor state before using per-CPU variables
commit 62822e2ec4ad091ba31f823f577ef80db52e3c2c upstream.

Restore the processor state before calling any other functions to
ensure per-CPU variables can be used with KASLR memory randomization.

Tracing functions use per-CPU variables (GS based on x86) and one was
called just before restoring the processor state fully. It resulted
in a double fault when both the tracing & the exception handler
functions tried to use a per-CPU variable.

Fixes: bb3632c610 (PM / sleep: trace events for suspend/resume)
Reported-and-tested-by: Borislav Petkov <bp@suse.de>
Reported-by: Jiri Kosina <jikos@kernel.org>
Tested-by: Rafael J. Wysocki <rafael@kernel.org>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Thomas Garnier <thgarnie@google.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30 10:18:39 +02:00
Steven Rostedt (Red Hat)
8b275b4522 tracing: Move mutex to protect against resetting of seq data
commit 1245800c0f96eb6ebb368593e251d66c01e61022 upstream.

The iter->seq can be reset outside the protection of the mutex. So can
reading of user data. Move the mutex up to the beginning of the function.

Fixes: d7350c3f45 ("tracing/core: make the read callbacks reentrants")
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30 10:18:38 +02:00