Commit graph

22163 commits

Patrick Bellasi
ca42e80446 sched/fair: enforce EAS mode
For non-latency-sensitive tasks the goal is to optimize for energy
efficiency. Thus, we should try our best to avoid moving a task onto a CPU
which is then going to be marked as overutilized.

Let's use the capacity_margin metric to verify whether a candidate target CPU
should be considered, without risking bailing out of EAS mode.
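
As a rough illustration of the check being described, here is a minimal
userspace sketch; the 1280/1024 (~25%) margin value and the helper name are
assumptions, not the exact kernel code:

    /* Sketch: a candidate CPU is only worth considering if the task
     * still fits under the capacity margin, i.e. placing it there
     * would not push the CPU into overutilized territory. */
    #include <stdbool.h>
    #include <stdio.h>

    #define SCHED_CAPACITY_SCALE 1024UL
    static const unsigned long capacity_margin = 1280; /* ~1.25x, assumed */

    static bool task_fits_capacity(unsigned long util, unsigned long cap)
    {
        /* cap * 1024 > util * 1280  <=>  util < cap / 1.25 */
        return cap * SCHED_CAPACITY_SCALE > util * capacity_margin;
    }

    int main(void)
    {
        printf("%d\n", task_fits_capacity(400, 446)); /* 0: would overutilize */
        printf("%d\n", task_fits_capacity(300, 446)); /* 1: fits with margin */
        return 0;
    }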

Change-Id: Ib3697106f4073aedf4a6c6ce42bd5d000fa8c007
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Patrick Bellasi
4edc5b0e38 sched/fair: ignore backup CPU when not valid
find_best_target() can sometimes fail to return a valid backup CPU, either
because it cannot find one or because it returns prev_cpu as the backup.
In these cases we should skip the energy_diff evaluation for the backup CPU.

Change-Id: I3787dbdfe74122348dd7a7485b88c4679051bd32
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Patrick Bellasi
2aada289d7 sched/fair: trace energy_diff for non boosted tasks
In systems where SchedTune is enabled, we do not report an energy diff for
non-boosted tasks. Let's fix this by always generating an energy_diff event,
where:
  nrg.delta = 0, since we skip energy normalization
  payoff = nrg.diff, since the payoff is defined just by the energy difference
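
A minimal sketch of the resulting event fields (the struct is illustrative,
not the kernel's trace format):

    #include <stdio.h>

    /* Illustrative energy_diff event for a non-boosted task: the
     * normalization step is skipped, so delta stays 0 and the payoff
     * collapses to the raw energy difference. */
    struct energy_diff_event { int nrg_diff, nrg_delta, payoff; };

    static struct energy_diff_event unboosted_event(int nrg_diff)
    {
        struct energy_diff_event ev = {
            .nrg_diff  = nrg_diff,
            .nrg_delta = 0,        /* energy normalization skipped */
            .payoff    = nrg_diff, /* payoff == energy difference */
        };
        return ev;
    }

    int main(void)
    {
        struct energy_diff_event ev = unboosted_event(-42);
        printf("diff=%d delta=%d payoff=%d\n",
               ev.nrg_diff, ev.nrg_delta, ev.payoff);
        return 0;
    }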

Change-Id: I9a11ec19b6f56da04147f5ae5b47daf1dd180445
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
2f30db8df4 UPSTREAM: sched/fair: Sync task util before slow-path wakeup
We use task_util() in find_idlest_group() via capacity_spare_wake().
This task_util() is updated in wake_cap(). However, wake_cap() is not the
only reason for ending up in find_idlest_group() - we could have been sent
there by wake_wide(). So explicitly sync the task util with prev_cpu
when we are about to head to find_idlest_group().

We could simply do this at the beginning of
select_task_rq_fair() (i.e. irrespective of whether we're heading to
select_idle_sibling() or find_idlest_group() & co), but I didn't want to
slow down the select_idle_sibling() path more than necessary.

Don't do this during fork balancing; we won't need the task_util, and
we'd just clobber the last_update_time, which is supposed to be 0.
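
A toy model of the ordering being described (the names follow the commit
text; the bodies are stand-ins, not the kernel implementation):

    #include <stdio.h>

    struct task { unsigned long util, last_update_time; };

    /* Toy: refresh the task's util against prev_cpu's clock so that
     * task_util() is fresh for the slow-path group comparisons. */
    static void sync_entity_load_avg(struct task *p)
    {
        p->last_update_time = 1;
    }

    static int slow_path(struct task *p) { (void)p; return 0; } /* find_idlest_group() & co */
    static int fast_path(struct task *p) { (void)p; return 1; } /* select_idle_sibling() */

    static int select_task_rq(struct task *p, int want_affine, int is_wake)
    {
        if (!want_affine) {
            /* Sync on wakeups only: fork balancing must keep
             * last_update_time == 0. */
            if (is_wake)
                sync_entity_load_avg(p);
            return slow_path(p);
        }
        return fast_path(p); /* deliberately left un-slowed */
    }

    int main(void)
    {
        struct task p = { .util = 100, .last_update_time = 0 };
        printf("cpu=%d synced=%lu\n", select_task_rq(&p, 0, 1),
               p.last_update_time);
        return 0;
    }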

Change-Id: I935f4bfdfec3e8b914457aac3387ce264d5fd484
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andres Oportus <andresoportus@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20170808095519.10077-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit ea16f0ea6c3d tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
5a86636649 UPSTREAM: sched/fair: Fix usage of find_idlest_group() when the local group is idlest
find_idlest_group() returns NULL when the local group is idlest. The
caller then continues the find_idlest_group() search at a lower level
of the current CPU's sched_domain hierarchy. find_idlest_group_cpu() is
not consulted and, crucially, @new_cpu is not updated. This means the
search is pointless and we return @prev_cpu from select_task_rq_fair().

This is fixed by initialising @new_cpu to @cpu instead of @prev_cpu.
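
A toy model of why the initial value matters; the sched_domain walk is
elided:

    #include <stdio.h>

    /* Toy: if find_idlest_group() returns NULL at every level (the
     * local group is idlest each time), new_cpu is never reassigned,
     * so the search returns whatever it was initialised to. */
    static int find_idlest_cpu_walk(int cpu, int prev_cpu, int levels)
    {
        int new_cpu = cpu; /* the fix: this used to be prev_cpu */

        (void)prev_cpu;
        while (levels--) {
            /* group = find_idlest_group(...) returned NULL, so
             * new_cpu is left untouched at this level. */
        }
        return new_cpu;
    }

    int main(void)
    {
        /* Waker runs on CPU 2; the task previously ran on CPU 7. */
        printf("chosen=%d\n", find_idlest_cpu_walk(2, 7, 3)); /* 2 */
        return 0;
    }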

Change-Id: Ie531f5bb29775952bdc4c148b6e974b2f5f32b7a
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-6-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 93f50f90247e tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
4116547645 UPSTREAM: sched/fair: Fix usage of find_idlest_group() when no groups are allowed
When 'p' is not allowed on any of the CPUs in the sched_domain, we
currently return NULL from find_idlest_group(), and pointlessly
continue the search on lower sched_domain levels (where 'p' is also not
allowed) before returning prev_cpu regardless (as we have not updated
new_cpu).

Add an explicit check for this case, and add a comment to
find_idlest_group(). Now when find_idlest_group() returns NULL, it always
means that the local group is allowed and idlest.

Change-Id: I5f2648d2f7fb0465677961ecb7473df3d06f0057
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-5-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 6fee85ccbc76 tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
9c825cf616 BACKPORT: sched/fair: Fix find_idlest_group when local group is not allowed
When the local group is not allowed, we do not modify this_*_load from
its initial value of 0. That means the load checks at the end of
find_idlest_group() cause us to incorrectly return NULL. Changing the
initial values to ULONG_MAX means we will instead return the idlest
remote group in that case.

BACKPORT: Note 4.4 is missing commit 6b94780e45c1 "sched/core: Use
load_avg for selecting idlest group", so we only have to fix
this_load instead of this_runnable_load and this_avg_load.
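
A toy of the final comparison (the imbalance_pct weighting is omitted),
showing why an initial value of 0 could never lose to a remote group:

    #include <limits.h>
    #include <stdio.h>

    /* Toy of the decision at the end of find_idlest_group(): NULL
     * means "stay local". With this_load stuck at 0 because the local
     * group was skipped, no remote min_load can beat it; starting from
     * ULONG_MAX lets the idlest remote group win instead. */
    static const char *choose(unsigned long this_load, unsigned long min_load)
    {
        return this_load <= min_load ? "NULL (local)" : "remote group";
    }

    int main(void)
    {
        printf("broken: %s\n", choose(0, 123));         /* NULL (local) */
        printf("fixed : %s\n", choose(ULONG_MAX, 123)); /* remote group */
        return 0;
    }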

Change-Id: I41f775b0e7c8f5e675c2780f955bb130a563cba7
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171005114516.18617-4-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 0d10ab952e99 tip:sched/core)
(backport changes described above)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
529def2ffe UPSTREAM: sched/fair: Remove unnecessary comparison with -1
Since commit:

  83a0a96a5f ("sched/fair: Leverage the idle state info when choosing the "idlest" cpu")

find_idlest_group_cpu() (formerly find_idlest_cpu) no longer returns -1,
so we can simplify the checking of the return value in find_idlest_cpu().

Change-Id: I98f4b9f178cd93a30408e024e608d36771764c7b
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-3-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from commit e90381eaecf6 in tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
0f743ce745 BACKPORT: sched/fair: Move select_task_rq_fair slow-path into its own function
In preparation for changes that would otherwise require adding a new
level of indentation to the while(sd) loop, create a new function
find_idlest_cpu() which contains this loop, and rename the existing
find_idlest_cpu() to find_idlest_group_cpu().

Code inside the while(sd) loop is unchanged. @new_cpu is added as a
variable in the new function, with the same initial value as the
@new_cpu in select_task_rq_fair().

Change-Id: I9842308cab00dc9cd6c513fc38c609089a1aaaaf
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-2-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(reworked for eas/cas schedstats added in Android)
(cherry-picked commit 18bd1b4bd53a from tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
795a6867cf UPSTREAM: sched/fair: Force balancing on nohz balance if local group has capacity
The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.

The original commit that adds it:

  fab476228b ("sched: Force balancing on newidle balance if local group has capacity")

explains:

    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.

A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed the
next nohz balance kick will help us out.
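
The widened condition plausibly looks like the following model (hedged: the
helper names echo find_busiest_group(), but this is a sketch, not the exact
hunk):

    #include <stdbool.h>
    #include <stdio.h>

    enum cpu_idle_type { CPU_IDLE, CPU_NOT_IDLE, CPU_NEWLY_IDLE };

    /* Model: force the balance whenever the balancing CPU is idle in
     * any sense, the local group has spare capacity and the busiest
     * group has none. Previously only CPU_NEWLY_IDLE qualified. */
    static bool should_force_balance(enum cpu_idle_type idle,
                                     bool local_has_capacity,
                                     bool busiest_no_capacity)
    {
        return idle != CPU_NOT_IDLE && /* was: idle == CPU_NEWLY_IDLE */
               local_has_capacity && busiest_no_capacity;
    }

    int main(void)
    {
        /* A nohz-kicked CPU_IDLE balance can now force it too. */
        printf("%d\n", should_force_balance(CPU_IDLE, true, true)); /* 1 */
        return 0;
    }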

Change-Id: I807ba5cba0ef1b8bbec02cbcd4755fd32af10135
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 583ffd99d765 tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
fd4a95dab8 UPSTREAM: sched/core: Add missing update_rq_clock() call in set_user_nice()
Address this rq-clock update bug:

  WARNING: CPU: 30 PID: 195 at ../kernel/sched/sched.h:797 set_next_entity()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    dump_stack()
    __warn()
    warn_slowpath_fmt()
    set_next_entity()
    ? _raw_spin_lock()
    set_curr_task_fair()
    set_user_nice.part.85()
    set_user_nice()
    create_worker()
    worker_thread()
    kthread()
    ret_from_fork()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2fb8d36787affe26f3536c3d8ec094995a48037d)
Change-Id: I53ba056e72820c7fadb3f022e4ee3b821c0de17d
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
bab39eb879 UPSTREAM: sched/core: Add missing update_rq_clock() call for task_hot()
Add the update_rq_clock() call at the top of the callstack instead of
at the bottom where we find it missing; this aids the later effort to
minimize the number of update_rq_clock() calls.

  WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    dump_stack()
    __warn()
    warn_slowpath_fmt()
    assert_clock_updated.isra.63.part.64()
    can_migrate_task()
    load_balance()
    pick_next_task_fair()
    __schedule()
    schedule()
    worker_thread()
    kthread()
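
A toy model of the protocol this series of fixes enforces, with the clock
update hoisted to the top of the callstack (assert() stands in for the
kernel's assert_clock_updated()):

    #include <assert.h>
    #include <stdio.h>

    struct rq { int clock_updated; long clock; };

    static void update_rq_clock(struct rq *rq)
    {
        rq->clock_updated = 1;
        rq->clock++;
    }

    /* Deep callees only read the clock and assert it is fresh. */
    static long rq_clock(struct rq *rq)
    {
        assert(rq->clock_updated); /* assert_clock_updated() stand-in */
        return rq->clock;
    }

    static void task_hot(struct rq *rq)
    {
        printf("rq clock = %ld\n", rq_clock(rq));
    }

    int main(void)
    {
        struct rq rq = { 0, 0 };
        update_rq_clock(&rq); /* once, at the top of the callstack */
        task_hot(&rq);        /* no warning in the deep callee */
        return 0;
    }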

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3bed5e2166a5e433bf62162f3cd3c5174d335934)
Change-Id: Ief5070dcce486535334dcb739ee16b989ea9df42
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
bea1b621d9 UPSTREAM: sched/core: Add missing update_rq_clock() in detach_task_cfs_rq()
Instead of adding the update_rq_clock() call all the way at the bottom
of the callstack, add one at the top; this aids the later effort to
minimize update_rq_clock() calls.

  WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    dump_stack()
    __warn()
    warn_slowpath_fmt()
    detach_task_cfs_rq()
    switched_from_fair()
    __sched_setscheduler()
    _sched_setscheduler()
    sched_set_stop_task()
    cpu_stop_create()
    __smpboot_create_thread.part.2()
    smpboot_register_percpu_thread_cpumask()
    cpu_stop_init()
    do_one_initcall()
    ? print_cpu_info()
    kernel_init_freeable()
    ? rest_init()
    kernel_init()
    ret_from_fork()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 80f5c1b84baa8180c3c27b7e227429712cd967b6)
Change-Id: Ibffde077d18eabec4c2984158bd9d6d73bd0fb96
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
4863faf5e4 UPSTREAM: sched/core: Add missing update_rq_clock() in post_init_entity_util_avg()
Address this rq-clock update bug:

  WARNING: CPU: 0 PID: 0 at ../kernel/sched/sched.h:797 post_init_entity_util_avg()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    __warn()
    post_init_entity_util_avg()
    wake_up_new_task()
    _do_fork()
    kernel_thread()
    rest_init()
    start_kernel()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 4126bad6717336abe5d666440ae15555563ca53f)
Change-Id: Ibe9a73386896377f96483d195e433259218755a5
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Vincent Guittot
c14c9b6e3e UPSTREAM: sched/core: Fix find_idlest_group() for fork
During fork, the utilization of a task is initialized once the rq has been
selected, because the current utilization level of the rq is used to
set the utilization of the forked task. As the task's utilization is
still 0 at this step of the fork sequence, it doesn't make sense to
look for some spare capacity that can fit the task's utilization.
Furthermore, I can see perf regressions for the test:

   hackbench -P -g 1

because the least-loaded policy is always bypassed and tasks are not
spread during fork.

With this patch and the fix below, we are back to the same performance as
v4.8. The fix below is only a temporary one used for the test
until a smarter solution is found, because we can't simply remove the
test, which is useful for other benchmarks

| @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
|
|	avg_cost = this_sd->avg_scan_cost;
|
| -	/*
| -	 * Due to large variance we need a large fuzz factor; hackbench in
| -	 * particularly is sensitive here.
| -	 */
| -	if ((avg_idle / 512) < avg_cost)
| -		return -1;
| -
|	time = local_clock();
|
|	for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {

Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: kernellwp@gmail.com
Cc: umgwanakikbuti@gmail.com
Cc: yuyang.du@intel.com
Link: http://lkml.kernel.org/r/1481216215-24651-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit f519a3f1c6b7a990e5aed37a8f853c6ecfdee945)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I86cc2ad81af3467c0b2f82b995111f428248baa4
2017-10-27 13:30:33 +01:00
Peter Zijlstra
97cb74f485 BACKPORT: sched/fair: Fix PELT integrity for new tasks
Vincent and Yuyang found another few scenarios in which entity
tracking goes wobbly.

The scenarios are basically due to the fact that new tasks are not
immediately attached and thereby differ from the normal situation -- a
task is always attached to a cfs_rq load average (such that it
includes its blocked contribution) and is explicitly
detached/attached on migration to another cfs_rq.

Scenario 1: switch to fair class

  p->sched_class = fair_class;
  if (queued)
    enqueue_task(p);
      ...
        enqueue_entity()
	  enqueue_entity_load_avg()
	    migrated = !sa->last_update_time (true)
	    if (migrated)
	      attach_entity_load_avg()
  check_class_changed()
    switched_from() (!fair)
    switched_to()   (fair)
      switched_to_fair()
        attach_entity_load_avg()

If @p is a new task that hasn't been fair before, it will have
!last_update_time and, per the above, end up in
attach_entity_load_avg() _twice_.

Scenario 2: change between cgroups

  sched_move_group(p)
    if (queued)
      dequeue_task()
    task_move_group_fair()
      detach_task_cfs_rq()
        detach_entity_load_avg()
      set_task_rq()
      attach_task_cfs_rq()
        attach_entity_load_avg()
    if (queued)
      enqueue_task();
        ...
          enqueue_entity()
	    enqueue_entity_load_avg()
	      migrated = !sa->last_update_time (true)
	      if (migrated)
	        attach_entity_load_avg()

Similar to scenario 1, if @p is a new task, it will have
!last_update_time and we'll end up in attach_entity_load_avg()
_twice_.

Furthermore, notice how we do a detach_entity_load_avg() on something
that wasn't attached to begin with.

As stated above, the problem is that the new task isn't yet attached
to the load tracking and thereby violates the invariant assumption.

This patch remedies this by ensuring a new task is indeed properly
attached to the load tracking on creation, through
post_init_entity_util_avg().

Of course, this isn't entirely as straightforward as one might think,
since the task is hashed before we call wake_up_new_task() and thus
can be poked at. We avoid this by adding TASK_NEW and teaching
cpu_cgroup_can_attach() to refuse such tasks.
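
A toy model of why the double attach corrupts the average (the numbers are
arbitrary):

    #include <stdio.h>

    struct cfs_rq { long load_avg; };
    struct entity { long load_avg; long last_update_time; };

    static void attach_entity_load_avg(struct cfs_rq *cfs, struct entity *se)
    {
        cfs->load_avg += se->load_avg; /* no guard: attaching twice
                                        * double-counts the entity */
        se->last_update_time = 1;
    }

    int main(void)
    {
        struct cfs_rq cfs = { 0 };
        struct entity se = { .load_avg = 512, .last_update_time = 0 };

        /* Scenario 1: the enqueue path attaches (migrated == true)... */
        attach_entity_load_avg(&cfs, &se);
        /* ...then switched_to_fair() attaches again. */
        attach_entity_load_avg(&cfs, &se);

        printf("cfs load_avg = %ld (should be 512)\n", cfs.load_avg);
        return 0;
    }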

.:: BACKPORT

Complicated by the fact that many of the lines changed by the original
of this commit were then changed by:

df217913e72e sched/fair: Factorize attach/detach entity <Vincent Guittot>

and then

d31b1a66cbe0 sched/fair: Factorize PELT update <Vincent Guittot>

, which have both already been backported here.

Reported-by: Yuyang Du <yuyang.du@intel.com>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7dc603c9028ea5d4354e0e317e8481df99b06d7e)
Change-Id: Ibc59eb52310a62709d49a744bd5a24e8b97c4ae8
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Vincent Guittot
138a670d97 BACKPORT: sched/cgroup: Fix cpu_cgroup_fork() handling
A new fair task is detached and attached from/to task_group with:

  cgroup_post_fork()
    ss->fork(child) := cpu_cgroup_fork()
      sched_move_task()
        task_move_group_fair()

Which is wrong, because at this point in fork() the task isn't fully
initialized and it cannot 'move' to another group, because it's not
attached to any group yet.

In fact, cpu_cgroup_fork() needs only a small part of sched_move_task(), so
we can just call that small part directly instead of sched_move_task(). And
the task doesn't really migrate because it is not yet attached, so we
need the following sequence:

  do_fork()
    sched_fork()
      __set_task_cpu()

    cgroup_post_fork()
      set_task_rq() # set task group and runqueue

    wake_up_new_task()
      select_task_rq() can select a new cpu
      __set_task_cpu
      post_init_entity_util_avg
        attach_task_cfs_rq()
      activate_task
        enqueue_task

This patch makes that happen.

BACKPORT: Difference from original commit:

- Removed use of DEQUEUE_MOVE (which isn't defined in 4.4) in
  dequeue_task flags
- Replaced "struct rq_flags rf" with "unsigned long flags".

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
[ Added TASK_SET_GROUP to set depth properly. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit ea86cb4b7621e1298a37197005bf0abcc86348d4)
Change-Id: I8126fd923288acf961218431ffd29d6bf6fd8d72
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Peter Zijlstra
6b02ab68ec UPSTREAM: sched/fair: Fix and optimize the fork() path
The task_fork_fair() callback already calls __set_task_cpu() and takes
rq->lock.

If we move the sched_class::task_fork callback in sched_fork() under
the existing p->pi_lock, right after its set_task_cpu() call, we can
avoid doing two such calls and omit the IRQ disabling on the rq->lock.

Change to __set_task_cpu() to skip the migration bits; this is a new
task, not a migration. Similarly, make wake_up_new_task() use
__set_task_cpu() for the same reason: the task hasn't actually
migrated as it has never run.

This cures the problem of calling migrate_task_rq_fair(), which does
remove_entity_load_avg() on tasks that have never been added to
the load avg to begin with.

This bug would result in transiently messed up load_avg values, averaged
out after a few dozen milliseconds. This is probably the reason why
this bug was not found for such a long time.

Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e210bffd39d01b649c94b820c28ff112673266dd)
Change-Id: Icbddbaa6e8c1071859673d8685bc3f38955cf144
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Chris Redpath
792510d9b3 BACKPORT: sched/fair: Make it possible to account fair load avg consistently
While set_task_rq_fair() is introduced in mainline by commit ad936d8658fd
("sched/fair: Make it possible to account fair load avg consistently"),
the function ends up being introduced here by the backport of
commit 09a43ace1f98 ("sched/fair: Propagate load during synchronous
attach/detach"). The problem (apart from the confusion introduced by the
backport) is actually that set_task_rq_fair() is currently not called at
all.

Fix the problem by backporting again commit ad936d8658fd
("sched/fair: Make it possible to account fair load avg consistently").

Original change log:

The current code accounts for the time a task was absent from the fair
class (per ATTACH_AGE_LOAD). However it does not work correctly when a
task got migrated or moved to another cgroup while outside of the fair
class.

This patch tries to address that by aging on migration. We locklessly
read the 'last_update_time' stamp from both the old and new cfs_rq,
age the load up to the old time, and set it to the new time.

These timestamps should in general not be more than 1 tick apart from
one another, so there is a definite bound on things.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ Changelog, a few edits and !SMP build fix ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445616981-29904-2-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked from ad936d8658fd348338cb7d42c577dac77892b074)
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I17294ab0ada3901d35895014715fd60952949358
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-27 13:30:32 +01:00
Chris Redpath
fac311be26 cpufreq/sched: Consider max cpu capacity when choosing frequencies
When using schedfreq on CPUs whose max capacity is significantly smaller
than 1024, the tick update uses non-normalised capacities - this leads to
selecting an incorrect OPP, as we were scaling the frequency as if the
max achievable capacity was 1024 rather than the max for that particular
CPU or group. This could leave a CPU stuck at the lowest OPP and unable
to generate enough utilisation to climb out, if its max capacity is
significantly smaller than 1024.

Instead, normalise the capacity to the range 0-1024 in the tick
so that when we later select a frequency, we get the correct one.

Comments are also updated to be clearer about what is needed.
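
The normalisation being described, as a standalone sketch (446 as a LITTLE
CPU's max capacity is an invented example):

    #include <stdio.h>

    #define SCHED_CAPACITY_SCALE 1024UL

    /* Scale a raw capacity request into the 0-1024 range relative to
     * this CPU's own maximum, so frequency selection sees a fraction
     * of that CPU's range rather than of an absolute 1024. */
    static unsigned long normalize_capacity(unsigned long cap,
                                            unsigned long cpu_max_cap)
    {
        return cap * SCHED_CAPACITY_SCALE / cpu_max_cap;
    }

    int main(void)
    {
        /* Half of a 446-capacity CPU is ~50% of its range, not ~22%
         * of 1024, so it maps to a mid OPP instead of the lowest. */
        printf("normalized = %lu\n", normalize_capacity(223, 446)); /* 512 */
        return 0;
    }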

Change-Id: Id84391c7ac015311002ada21813a353ee13bee60
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Oleg Nesterov
0f85c0954b sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
commit 18f649ef344127ef6de23a5a4272dbe2fdb73dde upstream.

The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
it, but see the next patch.

However the comment is correct in that autogroup_move_group() must always
change task_group() for every thread, so the sysctl_sched_autogroup_enabled
check is very wrong; we can race with cgroups, and even sys_setsid() is not
safe because a task running with task_group() == ag->tg must participate in
refcounting:

	int main(void)
	{
		int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);

		assert(sctl > 0);
		if (fork()) {
			wait(NULL); // destroy the child's ag/tg
			pause();
		}

		assert(pwrite(sctl, "1\n", 2, 0) == 2);
		assert(setsid() > 0);
		if (fork())
			pause();

		kill(getppid(), SIGKILL);
		sleep(1);

		// The child has gone, the grandchild runs with kref == 1
		assert(pwrite(sctl, "0\n", 2, 0) == 2);
		assert(setsid() > 0);

		// runs with the freed ag/tg
		for (;;)
			sleep(1);

		return 0;
	}

crashes the kernel. It doesn't really need sleep(1); it doesn't matter
whether autogroup_move_group() actually frees the task_group or this happens
later.

Reported-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hartsjc@redhat.com
Cc: vbendel@redhat.com
Link: http://lkml.kernel.org/r/20161114184609.GA15965@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
 [sumits: submit to 4.4 LTS, post testing on Hikey]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-27 10:23:17 +02:00
Chris Redpath
4f8767d1ca ANDROID: sched/fair: Select correct capacity state for energy_diff
The util returned from group_max_util is not capped at the max util
present in the group, so it can be larger than the capacity stored in
the array. Ensure that when this happens, we always use the last entry
in the array to fetch energy from.
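
A sketch of the described clamping (the table values are invented):

    #include <stdio.h>

    /* Sketch: pick the first capacity state that covers util, but
     * clamp to the last entry when util exceeds the table's max. */
    static int find_cap_idx(unsigned long util, const unsigned long *cap,
                            int nr_states)
    {
        int idx;

        for (idx = 0; idx < nr_states - 1; idx++)
            if (cap[idx] >= util)
                break;
        return idx; /* falls through to nr_states - 1 when too big */
    }

    int main(void)
    {
        const unsigned long cap[] = { 256, 512, 768, 1024 };

        /* group_max_util() may exceed the top entry; use index 3. */
        printf("%d\n", find_cap_idx(1100, cap, 4)); /* 3 */
        return 0;
    }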

Tested with synthetics on Juno board.

Bug: 38159576
Change-Id: I89fb52fb7e68fa3e682e308acc232596672d03f7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-25 19:28:51 +00:00
Leo Yan
774481506a cpufreq: schedutil: clamp util to CPU maximum capacity
The code accumulates the utilization of the different scheduling
classes to get the CPU util, and when the total util value is larger
than the CPU capacity it clamps util to the CPU's maximum capacity. So
we get a correct util value when using the PELT signal, but with the
WALT signal the clamping is missed.

WALT, on the other hand, doesn't accumulate per-class utilization, but
because a boost margin is applied to the WALT signal, the CPU util
value can still be larger than the CPU capacity; so this patch always
clamps util to the CPU's maximum capacity.
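
The clamp itself is simple; a sketch (the boosted value is an invented
example):

    #include <stdio.h>

    /* Clamp a (possibly boost-inflated) util value to the CPU's
     * maximum capacity before it is used for frequency selection. */
    static unsigned long clamp_util(unsigned long util,
                                    unsigned long capacity_max)
    {
        return util < capacity_max ? util : capacity_max;
    }

    int main(void)
    {
        /* WALT util plus a boost margin can exceed 1024... */
        printf("%lu\n", clamp_util(1176, 1024)); /* 1024 */
        /* ...while in-range values pass through unchanged. */
        printf("%lu\n", clamp_util(700, 1024));  /* 700 */
        return 0;
    }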

Change-Id: I05481ddbf20246bb9be15b6bd21b6ec039015ea8
Signed-off-by: Leo Yan <leo.yan@linaro.org>
2017-10-24 15:49:27 +00:00
Chris Redpath
d34d2c97ae cpufreq/sched: Use cpu max freq rather than policy max
When we convert capacity into frequency, we use policy->max to get
the max freq of the CPU. Since this can be lowered by a userspace policy
or by thermal events, we are potentially asking for a lower frequency
than the utilization demands.

Change over to using cpuinfo.max, which is the max freq supported by
that CPU, rather than the currently chosen max. The frequency granted
still honours the policy max.

Tested by setting a userspace policy and observing the relevant vars
in a trace. In this instance, we ask for around 1ghz instead of 620MHz.

freq_new=1013512
unfixed_freq_new=624487
capacity=546
cpuinfo_max=1900800
policy_max=1171200
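
A quick sketch reproducing the traced numbers above; the conversion formula
(capacity * max_freq / 1024) is inferred from those values, so treat it as
an assumption:

    #include <stdio.h>

    /* Convert a capacity request into a frequency using the CPU's
     * hardware max (cpuinfo.max) instead of the current policy max. */
    int main(void)
    {
        unsigned long cap = 546, cap_max = 1024; /* SCHED_CAPACITY_SCALE */
        unsigned long cpuinfo_max = 1900800, policy_max = 1171200;

        unsigned long fixed   = cap * cpuinfo_max / cap_max; /* 1013512 */
        unsigned long unfixed = cap * policy_max  / cap_max; /*  624487 */

        printf("freq_new=%lu unfixed_freq_new=%lu\n", fixed, unfixed);
        return 0;
    }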

Change-Id: I8c5694db42243c6fb78bb9be9046b06ac81295e7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-23 12:30:46 +01:00
Greg Kroah-Hartman
89074de67a This is the 4.4.94 stable release

Merge 4.4.94 into android-4.4

Changes in 4.4.94
	percpu: make this_cpu_generic_read() atomic w.r.t. interrupts
	drm/dp/mst: save vcpi with payloads
	MIPS: Fix minimum alignment requirement of IRQ stack
	sctp: potential read out of bounds in sctp_ulpevent_type_enabled()
	bpf/verifier: reject BPF_ALU64|BPF_END
	udpv6: Fix the checksum computation when HW checksum does not apply
	ip6_gre: skb_push ipv6hdr before packing the header in ip6gre_header
	net: emac: Fix napi poll list corruption
	packet: hold bind lock when rebinding to fanout hook
	bpf: one perf event close won't free bpf program attached by another perf event
	isdn/i4l: fetch the ppp_write buffer in one shot
	vti: fix use after free in vti_tunnel_xmit/vti6_tnl_xmit
	l2tp: Avoid schedule while atomic in exit_net
	l2tp: fix race condition in l2tp_tunnel_delete
	tun: bail out from tun_get_user() if the skb is empty
	packet: in packet_do_bind, test fanout with bind_lock held
	packet: only test po->has_vnet_hdr once in packet_snd
	net: Set sk_prot_creator when cloning sockets to the right proto
	tipc: use only positive error codes in messages
	Revert "bsg-lib: don't free job in bsg_prepare_job"
	locking/lockdep: Add nest_lock integrity test
	watchdog: kempld: fix gcc-4.3 build
	irqchip/crossbar: Fix incorrect type of local variables
	mac80211_hwsim: check HWSIM_ATTR_RADIO_NAME length
	mac80211: fix power saving clients handling in iwlwifi
	net/mlx4_en: fix overflow in mlx4_en_init_timestamp()
	netfilter: nf_ct_expect: Change __nf_ct_expect_check() return value.
	iio: adc: xilinx: Fix error handling
	Btrfs: send, fix failure to rename top level inode due to name collision
	f2fs: do not wait for writeback in write_begin
	md/linear: shutup lockdep warnning
	sparc64: Migrate hvcons irq to panicked cpu
	net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs
	crypto: xts - Add ECB dependency
	ocfs2/dlmglue: prepare tracking logic to avoid recursive cluster lock
	slub: do not merge cache if slub_debug contains a never-merge flag
	scsi: scsi_dh_emc: return success in clariion_std_inquiry()
	net: mvpp2: release reference to txq_cpu[] entry after unmapping
	i2c: at91: ensure state is restored after suspending
	ceph: clean up unsafe d_parent accesses in build_dentry_path
	uapi: fix linux/rds.h userspace compilation errors
	uapi: fix linux/mroute6.h userspace compilation errors
	target/iscsi: Fix unsolicited data seq_end_offset calculation
	nfsd/callback: Cleanup callback cred on shutdown
	cpufreq: CPPC: add ACPI_PROCESSOR dependency
	Revert "tty: goldfish: Fix a parameter of a call to free_irq"
	Linux 4.4.94

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-22 08:09:11 +02:00
Peter Zijlstra
28eab3db72 locking/lockdep: Add nest_lock integrity test
[ Upstream commit 7fb4a2cea6b18dab56d609530d077f168169ed6b ]

Boqun reported that hlock->references can overflow. Add a debug test
for that to generate a clear error when this happens.

Without this, lockdep is likely to report a mysterious failure on
unlock.

Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicolai Hähnle <Nicolai.Haehnle@amd.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:09:03 +02:00
Yonghong Song
1a4f1ecdb2 bpf: one perf event close won't free bpf program attached by another perf event
[ Upstream commit ec9dd352d591f0c90402ec67a317c1ed4fb2e638 ]

This patch fixes a bug exhibited by the following scenario:
  1. fd1 = perf_event_open with attr.config = ID1
  2. attach bpf program prog1 to fd1
  3. fd2 = perf_event_open with attr.config = ID1
     <this will be successful>
  4. user program closes fd2 and prog1 is detached from the tracepoint.
  5. user program with fd1 does not work properly as the tracepoint
     produces no output any more.

The issue happens at step 4. Multiple perf_event_open() calls can succeed,
but there is only one bpf prog pointer in the tp_event. In the
current logic, any fd release for the same tp_event will free
the tp_event->prog.

The fix is to free tp_event->prog only when the closing fd
corresponds to the one which registered the program.
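
A toy model of the fix (the structures are heavily simplified):

    #include <stdio.h>

    struct trace_event { void *prog; };            /* shared tp_event */
    struct perf_event { struct trace_event *tp; void *prog; };

    /* Fix modeled: only the perf event that installed the program may
     * detach it when its fd is closed. */
    static void perf_event_close(struct perf_event *ev)
    {
        if (ev->prog && ev->prog == ev->tp->prog)
            ev->tp->prog = NULL; /* detach the bpf program */
    }

    int main(void)
    {
        struct trace_event tp = { 0 };
        void *prog1 = &tp; /* any non-NULL token stands in for a prog */

        struct perf_event fd1 = { &tp, prog1 };
        struct perf_event fd2 = { &tp, 0 };  /* same tp_event, no prog */

        tp.prog = prog1;        /* step 2: prog1 attached via fd1 */
        perf_event_close(&fd2); /* step 4: must NOT detach prog1 */

        printf("prog still attached: %d\n", tp.prog != 0); /* 1 */
        (void)fd1;
        return 0;
    }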

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:09:02 +02:00
Edward Cree
2ec54b21dd bpf/verifier: reject BPF_ALU64|BPF_END
[ Upstream commit e67b8a685c7c984e834e3181ef4619cd7025a136 ]

Neither ___bpf_prog_run nor the JITs accept it.
Also adds a new test case.

Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Edward Cree <ecree@solarflare.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-21 17:09:02 +02:00
Dietmar Eggemann
724091f67f sched/fair: remove erroneous RCU_LOCKDEP_WARN from start_cpu()
Fixes: https://bugs.linaro.org/show_bug.cgi?id=3075

Change-Id: I62d714fc4b9366a9b2535649aa92d1edc840cf94
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-20 15:33:41 +00:00
Joonwoo Park
ed9e749668 sched: EAS/WALT: finish accounting prior to task_tick
In order to set rq->misfit_task in time, call update_task_ravg() prior
to task_tick.  This reduces upmigration delay by 1 scheduler window.

Change-Id: I7cc80badd423f2e7684125fbfd853b0a3610f0e8
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-19 11:56:50 -07:00
Joonwoo Park
40c3aaa56a cpufreq: sched: update capacity request upon tick always
At present, sched_freq_tick() skips updating of capacity update when
current frequency is fmax.  This can cause incorrect frequency drop
when a CPU bound task goes into sleep for example :
  1) A task (A) enqueues onto CPU 0 and executes for long time.
  2) A new task (B) which has low task demand enqueues onto CPU 1 and
     executes long so becomes a CPU bound task.
  3) Both CPU 0 and 1 gets scheduler tick but skip sched_freq_tick()
     since current frequency is fmax.
  4) Task (A) sleeps and lower the CPU 0's capacity request.
  5) Because task (B) voted CPU capacity at step 2 with low demand and
     skipped to request afterwards, cluster frequency for both CPU 0
     and 1 drops to match capacity voted by CPU 1 at step 2 even though
     task (B) on CPU 1 requires max capacity.

Fix such incorrectness by not skipping CPU capacity voting at tick
path.

Change-Id: Ieb46af1ac96ffce7a5532c58c7f07bf1ada06b86
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-19 11:56:50 -07:00
Joonwoo Park
7ab48e4c8d sched/fair: prevent meaningless active migration
At present, need_active_balance() determines whether an active
upmigration is needed by using capacity_of(). A CPU's capacity
may be reduced by RT pressure, and therefore distinguishing
capability differences with capacity_of() alone may lead to suboptimal
active migrations to less capable CPUs. Use capacity_orig_of()
to distinguish differently capable CPUs in addition to
capacity_of(), thus avoiding placing tasks on less capable CPUs
due to instantaneous RT pressure.
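
A sketch of the strengthened test (values invented; capacity is the
RT-pressured value, capacity_orig the original hardware one):

    #include <stdbool.h>
    #include <stdio.h>

    struct cpu { unsigned long capacity, capacity_orig; };

    /* Only treat dst as genuinely more capable if it wins on the
     * original capacity too, not just on the instantaneous one. */
    static bool worth_active_upmigration(struct cpu *src, struct cpu *dst)
    {
        return dst->capacity > src->capacity &&
               dst->capacity_orig > src->capacity_orig;
    }

    int main(void)
    {
        /* A big CPU squeezed by RT work (400 of 1024) vs an idle
         * LITTLE (446 of 446): instantaneous capacity alone would
         * migrate to the LITTLE; the orig-capacity check rejects it. */
        struct cpu big_pressured = { 400, 1024 };
        struct cpu little_idle   = { 446, 446 };

        printf("%d\n", worth_active_upmigration(&big_pressured,
                                                &little_idle)); /* 0 */
        return 0;
    }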

Change-Id: I3e1435246a8edc3ad618ef98a34866cfbd8c16a5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[markivx: Reworked the commit text a bit]
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-19 11:56:50 -07:00
Vikram Mulukutla
be832f69a9 sched: walt: Leverage existing helper APIs to apply invariance
There's no need for a separate hierarchy of notifiers, APIs
and variables in walt.c for the purpose of applying frequency
and IPC invariance. Let's just use capacity_curr_of and get
rid of a lot of the infrastructure relating to capacity,
load_scale_factor etc.

Change-Id: Ia220e2c896373fa535db05bff60f9aa33aefc978
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-19 11:56:50 -07:00
Dmitry Shmidt
d6fbbe5e66 This is the 4.4.93 stable release

Merge 4.4.93 into android-4.4

Changes in 4.4.93
	brcmfmac: add length check in brcmf_cfg80211_escan_handler()
	ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets
	CIFS: Reconnect expired SMB sessions
	nl80211: Define policy for packet pattern attributes
	iwlwifi: mvm: use IWL_HCMD_NOCOPY for MCAST_FILTER_CMD
	rcu: Allow for page faults in NMI handlers
	USB: dummy-hcd: Fix deadlock caused by disconnect detection
	MIPS: math-emu: Remove pr_err() calls from fpu_emu()
	dmaengine: edma: Align the memcpy acnt array size with the transfer
	HID: usbhid: fix out-of-bounds bug
	crypto: shash - Fix zero-length shash ahash digest crash
	KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit
	usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet
	iommu/amd: Finish TLB flush in amd_iommu_unmap()
	ALSA: usb-audio: Kill stray URB at exiting
	ALSA: seq: Fix use-after-free at creating a port
	ALSA: seq: Fix copy_from_user() call inside lock
	ALSA: caiaq: Fix stray URB at probe error path
	ALSA: line6: Fix leftover URB at error-path during probe
	usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options
	direct-io: Prevent NULL pointer access in submit_page_section
	fix unbalanced page refcounting in bio_map_user_iov
	USB: serial: ftdi_sio: add id for Cypress WICED dev board
	USB: serial: cp210x: add support for ELV TFD500
	USB: serial: option: add support for TP-Link LTE module
	USB: serial: qcserial: add Dell DW5818, DW5819
	USB: serial: console: fix use-after-free after failed setup
	x86/alternatives: Fix alt_max_short macro to really be a max()
	Linux 4.4.93

Change-Id: I731bf1eef5aca9728dddd23bfbe407f0c6ff2d84
Signed-off-by: Dmitry Shmidt <dimitrysh@google.com>
2017-10-19 10:08:29 -07:00
Paul E. McKenney
5fd4551659 rcu: Allow for page faults in NMI handlers
commit 28585a832602747cbfa88ad8934013177a3aae38 upstream.

A number of architectures invoke rcu_irq_enter() on exception entry in
order to allow RCU read-side critical sections in the exception handler
when the exception is from an idle or nohz_full CPU.  This works, at
least unless the exception happens in an NMI handler.  In that case,
rcu_nmi_enter() would already have exited the extended quiescent state,
which would mean that rcu_irq_enter() would (incorrectly) cause RCU
to think that it is again in an extended quiescent state.  This will
in turn result in lockdep splats in response to later RCU read-side
critical sections.

This commit therefore causes rcu_irq_enter() and rcu_irq_exit() to
take no action if there is an rcu_nmi_enter() in effect, thus avoiding
the unscheduled return to RCU quiescent state.  This in turn should
make the kernel safe for on-demand RCU voyeurism.

Link: http://lkml.kernel.org/r/20170922211022.GA18084@linux.vnet.ibm.com

Cc: stable@vger.kernel.org
Fixes: 0be964be0 ("module: Sanitize RCU usage and locking")
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-18 09:20:41 +02:00
Olav Haugan
ec888d46d8 sched: Update task->on_rq when tasks are moving between runqueues
Task->on_rq has three states:
	0 - Task is not on runqueue (rq)
	1 (TASK_ON_RQ_QUEUED) - Task is on rq
	2 (TASK_ON_RQ_MIGRATING) - Task is on rq but in the
	process of being migrated to another rq

When a task is moving between rqs, task->on_rq should be
TASK_ON_RQ_MIGRATING in order for WALT to account the rq's cumulative
runnable average correctly.  Without such state marking in all the
classes, WALT's update_history() would try to fix up the task's demand,
which was never contributed to any CPU during migration.
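
Sketched as the migration pattern in question (the constants match the
three states listed above; locking and the actual dequeue/enqueue are
elided):

    /* Mark the task "migrating" across the move so accounting such as
     * WALT's sees the intermediate state instead of a spurious dequeue. */
    #define TASK_ON_RQ_QUEUED    1
    #define TASK_ON_RQ_MIGRATING 2

    struct task { int on_rq; int cpu; };

    static void move_task(struct task *p, int dst_cpu)
    {
        p->on_rq = TASK_ON_RQ_MIGRATING; /* WALT: don't fix up demand */
        /* dequeue from src rq ... set_task_cpu() ... enqueue on dst */
        p->cpu = dst_cpu;
        p->on_rq = TASK_ON_RQ_QUEUED;
    }

    int main(void)
    {
        struct task p = { TASK_ON_RQ_QUEUED, 0 };
        move_task(&p, 3);
        return p.on_rq == TASK_ON_RQ_QUEUED ? 0 : 1;
    }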

Change-Id: Iced3428f3924fe8ab5d0075698273ead04f12d5b
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop: Reinforced changelog to explain why this is needed by WALT.
           Fixed conflicts in deadline.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-10-16 21:07:06 +00:00
Greg Kroah-Hartman
73a2b70bdf This is the 4.4.92 stable release

Merge 4.4.92 into android-4.4

Changes in 4.4.92
	usb: gadget: inode.c: fix unbalanced spin_lock in ep0_write
	USB: gadgetfs: Fix crash caused by inadequate synchronization
	USB: gadgetfs: fix copy_to_user while holding spinlock
	usb: gadget: udc: atmel: set vbus irqflags explicitly
	usb-storage: unusual_devs entry to fix write-access regression for Seagate external drives
	usb: renesas_usbhs: fix the BCLR setting condition for non-DCP pipe
	usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction
	ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor
	usb: pci-quirks.c: Corrected timeout values used in handshake
	USB: dummy-hcd: fix connection failures (wrong speed)
	USB: dummy-hcd: fix infinite-loop resubmission bug
	USB: dummy-hcd: Fix erroneous synchronization change
	USB: devio: Don't corrupt user memory
	usb: gadget: mass_storage: set msg_registered after msg registered
	USB: g_mass_storage: Fix deadlock when driver is unbound
	lsm: fix smack_inode_removexattr and xattr_getsecurity memleak
	ALSA: compress: Remove unused variable
	ALSA: usx2y: Suppress kernel warning at page allocation failures
	driver core: platform: Don't read past the end of "driver_override" buffer
	Drivers: hv: fcopy: restore correct transfer length
	stm class: Fix a use-after-free
	ftrace: Fix kmemleak in unregister_ftrace_graph
	HID: i2c-hid: allocate hid buffers for real worst case
	iwlwifi: add workaround to disable wide channels in 5GHz
	scsi: sd: Do not override max_sectors_kb sysfs setting
	USB: uas: fix bug in handling of alternate settings
	USB: core: harden cdc_parse_cdc_header
	usb: Increase quirk delay for USB devices
	USB: fix out-of-bounds in usb_set_configuration
	xhci: fix finding correct bus_state structure for USB 3.1 hosts
	iio: adc: twl4030: Fix an error handling path in 'twl4030_madc_probe()'
	iio: adc: twl4030: Disable the vusb3v1 rugulator in the error handling path of 'twl4030_madc_probe()'
	iio: ad_sigma_delta: Implement a dedicated reset function
	staging: iio: ad7192: Fix - use the dedicated reset function avoiding dma from stack.
	iio: core: Return error for failed read_reg
	iio: ad7793: Fix the serial interface reset
	iio: adc: mcp320x: Fix readout of negative voltages
	iio: adc: mcp320x: Fix oops on module unload
	uwb: properly check kthread_run return value
	uwb: ensure that endpoint is interrupt
	brcmfmac: setup passive scan if requested by user-space
	drm/i915/bios: ignore HDMI on port A
	nvme: protect against simultaneous shutdown invocations
	sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
	ext4: fix data corruption for mmap writes
	ext4: Don't clear SGID when inheriting ACLs
	ext4: don't allow encrypted operations without keys
	Linux 4.4.92

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-12 12:05:45 +02:00
Peter Zijlstra
90fd673873 sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
commit 50e76632339d4655859523a39249dd95ee5e93e7 upstream.

Cpusets vs. suspend-resume is _completely_ broken. And it got noticed
because it now resulted in non-cpuset usage breaking too.

On suspend, cpuset_cpu_inactive() doesn't call into
cpuset_update_active_cpus() because it doesn't want to move tasks about;
there is no need, as all tasks are frozen and won't run again until after
we've resumed everything.

But this means that when we finally do call into
cpuset_update_active_cpus() after resuming the last frozen cpu in
cpuset_cpu_active(), the top_cpuset will not have any difference with
the cpu_active_mask, and thus it will not in fact do _anything_.

So the cpuset configuration will not be restored. This was largely
hidden because we would unconditionally create identity domains and
mobile users would not in fact use cpusets much. And servers that do use
cpusets tend not to suspend-resume much.

An additional problem is that we'd not in fact wait for the cpuset work to
finish before resuming the tasks, allowing spurious migrations outside
of the specified domains.

Fix the rebuild by introducing cpuset_force_rebuild() and fix the
ordering with cpuset_wait_for_hotplug().

Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: deb7aa308e ("cpuset: reorganize CPU / memory hotplug handling")
Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12 11:27:35 +02:00
Shu Wang
87509592ec ftrace: Fix kmemleak in unregister_ftrace_graph
commit 2b0b8499ae75df91455bbeb7491d45affc384fb0 upstream.

The trampoline allocated by the function tracer was overwritten by the
function_graph tracer, causing a memory leak. save_global_trampoline should
have saved the previous trampoline in register_ftrace_graph() and restored
it in unregister_ftrace_graph(). But as implemented, save_global_trampoline
was only used in unregister_ftrace_graph() with its default value of 0, and
it overwrote the previous trampoline's value, causing the previously
allocated trampoline to be lost.

kmemleak backtrace:
    kmemleak_vmalloc+0x77/0xc0
    __vmalloc_node_range+0x1b5/0x2c0
    module_alloc+0x7c/0xd0
    arch_ftrace_update_trampoline+0xb5/0x290
    ftrace_startup+0x78/0x210
    register_ftrace_function+0x8b/0xd0
    function_trace_init+0x4f/0x80
    tracing_set_tracer+0xe6/0x170
    tracing_set_trace_write+0x90/0xd0
    __vfs_write+0x37/0x170
    vfs_write+0xb2/0x1b0
    SyS_write+0x55/0xc0
    do_syscall_64+0x67/0x180
    return_from_SYSCALL_64+0x0/0x6a

[
  Looking further into this, I found that this was left over from when the
  function and function graph tracers shared the same ftrace_ops. But in
  commit 5f151b2401 ("ftrace: Fix function_profiler and function tracer
  together"), the two were separated, and the save_global_trampoline no
  longer was necessary (and it may have been broken back then too).
  -- Steven Rostedt
]

Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com

Fixes: 5f151b2401 ("ftrace: Fix function_profiler and function tracer together")
Signed-off-by: Shu Wang <shuwang@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12 11:27:33 +02:00
Joel Fernandes
e5486e9c89 FROMLIST: tracing: Add support for preempt and irq enable/disable events
Preempt and irq trace events can be used to trace the start and
end of an atomic section. A trace viewer like systrace can then
display atomic sections graphically and correlate them with latencies
and scheduling issues.

This also serves as a prelude to using synthetic events or probes to
rewrite the preempt and irqsoff tracers, along with numerous benefits of
using trace events features for these events.

Change-Id: I718d40f7c3c48579adf9d7121b21495a669c89bd
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zilstra <peterz@infradead.org>
Cc: kernel-team@android.com
Link: https://patchwork.kernel.org/patch/9988157/
Signed-off-by: Joel Fernandes <joelaf@google.com>
2017-10-06 16:16:18 -07:00
Joel Fernandes
139ac8ac89 FROMLIST: tracing: Prepare to add preempt and irq trace events
In preparation for adding irqsoff and preemptoff enable and disable trace
events, move the required functions and code to make it easier to add these
events in a later patch. This patch is just code movement, with no
functional change.

Change-Id: I587d411da5efbc4959bcccd7a05c7a66c231e1e0
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@android.com
Link: https://patchwork.kernel.org/patch/9988159/
Signed-off-by: Joel Fernandes <joelaf@google.com>
2017-10-06 16:16:11 -07:00
Greg Kroah-Hartman
a958344209 This is the 4.4.90 stable release

Merge 4.4.90 into android-4.4

Changes in 4.4.90
	cifs: release auth_key.response for reconnect.
	mac80211: flush hw_roc_start work before cancelling the ROC
	KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
	tracing: Fix trace_pipe behavior for instance traces
	tracing: Erase irqsoff trace with empty write
	md/raid5: fix a race condition in stripe batch
	md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list
	scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
	crypto: talitos - Don't provide setkey for non hmac hashing algs.
	crypto: talitos - fix sha224
	KEYS: fix writing past end of user-supplied buffer in keyring_read()
	KEYS: prevent creating a different user's keyrings
	KEYS: prevent KEYCTL_READ on negative key
	powerpc/pseries: Fix parent_dn reference leak in add_dt_node()
	Fix SMB3.1.1 guest authentication to Samba
	SMB: Validate negotiate (to protect against downgrade) even if signing off
	SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags
	vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets
	nl80211: check for the required netlink attributes presence
	bsg-lib: don't free job in bsg_prepare_job
	seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()
	arm64: Make sure SPsel is always set
	arm64: fault: Route pte translation faults via do_translation_fault
	KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
	kvm: nVMX: Don't allow L2 to access the hardware CR8
	PCI: Fix race condition with driver_override
	btrfs: fix NULL pointer dereference from free_reloc_roots()
	btrfs: propagate error to btrfs_cmp_data_prepare caller
	btrfs: prevent to set invalid default subvolid
	x86/fpu: Don't let userspace set bogus xcomp_bv
	gfs2: Fix debugfs glocks dump
	timer/sysctl: Restrict timer migration sysctl values to 0 and 1
	KVM: VMX: do not change SN bit in vmx_update_pi_irte()
	KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
	cxl: Fix driver use count
	dmaengine: mmp-pdma: add number of requestors
	ARM: pxa: add the number of DMA requestor lines
	ARM: pxa: fix the number of DMA requestor lines
	KVM: VMX: use cmpxchg64
	video: fbdev: aty: do not leak uninitialized padding in clk to userspace
	swiotlb-xen: implement xen_swiotlb_dma_mmap callback
	fix xen_swiotlb_dma_mmap prototype
	Linux 4.4.90

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-05 10:03:13 +02:00
Myungho Jung
5e9b526fcc timer/sysctl: Restrict timer migration sysctl values to 0 and 1
commit b94bf594cf8ed67cdd0439e70fa939783471597a upstream.

timer_migration sysctl acts as a boolean switch, so the allowed values
should be restricted to 0 and 1.

Add the necessary extra fields to the sysctl table entry to enforce that.
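
A sketch of the resulting sysctl table entry, assuming the shape of the
upstream patch (the handler then clamps writes via proc_dointvec_minmax
using the extra1/extra2 bounds):

    {
        .procname       = "timer_migration",
        .data           = &sysctl_timer_migration,
        .maxlen         = sizeof(unsigned int),
        .mode           = 0644,
        .proc_handler   = timer_migration_handler,
        .extra1         = &zero,
        .extra2         = &one,
    },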

[ tglx: Rewrote changelog ]

Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Link: http://lkml.kernel.org/r/1492640690-3550-1-git-send-email-mhjungk@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kazuhiro Hayashi <kazuhiro3.hayashi@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:47 +02:00
Oleg Nesterov
9237605e0b seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()
commit 66a733ea6b611aecf0119514d2dddab5f9d6c01e upstream.

As Chris explains, get_seccomp_filter() and put_seccomp_filter() can end
up operating on different filters: once we drop ->siglock it is possible
for task->seccomp.filter to have been replaced by another task via
SECCOMP_FILTER_FLAG_TSYNC.
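
The fix takes and drops the reference on one pinned filter pointer rather
than re-reading task->seccomp.filter (a minimal sketch, assuming the
__get_seccomp_filter() helper the patch adds):

    static void __get_seccomp_filter(struct seccomp_filter *filter)
    {
        /* Reference count is bounded by the number of total processes. */
        atomic_inc(&filter->usage);
    }

    /* in seccomp_get_filter(), under ->siglock: */
    filter = task->seccomp.filter;    /* walk from this snapshot... */
    __get_seccomp_filter(filter);     /* ...and pin exactly this filter */
    /* later, after copying to userspace: __put_seccomp_filter(filter); */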

Fixes: f8e529ed94 ("seccomp, ptrace: add support for dumping seccomp filters")
Reported-by: Chris Salls <chrissalls5@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[tycho: add __get_seccomp_filter vs. open coding refcount_inc()]
Signed-off-by: Tycho Andersen <tycho@docker.com>
[kees: tweak commit log]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:46 +02:00
Bo Yan
68a4a52899 tracing: Erase irqsoff trace with empty write
commit 8dd33bcb7050dd6f8c1432732f930932c9d3a33e upstream.

One convenient way to erase a trace is "echo > trace". However, this is
currently broken if the current tracer is the irqsoff tracer, because the
irqsoff tracer uses max_buffer as its default trace buffer.

Set max_buffer as the buffer to be cleared when it is the trace buffer
currently in use.
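
A minimal sketch of the selection logic, assuming the shape of the
upstream patch to tracing_open():

    struct trace_buffer *trace_buf = &tr->trace_buffer;

    #ifdef CONFIG_TRACER_MAX_TRACE
    /* tracers such as irqsoff record into max_buffer; clear that one */
    if (tr->current_trace->print_max)
        trace_buf = &tr->max_buffer;
    #endif

    if (cpu == RING_BUFFER_ALL_CPUS)
        tracing_reset_online_cpus(trace_buf);
    else
        tracing_reset(trace_buf, cpu);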

Link: http://lkml.kernel.org/r/1505754215-29411-1-git-send-email-byan@nvidia.com

Cc: <mingo@redhat.com>
Fixes: 4acd4d00f ("tracing: give easy way to clear trace buffer")
Signed-off-by: Bo Yan <byan@nvidia.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:44 +02:00
Tahsin Erdogan
9c5afa726a tracing: Fix trace_pipe behavior for instance traces
commit 75df6e688ccd517e339a7c422ef7ad73045b18a2 upstream.

When reading data from trace_pipe, tracing_wait_pipe() performs a
check to see if tracing has been turned off after some data was read.
Currently, this check always looks at the global trace state, but it
should check the trace instance in which the trace_pipe file resides.
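
A minimal sketch of the corrected check in tracing_wait_pipe(), assuming
the tracer_tracing_is_on() helper that takes the instance's trace_array:

    /* consult the instance's own switch, not the global one */
    if (!tracer_tracing_is_on(iter->tr) && iter->pos)
        break;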

Because of this bug, cat instances/i1/trace_pipe in the following
script will immediately exit instead of waiting for data:

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
mkdir -p instances/i1
echo 1 > instances/i1/tracing_on
echo 1 > instances/i1/events/sched/sched_process_exec/enable
cat instances/i1/trace_pipe

Link: http://lkml.kernel.org/r/20170917102348.1615-1-tahsin@google.com

Fixes: 10246fa35d ("tracing: give easy way to clear trace buffer")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:44 +02:00
Juri Lelli
b0fa18e1ca UPSTREAM: cpufreq: schedutil: use now as reference when aggregating shared policy requests
Currently, sugov_next_freq_shared() uses last_freq_update_time as a
reference to decide when to start considering CPU contributions as
stale.

However, since last_freq_update_time is set by the last CPU that issued
a frequency transition, this might cause problems in certain cases. In
practice, the detection of stale utilization values fails whenever the
CPU with such values was the last to update the policy. For example (note
that the SCHED_CPUFREQ_RT flag itself is not the problem here; the problem
is only detecting after how much time that flag has to be considered
stale), suppose a policy with 2 CPUs:

               CPU0                |               CPU1
                                   |
                                   |     RT task scheduled
                                   |     SCHED_CPUFREQ_RT is set
                                   |     CPU1->last_update = now
                                   |     freq transition to max
                                   |     last_freq_update_time = now
                                   |

                        more than TICK_NSEC nsecs

                                   |
     a small CFS wakes up          |
     CPU0->last_update = now1      |
     delta_ns(CPU0) < TICK_NSEC*   |
     CPU0's util is considered     |
     delta_ns(CPU1) =              |
      last_freq_update_time -      |
      CPU1->last_update = 0        |
      < TICK_NSEC                  |
     CPU1 is still considered      |
     CPU1->SCHED_CPUFREQ_RT is set |
     we stay at max (until CPU1    |
     exits from idle)              |

* delta_ns is actually negative as now1 > last_freq_update_time

While last_freq_update_time is a sensible reference for rate limiting,
it doesn't seem to be useful for working around stale CPU states.

Fix the problem by always considering now (time) as the reference for
deciding when CPUs have stale contributions.
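
A minimal sketch of the resulting staleness check in
sugov_next_freq_shared(), assuming the current time is passed down as
'time':

    s64 delta_ns = time - j_sg_cpu->last_update;

    /* contributions older than a tick are considered stale */
    if (delta_ns > TICK_NSEC) {
        j_sg_cpu->iowait_boost = 0;
        continue;
    }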

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit d86ab9cff8b936aadde444d0e263a8db5ff0349b)
2017-10-03 21:18:45 +00:00
Greg Kroah-Hartman
d68ba9f116 This is the 4.4.89 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnLaLoACgkQONu9yGCS
 aT7hDw/+Ipx/xnjIUJFV/aqo8lTh3XqP/TjD5whoi+yYC8axLEZBLiOSLZceVjsG
 hi2mP22gKn1i7GLXNeWIZ+rMtVzAN+qNg7i8cjWNfFp1fA7cCfFaYvlV0LVrO2tK
 WnvvE8r5kQAKyQG8498ebEjianxwxHVERnNiE5/SDpCNj14DnwCJBTEYM0tEnuXZ
 /jBIIs4xvndVa0fFfUjuAzh65AefAT1BmgsPll4GnFMUFHh30smYdFla5LL0GNIq
 FQGFvIi8Q02disSMg9lFJVOlazc/HUREiFB1qy1DRtGMnS6/Q0HW0kCxeRi/7QEi
 +HN2rLxtbpnuD5P7W4lDJ5/cyCHMIv8SJ8OqUd8uxbTWz31P/QxbM7d35d+w3rq8
 dv3sQ6CMRnuIXGL5dFHh7zYqlzNS9PKjLmxzAw9grDf+nVsDxE4KUfJy00DSN1I1
 Bopi1kCD2nUMOiBrmxkIczN6OOvcGBHh6/TTB2WEKVHn42D0fjLnO66kJVJLMsBm
 vDdKJDDSGM/0HiUa5ydr6R0Ae7My3h5AJZRa5gn0kL/myatX/vsa0B2ZLpHlVipM
 GhODBsDFkI4k4yceONDZPJmhhVab1lewTMuIW5D2KRMsgpQqLmlOyL5gykfH0rTx
 FVnLSoMAHsgm6qVPwRS5BqK/UnXogfqjiB0iXzNNZnkiABWWoUQ=
 =Skkr
 -----END PGP SIGNATURE-----

Merge 4.4.89 into android-4.4

Changes in 4.4.89
	ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()
	ipv6: add rcu grace period before freeing fib6_node
	ipv6: fix sparse warning on rt6i_node
	qlge: avoid memcpy buffer overflow
	Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
	tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
	Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
	Revert "net: fix percpu memory leaks"
	gianfar: Fix Tx flow control deactivation
	ipv6: fix memory leak with multiple tables during netns destruction
	ipv6: fix typo in fib6_net_exit()
	f2fs: check hot_data for roll-forward recovery
	x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
	md/raid5: release/flush io in raid5_do_work()
	nfsd: Fix general protection fault in release_lock_stateid()
	mm: prevent double decrease of nr_reserved_highatomic
	tty: improve tty_insert_flip_char() fast path
	tty: improve tty_insert_flip_char() slow path
	tty: fix __tty_insert_flip_char regression
	Input: i8042 - add Gigabyte P57 to the keyboard reset table
	MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation
	MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero
	MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative
	MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs
	MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs
	MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs
	crypto: AF_ALG - remove SGL terminator indicator when chaining
	ext4: fix incorrect quotaoff if the quota feature is enabled
	ext4: fix quota inconsistency during orphan cleanup for read-only mounts
	powerpc: Fix DAR reporting when alignment handler faults
	block: Relax a check in blk_start_queue()
	md/bitmap: disable bitmap_resize for file-backed bitmaps.
	skd: Avoid that module unloading triggers a use-after-free
	skd: Submit requests to firmware before triggering the doorbell
	scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
	scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path
	scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
	scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
	scsi: zfcp: fix missing trace records for early returns in TMF eh handlers
	scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records
	scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
	scsi: zfcp: trace high part of "new" 64 bit SCSI LUN
	scsi: megaraid_sas: Check valid aen class range to avoid kernel panic
	scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead
	scsi: storvsc: fix memory leak on ring buffer busy
	scsi: sg: remove 'save_scat_len'
	scsi: sg: use standard lists for sg_requests
	scsi: sg: off by one in sg_ioctl()
	scsi: sg: factor out sg_fill_request_table()
	scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE
	scsi: qla2xxx: Fix an integer overflow in sysfs code
	ftrace: Fix selftest goto location on error
	tracing: Apply trace_clock changes to instance max buffer
	ARC: Re-enable MMU upon Machine Check exception
	PCI: shpchp: Enable bridge bus mastering if MSI is enabled
	media: v4l2-compat-ioctl32: Fix timespec conversion
	media: uvcvideo: Prevent heap overflow when accessing mapped controls
	bcache: initialize dirty stripes in flash_dev_run()
	bcache: Fix leak of bdev reference
	bcache: do not subtract sectors_to_gc for bypassed IO
	bcache: correct cache_dirty_target in __update_writeback_rate()
	bcache: Correct return value for sysfs attach errors
	bcache: fix for gc and write-back race
	bcache: fix bch_hprint crash and improve output
	ftrace: Fix memleak when unregistering dynamic ops when tracing disabled
	Linux 4.4.89

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-09-27 11:52:16 +02:00
Steven Rostedt (VMware)
ed1bf4397d ftrace: Fix memleak when unregistering dynamic ops when tracing disabled
commit edb096e00724f02db5f6ec7900f3bbd465c6c76f upstream.

If function tracing is disabled by the user via the function-trace option or
the proc sysctl file, and an ftrace_ops that was allocated on the heap is
unregistered, then the shutdown code exits without doing the proper cleanup.
This was found via kmemleak and running the ftrace selftests, as one of the
tests unregisters with function tracing disabled.

 # cat kmemleak
unreferenced object 0xffffffffa0020000 (size 4096):
  comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s)
  hex dump (first 32 bytes):
    55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89  U.t$.UH...t$.UH.
    e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c  .H......H.D$PH.L
  backtrace:
    [<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0
    [<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0
    [<ffffffff8109697f>] module_alloc+0x4f/0x90
    [<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420
    [<ffffffff81249947>] ftrace_startup+0xe7/0x300
    [<ffffffff81249bd2>] register_ftrace_function+0x72/0x90
    [<ffffffff81263786>] trace_selftest_ops+0x204/0x397
    [<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624
    [<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7
    [<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192
    [<ffffffff81002230>] do_one_initcall+0x90/0x1e2
    [<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe
    [<ffffffff81d61ec3>] kernel_init+0x13/0x122
    [<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40
    [<ffffffffffffffff>] 0xffffffffffffffff

Fixes: 12cce594fa ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:17 +02:00
Baohong Liu
d28e96be7c tracing: Apply trace_clock changes to instance max buffer
commit 170b3b1050e28d1ba0700e262f0899ffa4fccc52 upstream.

Currently, trace_clock timestamps are applied to both the regular and max
buffers only for the global trace. For an instance trace, trace_clock
timestamps are applied only to the regular buffer. But the regular and max
buffers can be swapped, for example following a snapshot, so bad timestamps
can be seen in an instance trace after a snapshot. Let's apply trace_clock
timestamps to the instance max buffer as well.
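
A minimal sketch of the change in tracing_set_clock(), assuming the shape
of the upstream patch (the global-trace-only condition on the max buffer
is dropped):

    ring_buffer_set_clock(tr->trace_buffer.buffer, trace_clocks[i].func);
    tracing_reset_online_cpus(&tr->trace_buffer);

    #ifdef CONFIG_TRACER_MAX_TRACE
    if (tr->max_buffer.buffer)
        ring_buffer_set_clock(tr->max_buffer.buffer, trace_clocks[i].func);
    tracing_reset_online_cpus(&tr->max_buffer);
    #endif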

Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com

Fixes: 277ba0446 ("tracing: Add interface to allow multiple trace buffers")
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27 11:00:16 +02:00