Commit graph

2368 commits

Author SHA1 Message Date
Patrick Bellasi
2aada289d7 sched/fair: trace energy_diff for non boosted tasks
In systems where SchedTune is enabled, we do not report energy diff for non
boosted tasks. Let's fix this by always genereting an energy_diff event where
however:
  nrg.delta = 0, since we skip energy normalization
  payoff = nrg.diff, since the payoff is defined just by the energy difference

Change-Id: I9a11ec19b6f56da04147f5ae5b47daf1dd180445
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
2f30db8df4 UPSTREAM: sched/fair: Sync task util before slow-path wakeup
We use task_util() in find_idlest_group() via capacity_spare_wake().
This task_util() updated in wake_cap(). However wake_cap() is not the
only reason for ending up in find_idlest_group() - we could have been sent
there by wake_wide(). So explicitly sync the task util with prev_cpu
when we are about to head to find_idlest_group().

We could simply do this at the beginning of
select_task_rq_fair() (i.e. irrespective of whether we're heading to
select_idle_sibling() or find_idlest_group() & co), but I didn't want to
slow down the select_idle_sibling() path more than necessary.

Don't do this during fork balancing, we won't need the task_util and
we'd just clobber the last_update_time, which is supposed to be 0.

Change-Id: I935f4bfdfec3e8b914457aac3387ce264d5fd484
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andres Oportus <andresoportus@google.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: http://lkml.kernel.org/r/20170808095519.10077-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit ea16f0ea6c3d tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
5a86636649 UPSTREAM: sched/fair: Fix usage of find_idlest_group() when the local group is idlest
find_idlest_group() returns NULL when the local group is idlest. The
caller then continues the find_idlest_group() search at a lower level
of the current CPU's sched_domain hierarchy. find_idlest_group_cpu() is
not consulted and, crucially, @new_cpu is not updated. This means the
search is pointless and we return @prev_cpu from select_task_rq_fair().

This is fixed by initialising @new_cpu to @cpu instead of @prev_cpu.

Change-Id: Ie531f5bb29775952bdc4c148b6e974b2f5f32b7a
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-6-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 93f50f90247e tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
4116547645 UPSTREAM: sched/fair: Fix usage of find_idlest_group() when no groups are allowed
When 'p' is not allowed on any of the CPUs in the sched_domain, we
currently return NULL from find_idlest_group(), and pointlessly
continue the search on lower sched_domain levels (where 'p' is also not
allowed) before returning prev_cpu regardless (as we have not updated
new_cpu).

Add an explicit check for this case, and add a comment to
find_idlest_group(). Now when find_idlest_group() returns NULL, it always
means that the local group is allowed and idlest.

Change-Id: I5f2648d2f7fb0465677961ecb7473df3d06f0057
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-5-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 6fee85ccbc76 tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
9c825cf616 BACKPORT: sched/fair: Fix find_idlest_group when local group is not allowed
When the local group is not allowed we do not modify this_*_load from
their initial value of 0. That means that the load checks at the end
of find_idlest_group cause us to incorrectly return NULL. Fixing the
initial values to ULONG_MAX means we will instead return the idlest
remote group in that case.

BACKPORT: Note 4.4 is missing commit 6b94780e45c1 "sched/core: Use
load_avg for selecting idlest group", so we only have to fix
this_load instead of this_runnable_load and this_avg_load.

Change-Id: I41f775b0e7c8f5e675c2780f955bb130a563cba7
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20171005114516.18617-4-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 0d10ab952e99 tip:sched/core)
(backport changes described above)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
529def2ffe UPSTREAM: sched/fair: Remove unnecessary comparison with -1
Since commit:

  83a0a96a5f ("sched/fair: Leverage the idle state info when choosing the "idlest" cpu")

find_idlest_group_cpu() (formerly find_idlest_cpu) no longer returns -1,
so we can simplify the checking of the return value in find_idlest_cpu().

Change-Id: I98f4b9f178cd93a30408e024e608d36771764c7b
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-3-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from commit e90381eaecf6 in tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
0f743ce745 BACKPORT: sched/fair: Move select_task_rq_fair slow-path into its own function
In preparation for changes that would otherwise require adding a new
level of indentation to the while(sd) loop, create a new function
find_idlest_cpu() which contains this loop, and rename the existing
find_idlest_cpu() to find_idlest_group_cpu().

Code inside the while(sd) loop is unchanged. @new_cpu is added as a
variable in the new function, with the same initial value as the
@new_cpu in select_task_rq_fair().

Change-Id: I9842308cab00dc9cd6c513fc38c609089a1aaaaf
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171005114516.18617-2-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(reworked for eas/cas schedstats added in Android)
(cherry-picked commit 18bd1b4bd53a from tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Brendan Jackman
795a6867cf UPSTREAM: sched/fair: Force balancing on nohz balance if local group has capacity
The "goto force_balance" here is intended to mitigate the fact that
avg_load calculations can result in bad placement decisions when
priority is asymmetrical.

The original commit that adds it:

  fab476228b ("sched: Force balancing on newidle balance if local group has capacity")

explains:

    Under certain situations, such as a niced down task (i.e. nice =
    -15) in the presence of nr_cpus NICE0 tasks, the niced task lands
    on a sched group and kicks away other tasks because of its large
    weight. This leads to sub-optimal utilization of the
    machine. Even though the sched group has capacity, it does not
    pull tasks because sds.this_load >> sds.max_load, and f_b_g()
    returns NULL.

A similar but inverted issue also affects ARM big.LITTLE (asymmetrical CPU
capacity) systems - consider 8 always-running, same-priority tasks on a
system with 4 "big" and 4 "little" CPUs. Suppose that 5 of them end up on
the "big" CPUs (which will be represented by one sched_group in the DIE
sched_domain) and 3 on the "little" (the other sched_group in DIE), leaving
one CPU unused. Because the "big" group has a higher group_capacity its
avg_load may not present an imbalance that would cause migrating a
task to the idle "little".

The force_balance case here solves the problem but currently only for
CPU_NEWLY_IDLE balances, which in theory might never happen on the
unused CPU. Including CPU_IDLE in the force_balance case means
there's an upper bound on the time before we can attempt to solve the
underutilization: after DIE's sd->balance_interval has passed the
next nohz balance kick will help us out.

Change-Id: I807ba5cba0ef1b8bbec02cbcd4755fd32af10135
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170807163900.25180-1-brendan.jackman@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked-from: commit 583ffd99d765 tip:sched/core)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
fd4a95dab8 UPSTREAM: sched/core: Add missing update_rq_clock() call in set_user_nice()
Address this rq-clock update bug:

  WARNING: CPU: 30 PID: 195 at ../kernel/sched/sched.h:797 set_next_entity()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    dump_stack()
    __warn()
    warn_slowpath_fmt()
    set_next_entity()
    ? _raw_spin_lock()
    set_curr_task_fair()
    set_user_nice.part.85()
    set_user_nice()
    create_worker()
    worker_thread()
    kthread()
    ret_from_fork()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 2fb8d36787affe26f3536c3d8ec094995a48037d)
Change-Id: I53ba056e72820c7fadb3f022e4ee3b821c0de17d
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
bab39eb879 UPSTREAM: sched/core: Add missing update_rq_clock() call for task_hot()
Add the update_rq_clock() call at the top of the callstack instead of
at the bottom where we find it missing, this to aid later effort to
minimize the number of update_rq_lock() calls.

  WARNING: CPU: 30 PID: 194 at ../kernel/sched/sched.h:797 assert_clock_updated()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    dump_stack()
    __warn()
    warn_slowpath_fmt()
    assert_clock_updated.isra.63.part.64()
    can_migrate_task()
    load_balance()
    pick_next_task_fair()
    __schedule()
    schedule()
    worker_thread()
    kthread()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 3bed5e2166a5e433bf62162f3cd3c5174d335934)
Change-Id: Ief5070dcce486535334dcb739ee16b989ea9df42
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
bea1b621d9 UPSTREAM: sched/core: Add missing update_rq_clock() in detach_task_cfs_rq()
Instead of adding the update_rq_clock() all the way at the bottom of
the callstack, add one at the top, this to aid later effort to
minimize update_rq_lock() calls.

  WARNING: CPU: 0 PID: 1 at ../kernel/sched/sched.h:797 detach_task_cfs_rq()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    dump_stack()
    __warn()
    warn_slowpath_fmt()
    detach_task_cfs_rq()
    switched_from_fair()
    __sched_setscheduler()
    _sched_setscheduler()
    sched_set_stop_task()
    cpu_stop_create()
    __smpboot_create_thread.part.2()
    smpboot_register_percpu_thread_cpumask()
    cpu_stop_init()
    do_one_initcall()
    ? print_cpu_info()
    kernel_init_freeable()
    ? rest_init()
    kernel_init()
    ret_from_fork()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 80f5c1b84baa8180c3c27b7e227429712cd967b6)
Change-Id: Ibffde077d18eabec4c2984158bd9d6d73bd0fb96
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Peter Zijlstra
4863faf5e4 UPSTREAM: sched/core: Add missing update_rq_clock() in post_init_entity_util_avg()
Address this rq-clock update bug:

  WARNING: CPU: 0 PID: 0 at ../kernel/sched/sched.h:797 post_init_entity_util_avg()
  rq->clock_update_flags < RQCF_ACT_SKIP

  Call Trace:
    __warn()
    post_init_entity_util_avg()
    wake_up_new_task()
    _do_fork()
    kernel_thread()
    rest_init()
    start_kernel()

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 4126bad6717336abe5d666440ae15555563ca53f)
Change-Id: Ibe9a73386896377f96483d195e433259218755a5
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:33 +01:00
Vincent Guittot
c14c9b6e3e UPSTREAM: sched/core: Fix find_idlest_group() for fork
During fork, the utilization of a task is init once the rq has been
selected because the current utilization level of the rq is used to
set the utilization of the fork task. As the task's utilization is
still 0 at this step of the fork sequence, it doesn't make sense to
look for some spare capacity that can fit the task's utilization.
Furthermore, I can see perf regressions for the test:

   hackbench -P -g 1

because the least loaded policy is always bypassed and tasks are not
spread during fork.

With this patch and the fix below, we are back to same performances as
for v4.8. The fix below is only a temporary one used for the test
until a smarter solution is found because we can't simply remove the
test which is useful for others benchmarks

| @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
|
|	avg_cost = this_sd->avg_scan_cost;
|
| -	/*
| -	 * Due to large variance we need a large fuzz factor; hackbench in
| -	 * particularly is sensitive here.
| -	 */
| -	if ((avg_idle / 512) < avg_cost)
| -		return -1;
| -
|	time = local_clock();
|
|	for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {

Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dietmar.eggemann@arm.com
Cc: kernellwp@gmail.com
Cc: umgwanakikbuti@gmail.com
Cc: yuyang.du@intel.comc
Link: http://lkml.kernel.org/r/1481216215-24651-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit f519a3f1c6b7a990e5aed37a8f853c6ecfdee945)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I86cc2ad81af3467c0b2f82b995111f428248baa4
2017-10-27 13:30:33 +01:00
Peter Zijlstra
97cb74f485 BACKPORT: sched/fair: Fix PELT integrity for new tasks
Vincent and Yuyang found another few scenarios in which entity
tracking goes wobbly.

The scenarios are basically due to the fact that new tasks are not
immediately attached and thereby differ from the normal situation -- a
task is always attached to a cfs_rq load average (such that it
includes its blocked contribution) and are explicitly
detached/attached on migration to another cfs_rq.

Scenario 1: switch to fair class

  p->sched_class = fair_class;
  if (queued)
    enqueue_task(p);
      ...
        enqueue_entity()
	  enqueue_entity_load_avg()
	    migrated = !sa->last_update_time (true)
	    if (migrated)
	      attach_entity_load_avg()
  check_class_changed()
    switched_from() (!fair)
    switched_to()   (fair)
      switched_to_fair()
        attach_entity_load_avg()

If @p is a new task that hasn't been fair before, it will have
!last_update_time and, per the above, end up in
attach_entity_load_avg() _twice_.

Scenario 2: change between cgroups

  sched_move_group(p)
    if (queued)
      dequeue_task()
    task_move_group_fair()
      detach_task_cfs_rq()
        detach_entity_load_avg()
      set_task_rq()
      attach_task_cfs_rq()
        attach_entity_load_avg()
    if (queued)
      enqueue_task();
        ...
          enqueue_entity()
	    enqueue_entity_load_avg()
	      migrated = !sa->last_update_time (true)
	      if (migrated)
	        attach_entity_load_avg()

Similar as with scenario 1, if @p is a new task, it will have
!load_update_time and we'll end up in attach_entity_load_avg()
_twice_.

Furthermore, notice how we do a detach_entity_load_avg() on something
that wasn't attached to begin with.

As stated above; the problem is that the new task isn't yet attached
to the load tracking and thereby violates the invariant assumption.

This patch remedies this by ensuring a new task is indeed properly
attached to the load tracking on creation, through
post_init_entity_util_avg().

Of course, this isn't entirely as straightforward as one might think,
since the task is hashed before we call wake_up_new_task() and thus
can be poked at. We avoid this by adding TASK_NEW and teaching
cpu_cgroup_can_attach() to refuse such tasks.

.:: BACKPORT

Complicated by the fact that mch of the lines changed by the original
of this commit were then changed by:

df217913e72e sched/fair: Factorize attach/detach entity <Vincent Guittot>

and then

d31b1a66cbe0 sched/fair: Factorize PELT update <Vincent Guittot>

, which have both already been backported here.

Reported-by: Yuyang Du <yuyang.du@intel.com>
Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 7dc603c9028ea5d4354e0e317e8481df99b06d7e)
Change-Id: Ibc59eb52310a62709d49a744bd5a24e8b97c4ae8
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Vincent Guittot
138a670d97 BACKPORT: sched/cgroup: Fix cpu_cgroup_fork() handling
A new fair task is detached and attached from/to task_group with:

  cgroup_post_fork()
    ss->fork(child) := cpu_cgroup_fork()
      sched_move_task()
        task_move_group_fair()

Which is wrong, because at this point in fork() the task isn't fully
initialized and it cannot 'move' to another group, because its not
attached to any group as yet.

In fact, cpu_cgroup_fork() needs a small part of sched_move_task() so we
can just call this small part directly instead sched_move_task(). And
the task doesn't really migrate because it is not yet attached so we
need the following sequence:

  do_fork()
    sched_fork()
      __set_task_cpu()

    cgroup_post_fork()
      set_task_rq() # set task group and runqueue

    wake_up_new_task()
      select_task_rq() can select a new cpu
      __set_task_cpu
      post_init_entity_util_avg
        attach_task_cfs_rq()
      activate_task
        enqueue_task

This patch makes that happen.

BACKPORT: Difference from original commit:

- Removed use of DEQUEUE_MOVE (which isn't defined in 4.4) in
  dequeue_task flags
- Replaced "struct rq_flags rf" with "unsigned long flags".

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
[ Added TASK_SET_GROUP to set depth properly. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit ea86cb4b7621e1298a37197005bf0abcc86348d4)
Change-Id: I8126fd923288acf961218431ffd29d6bf6fd8d72
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Peter Zijlstra
6b02ab68ec UPSTREAM: sched/fair: Fix and optimize the fork() path
The task_fork_fair() callback already calls __set_task_cpu() and takes
rq->lock.

If we move the sched_class::task_fork callback in sched_fork() under
the existing p->pi_lock, right after its set_task_cpu() call, we can
avoid doing two such calls and omit the IRQ disabling on the rq->lock.

Change to __set_task_cpu() to skip the migration bits, this is a new
task, not a migration. Similarly, make wake_up_new_task() use
__set_task_cpu() for the same reason, the task hasn't actually
migrated as it hasn't ever ran.

This cures the problem of calling migrate_task_rq_fair(), which does
remove_entity_from_load_avg() on tasks that have never been added to
the load avg to begin with.

This bug would result in transiently messed up load_avg values, averaged
out after a few dozen milliseconds. This is probably the reason why
this bug was not found for such a long time.

Reported-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit e210bffd39d01b649c94b820c28ff112673266dd)
Change-Id: Icbddbaa6e8c1071859673d8685bc3f38955cf144
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Chris Redpath
792510d9b3 BACKPORT: sched/fair: Make it possible to account fair load avg consistently
While set_task_rq_fair() is introduced in mainline by commit ad936d8658fd
("sched/fair: Make it possible to account fair load avg consistently"),
the function results to be introduced here by the backport of
commit 09a43ace1f98 ("sched/fair: Propagate load during synchronous
attach/detach"). The problem (apart from the confusion introduced by the
backport) is actually that set_task_rq_fair() is currently not called at
all.

Fix the problem by backporting again commit ad936d8658fd
("sched/fair: Make it possible to account fair load avg consistently").

Original change log:

The current code accounts for the time a task was absent from the fair
class (per ATTACH_AGE_LOAD). However it does not work correctly when a
task got migrated or moved to another cgroup while outside of the fair
class.

This patch tries to address that by aging on migration. We locklessly
read the 'last_update_time' stamp from both the old and new cfs_rq,
ages the load upto the old time, and sets it to the new time.

These timestamps should in general not be more than 1 tick apart from
one another, so there is a definite bound on things.

Signed-off-by: Byungchul Park <byungchul.park@lge.com>
[ Changelog, a few edits and !SMP build fix ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445616981-29904-2-git-send-email-byungchul.park@lge.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked from ad936d8658fd348338cb7d42c577dac77892b074)
Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Change-Id: I17294ab0ada3901d35895014715fd60952949358
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-10-27 13:30:32 +01:00
Chris Redpath
fac311be26 cpufreq/sched: Consider max cpu capacity when choosing frequencies
When using schedfreq on cpus with max capacity significantly smaller than
1024, the tick update uses non-normalised capacities - this leads to
selecting an incorrect OPP as we were scaling the frequency as if the
max capacity achievable was 1024 rather than the max for that particular
cpu or group. This could result in a cpu being stuck at the lowest OPP
and unable to generate enough utilisation to climb out if the max
capacity is significantly smaller than 1024.

Instead, normalize the capacity to be in the range 0-1024 in the tick
so that when we later select a frequency, we get the correct one.

Also comments updated to be clearer about what is needed.

Change-Id: Id84391c7ac015311002ada21813a353ee13bee60
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-27 13:30:32 +01:00
Oleg Nesterov
0f85c0954b sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
commit 18f649ef344127ef6de23a5a4272dbe2fdb73dde upstream.

The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
it, but see the next patch.

However the comment is correct in that autogroup_move_group() must always
change task_group() for every thread so the sysctl_ check is very wrong;
we can race with cgroups and even sys_setsid() is not safe because a task
running with task_group() == ag->tg must participate in refcounting:

	int main(void)
	{
		int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);

		assert(sctl > 0);
		if (fork()) {
			wait(NULL); // destroy the child's ag/tg
			pause();
		}

		assert(pwrite(sctl, "1\n", 2, 0) == 2);
		assert(setsid() > 0);
		if (fork())
			pause();

		kill(getppid(), SIGKILL);
		sleep(1);

		// The child has gone, the grandchild runs with kref == 1
		assert(pwrite(sctl, "0\n", 2, 0) == 2);
		assert(setsid() > 0);

		// runs with the freed ag/tg
		for (;;)
			sleep(1);

		return 0;
	}

crashes the kernel. It doesn't really need sleep(1), it doesn't matter if
autogroup_move_group() actually frees the task_group or this happens later.

Reported-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hartsjc@redhat.com
Cc: vbendel@redhat.com
Link: http://lkml.kernel.org/r/20161114184609.GA15965@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
 [sumits: submit to 4.4 LTS, post testing on Hikey]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-27 10:23:17 +02:00
Chris Redpath
4f8767d1ca ANDROID: sched/fair: Select correct capacity state for energy_diff
The util returned from group_max_util is not capped at the max util
present in the group, so it can be larger than the capacity stored in
the array. Ensure that when this happens, we always use the last entry
in the array to fetch energy from.

Tested with synthetics on Juno board.

Bug: 38159576
Change-Id: I89fb52fb7e68fa3e682e308acc232596672d03f7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-25 19:28:51 +00:00
Leo Yan
774481506a cpufreq: schedutil: clamp util to CPU maximum capacity
The code is to get the CPU util by accumulate different scheduling
classes and when the total util value is larger than CPU capacity
then it clamps util to CPU maximum capacity. So we can get correct util
value when use PELT signal but if with WALT signal it misses to clamp
util value.

On the other hand, WALT doesn't accumulate different class utilization
but it needs to applying boost margin for WALT signal the CPU util
value is possible to be larger than CPU capacity; so this patch is to
always clamp util to CPU maximum capacity.

Change-Id: I05481ddbf20246bb9be15b6bd21b6ec039015ea8
Signed-off-by: Leo Yan <leo.yan@linaro.org>
2017-10-24 15:49:27 +00:00
Chris Redpath
d34d2c97ae cpufreq/sched: Use cpu max freq rather than policy max
When we convert capacity into frequency, we used policy->max to get
the max freq of the cpu. Since this can be changed by userspace policy
or thermal events, we are potentially asking for a lower frequency
than the utilization demands.

Change over to using cpuinfo.max which is the max freq supported by
that cpu rather than the currently-chosen max. Frequency granted still
honours the max policy.

Tested by setting a userspace policy and observing the relevant vars
in a trace. In this instance, we ask for around 1ghz instead of 620MHz.

freq_new=1013512
unfixed_freq_new=624487
capacity=546
cpuinfo_max=1900800
policy_max=1171200

Change-Id: I8c5694db42243c6fb78bb9be9046b06ac81295e7
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-23 12:30:46 +01:00
Dietmar Eggemann
724091f67f sched/fair: remove erroneous RCU_LOCKDEP_WARN from start_cpu()
Fixes: https://bugs.linaro.org/show_bug.cgi?id=3075

Change-Id: I62d714fc4b9366a9b2535649aa92d1edc840cf94
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-10-20 15:33:41 +00:00
Blagovest Kolenichev
f9719b203c Merge android-4.4@d6fbbe5 (v4.4.93) into msm-4.4
* refs/heads/tmp-d6fbbe5
  Linux 4.4.93
  x86/alternatives: Fix alt_max_short macro to really be a max()
  USB: serial: console: fix use-after-free after failed setup
  USB: serial: qcserial: add Dell DW5818, DW5819
  USB: serial: option: add support for TP-Link LTE module
  USB: serial: cp210x: add support for ELV TFD500
  USB: serial: ftdi_sio: add id for Cypress WICED dev board
  fix unbalanced page refcounting in bio_map_user_iov
  direct-io: Prevent NULL pointer access in submit_page_section
  usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options
  ALSA: line6: Fix leftover URB at error-path during probe
  ALSA: caiaq: Fix stray URB at probe error path
  ALSA: seq: Fix copy_from_user() call inside lock
  ALSA: seq: Fix use-after-free at creating a port
  ALSA: usb-audio: Kill stray URB at exiting
  iommu/amd: Finish TLB flush in amd_iommu_unmap()
  usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet
  KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit
  crypto: shash - Fix zero-length shash ahash digest crash
  HID: usbhid: fix out-of-bounds bug
  dmaengine: edma: Align the memcpy acnt array size with the transfer
  MIPS: math-emu: Remove pr_err() calls from fpu_emu()
  USB: dummy-hcd: Fix deadlock caused by disconnect detection
  rcu: Allow for page faults in NMI handlers
  iwlwifi: mvm: use IWL_HCMD_NOCOPY for MCAST_FILTER_CMD
  nl80211: Define policy for packet pattern attributes
  CIFS: Reconnect expired SMB sessions
  ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets
  brcmfmac: add length check in brcmf_cfg80211_escan_handler()
  ANDROID: HACK: arm64: use -mno-implicit-float instead of -mgeneral-regs-only
  sched: Update task->on_rq when tasks are moving between runqueues
  FROMLIST: f2fs: expose some sectors to user in inline data or dentry case
  crypto: Work around deallocated stack frame reference gcc bug on sparc.
  UPSTREAM: f2fs: fix potential panic during fstrim
  ANDROID: fscrypt: remove unnecessary fscrypto.h
  ANDROID: binder: fix node sched policy calculation
  ANDROID: Kbuild, LLVMLinux: allow overriding clang target triple
  CHROMIUM: arm64: Disable asm-operand-width warning for clang
  CHROMIUM: kbuild: clang: Disable the 'duplicate-decl-specifier' warning
  UPSTREAM: x86/build: Use cc-option to validate stack alignment parameter
  UPSTREAM: x86/build: Fix stack alignment for CLang
  UPSTREAM: efi/libstub/arm64: Set -fpie when building the EFI stub
  BACKPORT: efi/libstub/arm64: Force 'hidden' visibility for section markers
  UPSTREAM: compiler, clang: always inline when CONFIG_OPTIMIZE_INLINING is disabled
  UPSTREAM: x86/boot: #undef memcpy() et al in string.c
  UPSTREAM: crypto: arm64/sha - avoid non-standard inline asm tricks
  UPSTREAM: kbuild: clang: Disable 'address-of-packed-member' warning
  UPSTREAM: x86/build: Specify stack alignment for clang
  UPSTREAM: x86/build: Use __cc-option for boot code compiler options
  BACKPORT: kbuild: Add __cc-option macro
  UPSTREAM: x86/hweight: Don't clobber %rdi
  BACKPORT: x86/hweight: Get rid of the special calling convention
  BACKPORT: x86/mm/kaslr: Use the _ASM_MUL macro for multiplication to work around Clang incompatibility
  UPSTREAM: crypto, x86: aesni - fix token pasting for clang
  UPSTREAM: x86/kbuild: Use cc-option to enable -falign-{jumps/loops}
  UPSTREAM: compiler, clang: properly override 'inline' for clang
  UPSTREAM: compiler, clang: suppress warning for unused static inline functions
  UPSTREAM: Kbuild: provide a __UNIQUE_ID for clang
  UPSTREAM: modules: mark __inittest/__exittest as __maybe_unused
  BACKPORT: kbuild: Add support to generate LLVM assembly files
  UPSTREAM: kbuild: use -Oz instead of -Os when using clang
  BACKPORT: kbuild, LLVMLinux: Add -Werror to cc-option to support clang
  UPSTREAM: kbuild: drop -Wno-unknown-warning-option from clang options
  UPSTREAM: kbuild: fix asm-offset generation to work with clang
  UPSTREAM: kbuild: consolidate redundant sed script ASM offset generation
  UPSTREAM: kbuild: Consolidate header generation from ASM offset information
  UPSTREAM: kbuild: clang: add -no-integrated-as to KBUILD_[AC]FLAGS
  UPSTREAM: kbuild: Add better clang cross build support

Conflicts:
	arch/x86/lib/Makefile
	net/wireless/nl80211.c

Change-Id: I76032e8d1206903bc948b9ed918e7ddee7e746c7
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-10-20 06:07:34 -07:00
Blagovest Kolenichev
b2465235ad Merge android-4.4@73a2b70 (v4.4.92) into msm-4.4
* refs/heads/tmp-73a2b70
  Linux 4.4.92
  ext4: don't allow encrypted operations without keys
  ext4: Don't clear SGID when inheriting ACLs
  ext4: fix data corruption for mmap writes
  sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
  nvme: protect against simultaneous shutdown invocations
  drm/i915/bios: ignore HDMI on port A
  brcmfmac: setup passive scan if requested by user-space
  uwb: ensure that endpoint is interrupt
  uwb: properly check kthread_run return value
  iio: adc: mcp320x: Fix oops on module unload
  iio: adc: mcp320x: Fix readout of negative voltages
  iio: ad7793: Fix the serial interface reset
  iio: core: Return error for failed read_reg
  staging: iio: ad7192: Fix - use the dedicated reset function avoiding dma from stack.
  iio: ad_sigma_delta: Implement a dedicated reset function
  iio: adc: twl4030: Disable the vusb3v1 rugulator in the error handling path of 'twl4030_madc_probe()'
  iio: adc: twl4030: Fix an error handling path in 'twl4030_madc_probe()'
  xhci: fix finding correct bus_state structure for USB 3.1 hosts
  USB: fix out-of-bounds in usb_set_configuration
  usb: Increase quirk delay for USB devices
  USB: core: harden cdc_parse_cdc_header
  USB: uas: fix bug in handling of alternate settings
  scsi: sd: Do not override max_sectors_kb sysfs setting
  iwlwifi: add workaround to disable wide channels in 5GHz
  HID: i2c-hid: allocate hid buffers for real worst case
  ftrace: Fix kmemleak in unregister_ftrace_graph
  stm class: Fix a use-after-free
  Drivers: hv: fcopy: restore correct transfer length
  driver core: platform: Don't read past the end of "driver_override" buffer
  ALSA: usx2y: Suppress kernel warning at page allocation failures
  ALSA: compress: Remove unused variable
  lsm: fix smack_inode_removexattr and xattr_getsecurity memleak
  USB: g_mass_storage: Fix deadlock when driver is unbound
  usb: gadget: mass_storage: set msg_registered after msg registered
  USB: devio: Don't corrupt user memory
  USB: dummy-hcd: Fix erroneous synchronization change
  USB: dummy-hcd: fix infinite-loop resubmission bug
  USB: dummy-hcd: fix connection failures (wrong speed)
  usb: pci-quirks.c: Corrected timeout values used in handshake
  ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor
  usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction
  usb: renesas_usbhs: fix the BCLR setting condition for non-DCP pipe
  usb-storage: unusual_devs entry to fix write-access regression for Seagate external drives
  usb: gadget: udc: atmel: set vbus irqflags explicitly
  USB: gadgetfs: fix copy_to_user while holding spinlock
  USB: gadgetfs: Fix crash caused by inadequate synchronization
  usb: gadget: inode.c: fix unbalanced spin_lock in ep0_write
  ANDROID: binder: init desired_prio.sched_policy before use it
  BACKPORT: net: xfrm: support setting an output mark.
  UPSTREAM: xfrm: Only add l3mdev oif to dst lookups
  UPSTREAM: net: l3mdev: Add master device lookup by index
  Linux 4.4.91
  ttpci: address stringop overflow warning
  ALSA: au88x0: avoid theoretical uninitialized access
  ARM: remove duplicate 'const' annotations'
  IB/qib: fix false-postive maybe-uninitialized warning
  drivers: firmware: psci: drop duplicate const from psci_of_match
  libata: transport: Remove circular dependency at free time
  xfs: remove kmem_zalloc_greedy
  i2c: meson: fix wrong variable usage in meson_i2c_put_data
  md/raid10: submit bio directly to replacement disk
  rds: ib: add error handle
  iommu/io-pgtable-arm: Check for leaf entry before dereferencing it
  parisc: perf: Fix potential NULL pointer dereference
  netfilter: nfnl_cthelper: fix incorrect helper->expect_class_max
  exynos-gsc: Do not swap cb/cr for semi planar formats
  MIPS: IRQ Stack: Unwind IRQ stack onto task stack
  netfilter: invoke synchronize_rcu after set the _hook_ to NULL
  bridge: netlink: register netdevice before executing changelink
  mmc: sdio: fix alignment issue in struct sdio_func
  usb: plusb: Add support for PL-27A1
  team: fix memory leaks
  net/packet: check length in getsockopt() called with PACKET_HDRLEN
  net: core: Prevent from dereferencing null pointer when releasing SKB
  MIPS: Lantiq: Fix another request_mem_region() return code check
  ASoC: dapm: fix some pointer error handling
  usb: chipidea: vbus event may exist before starting gadget
  audit: log 32-bit socketcalls
  ASoC: dapm: handle probe deferrals
  partitions/efi: Fix integer overflow in GPT size calculation
  USB: serial: mos7840: fix control-message error handling
  USB: serial: mos7720: fix control-message error handling
  drm/amdkfd: fix improper return value on error
  IB/ipoib: Replace list_del of the neigh->list with list_del_init
  IB/ipoib: rtnl_unlock can not come after free_netdev
  IB/ipoib: Fix deadlock over vlan_mutex
  tty: goldfish: Fix a parameter of a call to free_irq
  ARM: 8635/1: nommu: allow enabling REMAP_VECTORS_TO_RAM
  iio: adc: hx711: Add DT binding for avia,hx711
  iio: adc: axp288: Drop bogus AXP288_ADC_TS_PIN_CTRL register modifications
  hwmon: (gl520sm) Fix overflows and crash seen when writing into limit attributes
  sh_eth: use correct name for ECMR_MPDE bit
  extcon: axp288: Use vbus-valid instead of -present to determine cable presence
  igb: re-assign hw address pointer on reset after PCI error
  MIPS: ralink: Fix incorrect assignment on ralink_soc
  MIPS: Ensure bss section ends on a long-aligned address
  ARM: dts: r8a7790: Use R-Car Gen 2 fallback binding for msiof nodes
  RDS: RDMA: Fix the composite message user notification
  GFS2: Fix reference to ERR_PTR in gfs2_glock_iter_next
  drm: bridge: add DT bindings for TI ths8135
  drm_fourcc: Fix DRM_FORMAT_MOD_LINEAR #define
  FROMLIST: tracing: Add support for preempt and irq enable/disable events
  FROMLIST: tracing: Prepare to add preempt and irq trace events
  ANDROID: binder: fix transaction leak.
  ANDROID: binder: Add tracing for binder priority inheritance.
  Linux 4.4.90
  fix xen_swiotlb_dma_mmap prototype
  swiotlb-xen: implement xen_swiotlb_dma_mmap callback
  video: fbdev: aty: do not leak uninitialized padding in clk to userspace
  KVM: VMX: use cmpxchg64
  ARM: pxa: fix the number of DMA requestor lines
  ARM: pxa: add the number of DMA requestor lines
  dmaengine: mmp-pdma: add number of requestors
  cxl: Fix driver use count
  KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
  KVM: VMX: do not change SN bit in vmx_update_pi_irte()
  timer/sysclt: Restrict timer migration sysctl values to 0 and 1
  gfs2: Fix debugfs glocks dump
  x86/fpu: Don't let userspace set bogus xcomp_bv
  btrfs: prevent to set invalid default subvolid
  btrfs: propagate error to btrfs_cmp_data_prepare caller
  btrfs: fix NULL pointer dereference from free_reloc_roots()
  PCI: Fix race condition with driver_override
  kvm: nVMX: Don't allow L2 to access the hardware CR8
  KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
  arm64: fault: Route pte translation faults via do_translation_fault
  arm64: Make sure SPsel is always set
  seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()
  bsg-lib: don't free job in bsg_prepare_job
  nl80211: check for the required netlink attributes presence
  vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets
  SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags
  SMB: Validate negotiate (to protect against downgrade) even if signing off
  Fix SMB3.1.1 guest authentication to Samba
  powerpc/pseries: Fix parent_dn reference leak in add_dt_node()
  KEYS: prevent KEYCTL_READ on negative key
  KEYS: prevent creating a different user's keyrings
  KEYS: fix writing past end of user-supplied buffer in keyring_read()
  crypto: talitos - fix sha224
  crypto: talitos - Don't provide setkey for non hmac hashing algs.
  scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
  md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list
  md/raid5: fix a race condition in stripe batch
  tracing: Erase irqsoff trace with empty write
  tracing: Fix trace_pipe behavior for instance traces
  KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
  mac80211: flush hw_roc_start work before cancelling the ROC
  cifs: release auth_key.response for reconnect.
  f2fs: catch up to v4.14-rc1
  UPSTREAM: cpufreq: schedutil: use now as reference when aggregating shared policy requests
  ANDROID: add script to fetch android kernel config fragments
  f2fs: reorganize stat information
  f2fs: clean up flush/discard command namings
  f2fs: check in-memory sit version bitmap
  f2fs: check in-memory nat version bitmap
  f2fs: check in-memory block bitmap
  f2fs: introduce FI_ATOMIC_COMMIT
  f2fs: clean up with list_{first, last}_entry
  f2fs: return fs_trim if there is no candidate
  f2fs: avoid needless checkpoint in f2fs_trim_fs
  f2fs: relax async discard commands more
  f2fs: drop exist_data for inline_data when truncated to 0
  f2fs: don't allow encrypted operations without keys
  f2fs: show the max number of atomic operations
  f2fs: get io size bit from mount option
  f2fs: support IO alignment for DATA and NODE writes
  f2fs: add submit_bio tracepoint
  f2fs: reassign new segment for mode=lfs
  f2fs: fix a missing discard prefree segments
  f2fs: use rb_entry_safe
  f2fs: add a case of no need to read a page in write begin
  f2fs: fix a problem of using memory after free
  f2fs: remove unneeded condition
  f2fs: don't cache nat entry if out of memory
  f2fs: remove unused values in recover_fsync_data
  f2fs: support async discard based on v4.9
  f2fs: resolve op and op_flags confilcts
  f2fs: remove wrong backported codes
  FROMLIST: binder: fix use-after-free in binder_transaction()
  UPSTREAM: ipv6: fib: Unlink replaced routes from their nodes

Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>

Conflicts:
	fs/f2fs/crypto_key.c
	fs/f2fs/f2fs_crypto.h
	net/wireless/nl80211.c
	sound/usb/card.c

Change-Id: I742aeaec84c7892165976b7bea3e07bdd6881d93
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-10-20 02:11:57 -07:00
Joonwoo Park
ed9e749668 sched: EAS/WALT: finish accounting prior to task_tick
In order to set rq->misfit_task in time, call update_task_ravg() prior
to task_tick.  This reduces upmigration delay by 1 scheduler window.

Change-Id: I7cc80badd423f2e7684125fbfd853b0a3610f0e8
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-19 11:56:50 -07:00
Joonwoo Park
40c3aaa56a cpufreq: sched: update capacity request upon tick always
At present, sched_freq_tick() skips updating of capacity update when
current frequency is fmax.  This can cause incorrect frequency drop
when a CPU bound task goes into sleep for example :
  1) A task (A) enqueues onto CPU 0 and executes for long time.
  2) A new task (B) which has low task demand enqueues onto CPU 1 and
     executes long so becomes a CPU bound task.
  3) Both CPU 0 and 1 gets scheduler tick but skip sched_freq_tick()
     since current frequency is fmax.
  4) Task (A) sleeps and lower the CPU 0's capacity request.
  5) Because task (B) voted CPU capacity at step 2 with low demand and
     skipped to request afterwards, cluster frequency for both CPU 0
     and 1 drops to match capacity voted by CPU 1 at step 2 even though
     task (B) on CPU 1 requires max capacity.

Fix such incorrectness by not skipping CPU capacity voting at tick
path.

Change-Id: Ieb46af1ac96ffce7a5532c58c7f07bf1ada06b86
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-19 11:56:50 -07:00
Joonwoo Park
7ab48e4c8d sched/fair: prevent meaningless active migration
At present need_active_balance() determines whether an active
upmigration is needed by using capacity_of(). A CPU's capacity
may be reduced by RT pressure, and therefore distinguishing
capability differences with capacity_of() may lead to suboptimal
active migrations to less capable CPUs. Use capacity_orig_of
to distinguish differently capable CPUs in addition to
capacity_of(), thus avoiding placing tasks on less capable CPUs
due to instantaneous RT pressure.

Change-Id: I3e1435246a8edc3ad618ef98a34866cfbd8c16a5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[markivx: Reworked the commit text a bit]
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-19 11:56:50 -07:00
Vikram Mulukutla
be832f69a9 sched: walt: Leverage existing helper APIs to apply invariance
There's no need for a separate hierarchy of notifiers, APIs
and variables in walt.c for the purpose of applying frequency
and IPC invariance. Let's just use capacity_curr_of and get
rid of a lot of the infrastructure relating to capacity,
load_scale_factor etc.

Change-Id: Ia220e2c896373fa535db05bff60f9aa33aefc978
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-10-19 11:56:50 -07:00
Linux Build Service Account
f2b5c20a20 Merge "Merge android-4.4@d68ba9f (v4.4.89) into msm-4.4" 2017-10-17 05:38:14 -07:00
Olav Haugan
ec888d46d8 sched: Update task->on_rq when tasks are moving between runqueues
Task->on_rq has three states:
	0 - Task is not on runqueue (rq)
	1 (TASK_ON_RQ_QUEUED) - Task is on rq
	2 (TASK_ON_RQ_MIGRATING) - Task is on rq but in the
	process of being migrated to another rq

When a task is moving between rqs task->on_rq state should be
TASK_ON_RQ_MIGRATING in order for WALT to account rq's cumulative
runnable average correctly.  Without such state marking for all the
classes, WALT's update_history() would try to fixup task's demand
which was never contributed to any of CPUs during migration.

Change-Id: Iced3428f3924fe8ab5d0075698273ead04f12d5b
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop: Reinforced changelog to explain why this is needed by WALT.
           Fixed conflicts in deadline.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-10-16 21:07:06 +00:00
Greg Kroah-Hartman
73a2b70bdf This is the 4.4.92 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnfNZMACgkQONu9yGCS
 aT4KcxAAg3C7QW8Nm30Ds1ZgUrkFB78YAFLdgZNKzTBfQFwFHhkPR8fTGyCboMV6
 rL56FUDepHpGfnX8yT8PeoWNBzBprwFtCfMKFUi1MKRFpNEde4pLDVoVKQ3ie0dm
 ZMLoRsrzF4UOktRidEmxQr5yNv6qmPwMw6P68Giguu9QDabADjDJMY5Fv/2JQc0q
 UHefFWbA7WDwaNhBpgF0/P3IBgNF781sdYm7lOjiyMCJZoJyBGjyH6jeBSyJy9Qg
 bUhr7jALOe45PU0Lo3KpVjr14OEb78k/c2Iw1M4RuiNQgBrPOdmpGckstbRn1vtz
 yCmaUimLRB+Lu/7RSxI94gOduEt9t2NLQNhNKbbSJvEdU1L8o+MOhCLyaT+VQ2Dg
 xwKpXUefDRcZbetTNBCP4lvivSLo9mwY6U74ZmDUXZhVCJj+v7AriCB8y8k2uFdY
 uRf0Dv3Wf0H7LDFbjuVKirv5rILVGFL8gcg421yFUJQvQ1k0Mx1pBlUI3srjZCMu
 tRP3ObZta32ArGEvIKJAn3uhrguPgy8FWRsG3akh000jwDCsXHLohSX/5FDcnlUH
 q2N9wHoEYBe4aPBaDm5ll2wOvpxubwBbt9xdbj6I5AUPCUfYcAni9D1+3cKDSLK8
 hbSbkKrjK9zFT4qi7QNvKoU8IscTb7lRGqjoT0ee+ws15SvxSVk=
 =oa8l
 -----END PGP SIGNATURE-----

Merge 4.4.92 into android-4.4

Changes in 4.4.92
	usb: gadget: inode.c: fix unbalanced spin_lock in ep0_write
	USB: gadgetfs: Fix crash caused by inadequate synchronization
	USB: gadgetfs: fix copy_to_user while holding spinlock
	usb: gadget: udc: atmel: set vbus irqflags explicitly
	usb-storage: unusual_devs entry to fix write-access regression for Seagate external drives
	usb: renesas_usbhs: fix the BCLR setting condition for non-DCP pipe
	usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction
	ALSA: usb-audio: Check out-of-bounds access by corrupted buffer descriptor
	usb: pci-quirks.c: Corrected timeout values used in handshake
	USB: dummy-hcd: fix connection failures (wrong speed)
	USB: dummy-hcd: fix infinite-loop resubmission bug
	USB: dummy-hcd: Fix erroneous synchronization change
	USB: devio: Don't corrupt user memory
	usb: gadget: mass_storage: set msg_registered after msg registered
	USB: g_mass_storage: Fix deadlock when driver is unbound
	lsm: fix smack_inode_removexattr and xattr_getsecurity memleak
	ALSA: compress: Remove unused variable
	ALSA: usx2y: Suppress kernel warning at page allocation failures
	driver core: platform: Don't read past the end of "driver_override" buffer
	Drivers: hv: fcopy: restore correct transfer length
	stm class: Fix a use-after-free
	ftrace: Fix kmemleak in unregister_ftrace_graph
	HID: i2c-hid: allocate hid buffers for real worst case
	iwlwifi: add workaround to disable wide channels in 5GHz
	scsi: sd: Do not override max_sectors_kb sysfs setting
	USB: uas: fix bug in handling of alternate settings
	USB: core: harden cdc_parse_cdc_header
	usb: Increase quirk delay for USB devices
	USB: fix out-of-bounds in usb_set_configuration
	xhci: fix finding correct bus_state structure for USB 3.1 hosts
	iio: adc: twl4030: Fix an error handling path in 'twl4030_madc_probe()'
	iio: adc: twl4030: Disable the vusb3v1 rugulator in the error handling path of 'twl4030_madc_probe()'
	iio: ad_sigma_delta: Implement a dedicated reset function
	staging: iio: ad7192: Fix - use the dedicated reset function avoiding dma from stack.
	iio: core: Return error for failed read_reg
	iio: ad7793: Fix the serial interface reset
	iio: adc: mcp320x: Fix readout of negative voltages
	iio: adc: mcp320x: Fix oops on module unload
	uwb: properly check kthread_run return value
	uwb: ensure that endpoint is interrupt
	brcmfmac: setup passive scan if requested by user-space
	drm/i915/bios: ignore HDMI on port A
	nvme: protect against simultaneous shutdown invocations
	sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
	ext4: fix data corruption for mmap writes
	ext4: Don't clear SGID when inheriting ACLs
	ext4: don't allow encrypted operations without keys
	Linux 4.4.92

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-12 12:05:45 +02:00
Peter Zijlstra
90fd673873 sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs
commit 50e76632339d4655859523a39249dd95ee5e93e7 upstream.

Cpusets vs. suspend-resume is _completely_ broken. And it got noticed
because it now resulted in non-cpuset usage breaking too.

On suspend cpuset_cpu_inactive() doesn't call into
cpuset_update_active_cpus() because it doesn't want to move tasks about,
there is no need, all tasks are frozen and won't run again until after
we've resumed everything.

But this means that when we finally do call into
cpuset_update_active_cpus() after resuming the last frozen cpu in
cpuset_cpu_active(), the top_cpuset will not have any difference with
the cpu_active_mask and this it will not in fact do _anything_.

So the cpuset configuration will not be restored. This was largely
hidden because we would unconditionally create identity domains and
mobile users would not in fact use cpusets much. And servers what do use
cpusets tend to not suspend-resume much.

An addition problem is that we'd not in fact wait for the cpuset work to
finish before resuming the tasks, allowing spurious migrations outside
of the specified domains.

Fix the rebuild by introducing cpuset_force_rebuild() and fix the
ordering with cpuset_wait_for_hotplug().

Reported-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: deb7aa308e ("cpuset: reorganize CPU / memory hotplug handling")
Link: http://lkml.kernel.org/r/20170907091338.orwxrqkbfkki3c24@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-12 11:27:35 +02:00
Linux Build Service Account
9e61cad143 Merge "sched: Make resched_cpu() unconditional" 2017-10-05 16:10:06 -07:00
Juri Lelli
b0fa18e1ca UPSTREAM: cpufreq: schedutil: use now as reference when aggregating shared policy requests
Currently, sugov_next_freq_shared() uses last_freq_update_time as a
reference to decide when to start considering CPU contributions as
stale.

However, since last_freq_update_time is set by the last CPU that issued
a frequency transition, this might cause problems in certain cases. In
practice, the detection of stale utilization values fails whenever the
CPU with such values was the last to update the policy. For example (and
please note again that the SCHED_CPUFREQ_RT flag is not the problem
here, but only the detection of after how much time that flag has to be
considered stale), suppose a policy with 2 CPUs:

               CPU0                |               CPU1
                                   |
                                   |     RT task scheduled
                                   |     SCHED_CPUFREQ_RT is set
                                   |     CPU1->last_update = now
                                   |     freq transition to max
                                   |     last_freq_update_time = now
                                   |

                        more than TICK_NSEC nsecs

                                   |
     a small CFS wakes up          |
     CPU0->last_update = now1      |
     delta_ns(CPU0) < TICK_NSEC*   |
     CPU0's util is considered     |
     delta_ns(CPU1) =              |
      last_freq_update_time -      |
      CPU1->last_update = 0        |
      < TICK_NSEC                  |
     CPU1 is still considered      |
     CPU1->SCHED_CPUFREQ_RT is set |
     we stay at max (until CPU1    |
     exits from idle)              |

* delta_ns is actually negative as now1 > last_freq_update_time

While last_freq_update_time is a sensible reference for rate limiting,
it doesn't seem to be useful for working around stale CPU states.

Fix the problem by always considering now (time) as the reference for
deciding when CPUs have stale contributions.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit d86ab9cff8b936aadde444d0e263a8db5ff0349b)
2017-10-03 21:18:45 +00:00
Paul E. McKenney
15a19dd355 sched: Make resched_cpu() unconditional
The current implementation of synchronize_sched_expedited() incorrectly
assumes that resched_cpu() is unconditional, which it is not.  This means
that synchronize_sched_expedited() can hang when resched_cpu()'s trylock
fails as follows (analysis by Neeraj Upadhyay):

o    CPU1 is waiting for expedited wait to complete:
sync_rcu_exp_select_cpus
     rdp->exp_dynticks_snap & 0x1   // returns 1 for CPU5
     IPI sent to CPU5

synchronize_sched_expedited_wait
         ret = swait_event_timeout(
                                     rsp->expedited_wq,
  sync_rcu_preempt_exp_done(rnp_root),
                                     jiffies_stall);

            expmask = 0x20 , and CPU 5 is in idle path (in cpuidle_enter())

o    CPU5 handles IPI and fails to acquire rq lock.

Handles IPI
     sync_sched_exp_handler
         resched_cpu
             returns while failing to try lock acquire rq->lock
         need_resched is not set

o    CPU5 calls  rcu_idle_enter() and as need_resched is not set, goes to
     idle (schedule() is not called).

o    CPU 1 reports RCU stall.

Given that resched_cpu() is now used only by RCU, this commit fixes the
assumption by making resched_cpu() unconditional.

Change-Id: I67cbf28612004f4b78e355dd00b5abdd0f31ec13
Reported-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Suggested-by: Neeraj Upadhyay <neeraju@codeaurora.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Patch-mainline: linux-kernel @ 18/09/17, 09:01
Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org>
2017-10-03 00:06:48 -07:00
Paul E. McKenney
3bc5ee6fd7 rcu: Stop disabling interrupts in scheduler fastpaths
We need the scheduler's fastpaths to be, well, fast, and unnecessarily
disabling and re-enabling interrupts is not necessarily consistent with
this goal.  Especially given that there are regions of the scheduler that
already have interrupts disabled.

This commit therefore moves the call to rcu_note_context_switch()
to one of the interrupts-disabled regions of the scheduler, and
removes the now-redundant disabling and re-enabling of interrupts from
rcu_note_context_switch() and the functions it calls.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Change-Id: I8de5c9890b1db126b06d4d8fed717b3c8bfcf866
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
[ paulmck: Shift rcu_note_context_switch() to avoid deadlock, as suggested
  by Peter Zijlstra. ]
Git-commit: 46a5d164db53ba6066b11889abb7fa6bddbe5cf7
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
[prsood@codeaurora.org: resolve trivial merge conflicts]
Signed-off-by: Prateek Sood <prsood@codeaurora.org>
2017-10-03 00:05:13 -07:00
Blagovest Kolenichev
fda1654df8 Merge android-4.4@d68ba9f (v4.4.89) into msm-4.4
* refs/heads/tmp-d68ba9f
  Linux 4.4.89
  ftrace: Fix memleak when unregistering dynamic ops when tracing disabled
  bcache: fix bch_hprint crash and improve output
  bcache: fix for gc and write-back race
  bcache: Correct return value for sysfs attach errors
  bcache: correct cache_dirty_target in __update_writeback_rate()
  bcache: do not subtract sectors_to_gc for bypassed IO
  bcache: Fix leak of bdev reference
  bcache: initialize dirty stripes in flash_dev_run()
  media: uvcvideo: Prevent heap overflow when accessing mapped controls
  media: v4l2-compat-ioctl32: Fix timespec conversion
  PCI: shpchp: Enable bridge bus mastering if MSI is enabled
  ARC: Re-enable MMU upon Machine Check exception
  tracing: Apply trace_clock changes to instance max buffer
  ftrace: Fix selftest goto location on error
  scsi: qla2xxx: Fix an integer overflow in sysfs code
  scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLE
  scsi: sg: factor out sg_fill_request_table()
  scsi: sg: off by one in sg_ioctl()
  scsi: sg: use standard lists for sg_requests
  scsi: sg: remove 'save_scat_len'
  scsi: storvsc: fix memory leak on ring buffer busy
  scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is dead
  scsi: megaraid_sas: Check valid aen class range to avoid kernel panic
  scsi: zfcp: trace high part of "new" 64 bit SCSI LUN
  scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response
  scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records
  scsi: zfcp: fix missing trace records for early returns in TMF eh handlers
  scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBA
  scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace records
  scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path
  scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled
  skd: Submit requests to firmware before triggering the doorbell
  skd: Avoid that module unloading triggers a use-after-free
  md/bitmap: disable bitmap_resize for file-backed bitmaps.
  block: Relax a check in blk_start_queue()
  powerpc: Fix DAR reporting when alignment handler faults
  ext4: fix quota inconsistency during orphan cleanup for read-only mounts
  ext4: fix incorrect quotaoff if the quota feature is enabled
  crypto: AF_ALG - remove SGL terminator indicator when chaining
  MIPS: math-emu: MINA.<D|S>: Fix some cases of infinity and zero inputs
  MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of both infinite inputs
  MIPS: math-emu: <MAXA|MINA>.<D|S>: Fix cases of input values with opposite signs
  MIPS: math-emu: <MAX|MIN>.<D|S>: Fix cases of both inputs negative
  MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix cases of both inputs zero
  MIPS: math-emu: <MAX|MAXA|MIN|MINA>.<D|S>: Fix quiet NaN propagation
  Input: i8042 - add Gigabyte P57 to the keyboard reset table
  tty: fix __tty_insert_flip_char regression
  tty: improve tty_insert_flip_char() slow path
  tty: improve tty_insert_flip_char() fast path
  mm: prevent double decrease of nr_reserved_highatomic
  nfsd: Fix general protection fault in release_lock_stateid()
  md/raid5: release/flush io in raid5_do_work()
  x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
  f2fs: check hot_data for roll-forward recovery
  ipv6: fix typo in fib6_net_exit()
  ipv6: fix memory leak with multiple tables during netns destruction
  gianfar: Fix Tx flow control deactivation
  Revert "net: fix percpu memory leaks"
  Revert "net: use lib/percpu_counter API for fragmentation mem accounting"
  tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0
  Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
  qlge: avoid memcpy buffer overflow
  ipv6: fix sparse warning on rt6i_node
  ipv6: add rcu grace period before freeing fib6_node
  ipv6: accept 64k - 1 packet length in ip6_find_1stfragopt()
  f2fs: fix a missing size change in f2fs_setattr
  f2fs: fix to access nullified flush_cmd_control pointer
  f2fs: free meta pages if sanity check for ckpt is failed
  f2fs: detect wrong layout
  f2fs: call sync_fs when f2fs is idle
  Revert "f2fs: use percpu_counter for # of dirty pages in inode"
  f2fs: return AOP_WRITEPAGE_ACTIVATE for writepage
  f2fs: do not activate auto_recovery for fallocated i_size
  f2fs: fix 32-bit build
  f2fs: fix incorrect free inode count in ->statfs
  f2fs: drop duplicate header timer.h
  f2fs: fix wrong AUTO_RECOVER condition
  f2fs: do not recover i_size if it's valid
  f2fs: fix fdatasync
  f2fs: fix to account total free nid correctly
  f2fs: fix an infinite loop when flush nodes in cp
  f2fs: don't wait writeback for datas during checkpoint
  f2fs: fix wrong written_valid_blocks counting
  f2fs: avoid BG_GC in f2fs_balance_fs
  f2fs: fix redundant block allocation
  f2fs: use err for f2fs_preallocate_blocks
  f2fs: support multiple devices
  f2fs: allow dio read for LFS mode
  f2fs: revert segment allocation for direct IO
  f2fs: return directly if block has been removed from the victim
  Revert "f2fs: do not recover from previous remained wrong dnodes"
  f2fs: remove checkpoint in f2fs_freeze
  f2fs: assign segments correctly for direct_io
  f2fs: fix wrong i_atime recovery
  f2fs: record inode updating status correctly
  f2fs: Trace reset zone events
  f2fs: Reset sequential zones on zoned block devices
  f2fs: Cache zoned block devices zone type
  f2fs: Do not allow adaptive mode for host-managed zoned block devices
  f2fs: Always enable discard for zoned blocks devices
  f2fs: Suppress discard warning message for zoned block devices
  f2fs: Check zoned block feature for host-managed zoned block devices
  f2fs: Use generic zoned block device terminology
  f2fs: Add missing break in switch-case
  f2fs: avoid infinite loop in the EIO case on recover_orphan_inodes
  f2fs: report error of f2fs_fill_dentries
  fs/crypto: catch up 4.9-rc6
  f2fs: hide a maybe-uninitialized warning
  f2fs: remove percpu_count due to performance regression
  f2fs: make clean inodes when flushing inode page
  f2fs: keep dirty inodes selectively for checkpoint
  f2fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
  f2fs: use BIO_MAX_PAGES for bio allocation
  f2fs: declare static function for __build_free_nids
  f2fs: call f2fs_balance_fs for setattr
  f2fs: count dirty inodes to flush node pages during checkpoint
  f2fs: avoid casted negative value as shrink count
  f2fs: don't interrupt free nids building during nid allocation
  f2fs: clean up free nid list operations
  f2fs: split free nid list
  f2fs: clear nlink if fail to add_link
  f2fs: fix sparse warnings
  f2fs: fix error handling in fsync_node_pages
  f2fs: fix to update largest extent under lock
  f2fs: be aware of extent beyond EOF in fiemap
  f2fs: don't miss any f2fs_balance_fs cases
  f2fs: add missing f2fs_balance_fs in f2fs_zero_range
  f2fs: give a chance to detach from dirty list
  f2fs: fix to release discard entries during checkpoint
  f2fs: exclude free nids building and allocation
  f2fs: fix to determine start_cp_addr by sbi->cur_cp_pack
  f2fs: fix overflow due to condition check order
  posix_acl: Clear SGID bit when setting file permissions
  f2fs: fix wrong sum_page pointer in f2fs_gc
  f2fs: backport from (4c1fad64 - Merge tag 'for-f2fs-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs)
  Revert "ANDROID: sched/tune: Initialize raw_spin_lock in boosted_groups"
  BACKPORT: partial: mm, oom_reaper: do not mmput synchronously from the oom reaper context
  FROMLIST: android: binder: Don't get mm from task
  FROMLIST: android: binder: Remove unused vma argument
  FROMLIST: android: binder: Drop lru lock in isolate callback
  ANDROID: configs: remove config fragments
  drivers: cpufreq_interactive: handle error for module load fail
  UPSTREAM: Fix build break in fork.c when THREAD_SIZE < PAGE_SIZE

Conflicts:
	android/configs/android-base.cfg
	android/configs/android-recommended.cfg
	fs/f2fs/data.c
	fs/f2fs/f2fs.h
	fs/f2fs/super.c
	include/linux/mm_types.h
	include/linux/sched.h
	kernel/fork.c

Change-Id: I21a427f17e8a1892a212df7c8707f74fb37ce400
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-09-29 14:49:19 -07:00
Vikram Mulukutla
650b6a5c41 Revert "ANDROID: sched/tune: Initialize raw_spin_lock in boosted_groups"
This reverts commit c5616f2f874faa20b59b116177b99bf3948586df.

If we re-init the per-cpu boostgroup spinlock every time that
we add a new boosted cgroup, we can easily wipe out (reinit)
a spinlock struct while in a critical section. We should only
be setting up the per-cpu boostgroup data, and the spin_lock
initialization need only happen once - which we're already
doing in a postcore_initcall.

For example:

     -------- CPU 0 --------   | -------- CPU1 --------
cgroupX boost group added      |
schedtune_enqueue_task         |
  acquires(bg->lock)           | cgroupY boost group added
                               |  for_each_cpu()
                               |    raw_spin_lock_init(bg->lock)
  releases(bg->lock)           |
      BUG (already unlocked)   |
                               |

This results in the following BUG from the debug spinlock code:
	BUG: spinlock already unlocked on CPU#5, rcuop/6/68

Change-Id: I3016702780b461a0cd95e26c538cd18df27d6316
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-09-23 01:25:03 +00:00
Vikram Mulukutla
2bbf3a762a Revert "ANDROID: sched/tune: Initialize raw_spin_lock in boosted_groups"
This reverts commit c5616f2f874faa20b59b116177b99bf3948586df.

If we re-init the per-cpu boostgroup spinlock every time that
we add a new boosted cgroup, we can easily wipe out (reinit)
a spinlock struct while in a critical section. We should only
be setting up the per-cpu boostgroup data, and the spin_lock
initialization need only happen once - which we're already
doing in a postcore_initcall.

For example:

     -------- CPU 0 --------   | -------- CPU1 --------
cgroupX boost group added      |
schedtune_enqueue_task         |
  acquires(bg->lock)           | cgroupY boost group added
                               |  for_each_cpu()
                               |    raw_spin_lock_init(bg->lock)
  releases(bg->lock)           |
      BUG (already unlocked)   |
			       |

This results in the following BUG from the debug spinlock code:
	BUG: spinlock already unlocked on CPU#5, rcuop/6/68

CRs-fixed: 2113062
Change-Id: I1cd780d9ba5801cf99bfe46504b18a88e45f17a8
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
2017-09-22 04:24:28 -07:00
Blagovest Kolenichev
c988eaaeaf Merge android-4.4@a8935c9 (v4.4.87) into msm-4.4
* refs/heads/tmp-a8935c9:
  Linux 4.4.87
  crypto: algif_skcipher - only call put_page on referenced and used pages
  epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()
  kvm: arm/arm64: Force reading uncached stage2 PGD
  kvm: arm/arm64: Fix race in resetting stage2 PGD
  drm/ttm: Fix accounting error when fail to get pages for pool
  xfrm: policy: check policy direction value
  wl1251: add a missing spin_lock_init()
  CIFS: remove endian related sparse warning
  CIFS: Fix maximum SMB2 header size
  alpha: uapi: Add support for __SANE_USERSPACE_TYPES__
  cpuset: Fix incorrect memory_pressure control file mapping
  cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs
  ceph: fix readpage from fscache
  i2c: ismt: Return EMSGSIZE for block reads with bogus length
  i2c: ismt: Don't duplicate the receive length for block reads
  irqchip: mips-gic: SYNC after enabling GIC region
  ANDROID: cpufreq-dt: Set sane defaults for schedutil rate limits
  BACKPORT: cpufreq: schedutil: Use policy-dependent transition delays
  FROMLIST: binder: fix an ret value override
  FROMLIST: binder: fix memory corruption in binder_transaction binder
  Linux 4.4.86
  drm/i915: fix compiler warning in drivers/gpu/drm/i915/intel_uncore.c
  scsi: sg: reset 'res_in_use' after unlinking reserved array
  scsi: sg: protect accesses to 'reserved' page array
  arm64: fpsimd: Prevent registers leaking across exec
  x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl
  arm64: mm: abort uaccess retries upon fatal signal
  lpfc: Fix Device discovery failures during switch reboot test.
  p54: memset(0) whole array
  lightnvm: initialize ppa_addr in dev_to_generic_addr()
  gcov: support GCC 7.1
  gcov: add support for gcc version >= 6
  i2c: jz4780: drop superfluous init
  btrfs: remove duplicate const specifier
  ALSA: au88x0: Fix zero clear of stream->resources
  scsi: isci: avoid array subscript warning
  sched: WALT: fix window mis-alignment
  sched: EAS: kill incorrect nohz idle cpu kick
  sched: EAS: fix incorrect energy delta calculation due to rounding error
  sched: EAS/WALT: take into account of waking task's load
  cpufreq: sched: WALT: don't apply capacity margin twice
  sched: WALT: fix potential overflow
  sched: EAS: schedfreq: fix CPU util over estimation
  sched: EAS/WALT: use cr_avg instead of prev_runnable_sum
  sched: WALT: fix broken cumulative runnable average accounting
  sched: deadline: WALT: account cumulative runnable avg
  FROMLIST: android: binder: Add page usage in binder stats
  FROMLIST: android: binder: Add shrinker tracepoints
  FROMLIST: android: binder: Add global lru shrinker to binder
  FROMLIST: android: binder: Move buffer out of area shared with user space
  FROMLIST: android: binder: Add allocator selftest
  FROMLIST: android: binder: Refactor prev and next buffer into a helper function
  android: android-base.config: enable IP6_NF_MATCH_RPFILTER
  UPSTREAM: cpufreq: schedutil: Use unsigned int for iowait boost
  UPSTREAM: cpufreq: schedutil: Make iowait boost more energy efficient

Conflicts:
	drivers/cpufreq/cpufreq-dt.c
	kernel/sched/deadline.c
	kernel/sched/fair.c
	kernel/sched/sched.h

Change-Id: Iee31db3fd1a0d1650ebf3d6de307a4e4637120b4
Signed-off-by: Blagovest Kolenichev <bkolenichev@codeaurora.org>
2017-09-21 13:19:38 -07:00
Linux Build Service Account
ef93f7a163 Merge "Merge android-4.4@4b8fc9f (v4.4.82) into msm-4.4" 2017-09-08 22:04:15 -07:00
Rafael J. Wysocki
3482bbea6b BACKPORT: cpufreq: schedutil: Use policy-dependent transition delays
Make the schedutil governor take the initial (default) value of the
rate_limit_us sysfs attribute from the (new) transition_delay_us
policy parameter (to be set by the scaling driver).

That will allow scaling drivers to make schedutil use smaller default
values of rate_limit_us and reduce the default average time interval
between consecutive frequency changes.

Make intel_pstate set transition_delay_us to 500.

BACKPORT: Modified to support the separate up_rate_limit_us and
down_rate_limit_us (upstream just has a single rate_limit_us). Also
dropped the changes for intel_pstate as there's a merge conflict.

Change-Id: I62a8543879a4d8582cdcb31ebd55607705d1c8b1
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
(cherry picked from commit 1b72e7fd304639f1cd49d1e11955c4974936d88c)
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
2017-09-06 20:36:47 +00:00
Joonwoo Park
0caf1df0c5 sched: WALT: fix window mis-alignment
The initial window start needs to be close to ktime ns = 0 to be
aligned with scheduler tick.

Change-Id: Ia91f74efce2f910106622a054a6fcd507e763ca5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-09-01 17:24:21 -07:00
Joonwoo Park
3989a247e2 sched: EAS: kill incorrect nohz idle cpu kick
EAS won't allow NOHZ idle balancer until CPU's over utilized.  However
nohz_kick_needed() can return true.  This causes idle CPU wake up for
nothing.

Change-Id: I6e548442e29e4f85cda695e4c7101dd591b12fe6
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-09-01 17:24:12 -07:00
Joonwoo Park
11b618a0b2 sched: EAS: fix incorrect energy delta calculation due to rounding error
In order to calculate energy difference we currently iterates CPUs under
the same sched doamin to accumulate total energy cost and compare before
and after :

  for_each_domain(cpu)
          total_energy_before += (cpu_util * power) >> SCHED_CAPACITY_SHIFT;

  for_each_domain(cpu)
          total_energy_after += (cpu_util * power) >> SCHED_CAPACITY_SHIFT;

Doing such can incorrectly calculate and report abs(delta) > 0 when
there is actually no energy delta between before and after because the
same total accumulated cpu_util of all the CPUs can be distributed
differently before and after and it causes different amount of rounding
error.

Fix such incorrectness by shifting just once with accumulated
total_energy.

Change-Id: I82f1e2e358367058960938b4ef81714f57e921cf
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(moved part to another commit)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-09-01 17:23:56 -07:00
Joonwoo Park
94e5c96507 sched: EAS/WALT: take into account of waking task's load
WALT's function cpu_util(cpu) reports CPU's load without taking into
account of waking task's load.  Thus currently cpu_overutilized()
underestimates load on the previous CPU of waking task.

Take into account of task's load to determine whether previous CPU is
overutilzed to bail out early without running energy_diff() which is
expensive.

Change-Id: I30f146984a880ad2cc1b8a4ce35bd239a8c9a607
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(minor rebase conflicts)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-09-01 17:23:47 -07:00
Joonwoo Park
f94958ffa7 cpufreq: sched: WALT: don't apply capacity margin twice
With WALT all the scheduler classes' load are accounted in scr->cfs and
update_cpu_capacity_request() adds capacity margin.  At present, at tick
path, scheduler also adds capacity margin.  Therefore the margin applied
twice.

Fix such error by using margin applied cpu utilization only for checking
whether frequency increase is needed.

Change-Id: Id7d8cc73b2e4eec70b274ca66e09bb0b16bf6f09
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
(trivial rebase conflict)
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
2017-09-01 17:23:40 -07:00
Joonwoo Park
c8b8c92bbc sched: WALT: fix potential overflow
Task demand and CPU util are in u64.

Change-Id: If7ec1623e723026d3346201122aab0303a6d2ba2
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-09-01 17:23:31 -07:00
Joonwoo Park
2d7da09705 sched: EAS: schedfreq: fix CPU util over estimation
WALT CPU utilization reports CPU load of all the scheduler classes.
Therefore adding RT class's load additionally will cause frequency
overshooting.  Fix such issue by not accounting RT class load when
requesting capacity.

Change-Id: I29600d7af7ca8c00e0d2ff1e13872024ccaa72bf
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2017-09-01 17:21:06 -07:00