Commit graph

21901 commits

Joonwoo Park
60abcbcfdf sched: print sched_task_load always
At present select_best_cpu() bails out when the best idle CPU is found
without printing the sched_task_load trace event.  Print it.

Change-Id: Ie749239bdb32afa5b1b704c048342b905733647e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:37 -07:00
Joonwoo Park
0254e50843 sched: add preference for prev and sibling CPU in RT task placement
Add a bias towards the RT task's previous CPU and sibling CPUs in order
to avoid cache bouncing and migrations.

CRs-fixed: 927903
Change-Id: I45d79d774e65efcb38282130b6692b4c3b03c2f0
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:36 -07:00
Vikram Mulukutla
dba1a27b5a sched: core: Don't use current task_cpu when migrating with stop_one_cpu
To migrate a running task using stop_one_cpu, one has to give up
the pi_lock and rq_lock. To safeguard against migration
between giving up those locks and actually invoking stop_one_cpu,
one has to save away task_cpu(p) before releasing pi_lock, and
use the saved value when passing it as the src_cpu argument to
stop_one_cpu. If the current task_cpu is passed in, the task may
have already been migrated to that CPU for whatever other reason.

sched_exec attempts to invoke stop_one_cpu with source CPU
set to task_cpu(task) after dropping the pi_lock. While this
doesn't result in a functional error, it is rather useless to
have the entire migration code run when the task is already
running on the destination CPU.

Change-Id: I02963ed02c7119a3d707580a191fbc86b94cdfaf
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
[joonwoop@codeaurora.org: omitted changes for qhmp_core.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:35 -07:00
Syed Rameez Mustafa
c00814c023 sched: Notify cpufreq governor early about potential big tasks
Tasks that are on the runqueue continuously for a certain amount of time
have the potential to be big tasks at the end of the window in which they
are runnable. In such scenarios, ramping up the CPU frequency early can
boost performance rather than waiting till the end of the window for the
governor to query the load. Notify the governor early, at every tick, when
a task has been observed to execute beyond some percentage of the tick
period.

The threshold beyond which a task is eligible for early detection can be
changed via the tunable sched_early_detection_duration. The feature itself
is enabled only when scheduler boost is in effect.
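
A minimal sketch of the tick-path check described above; every name
except the sched_early_detection_duration tunable is an assumption for
illustration:

    /* On each tick, while scheduler boost is in effect, flag a task
     * that has already run longer than the early detection duration in
     * this window and poke the governor so it can ramp the frequency
     * before the window ends. */
    static void early_detect_big_task(struct rq *rq, u64 wallclock)
    {
            struct task_struct *p = rq->curr;

            if (!sched_boost())                     /* assumed predicate */
                    return;

            if (wallclock - p->ravg.mark_start >    /* assumed bookkeeping */
                sysctl_sched_early_detection_duration)
                    notify_cpufreq_governor(cpu_of(rq));  /* assumed hook */
    }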

Change-Id: I528b72bbc79a55b4593d1b8ab45450411c6d70f3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in scheduler_tick() in
 kernel/sched/core.c.  fixed minor conflicts in include/linux/sched.h,
 include/linux/sched/sysctl.h and kernel/sysctl.c due to
 CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:34 -07:00
Syed Rameez Mustafa
a805e4b220 sched: Skip resetting HMP stats when max frequencies remain unchanged
A change in cpufreq policy parameters currently triggers a partial reset
of HMP stats. This is necessary when there are changes in the max
frequency of any cluster since updated load scaling factors necessitate
updating the number of big and small tasks on every CPU. However, this
computation is redundant when parameters other than the max freq change.
Optimize code by avoiding the redundant calculations.

Change-Id: Ib572f5dfdc4ada378e695f328ff81e2ce31132ba
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:02:33 -07:00
Joonwoo Park
cee02f8168 sched: update sched_task_load trace event
Add best_cpu and latency fields to the sched_task_load trace event.  The
latency field represents the combined latency of the update_task_ravg()
calls and select_best_cpu(), which is useful for analyzing the latency
overhead of the HMP scheduler.

Change-Id: Ie6d777c918d0414d361d758490e3cd7d509f5837
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:32 -07:00
Joonwoo Park
b2e60dbe08 sched: avoid unnecessary multiplication and division
Avoid unnecessary multiplication and division when the load scaling
factor is 1024.
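
The idea as a sketch, with the field and helper names assumed: when the
per-CPU load scale factor is unity (1024), return the load untouched
instead of multiplying and dividing by the same base.

    static inline u64 scale_load_to_cpu(u64 task_load, int cpu)
    {
            unsigned int lsf = cpu_rq(cpu)->load_scale_factor;

            /* Skip the multiply/divide when scaling is a no-op. */
            if (lsf != 1024)
                    task_load = task_load * lsf / 1024;

            return task_load;
    }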

Change-Id: If3cb63a77feaf49cc69ddec7f41cc3c1cabbfc5a
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:31 -07:00
Joonwoo Park
91a8710235 sched: precompute required frequency for CPU load
At present, in order to estimate the power cost of CPU load, the HMP
scheduler converts CPU load to the corresponding frequency on the fly,
which can be avoided.

Optimize and reduce the execution time of select_best_cpu() by
precomputing the CPU load to frequency conversion.  This optimization
reduces the execution time of select_best_cpu() by about 20% on average.
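
A sketch of the precomputation; the table layout and helper names are
assumptions:

    /* Rebuild the per-CPU load-to-frequency table. Called from the
     * cpufreq policy notifier when limits change, rather than from the
     * hot wakeup path. */
    static void update_load_to_freq_table(struct rq *rq)
    {
            int pct;

            for (pct = 0; pct <= 100; pct++)
                    rq->load_to_freq[pct] = load_to_freq(rq, pct);
    }

    /* select_best_cpu() then reads rq->load_to_freq[load_pct] instead
     * of converting on the fly. */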

Change-Id: I385c57f2ea9a50883b76ba6ca3deb673b827217f
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/sched.h.
 stripped out code for CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:31 -07:00
Joonwoo Park
383ae6b29e sched: clean up fixup_hmp_sched_stats()
The commit 392edf4969d20 ("sched: avoid stale cumulative_runnable_avg
HMP statistics") introduced the callback function fixup_hmp_sched_stats()
so update_history() can avoid a decrement and increment pair on the HMP
stats.  However, the commit also made the fixup function do an obscure
p->ravg.demand update, which isn't the cleanest way.

Revise fixup_hmp_sched_stats() so that the caller can update
p->ravg.demand directly.

Change-Id: Id54667d306495d2109c26362813f80f08a1385ad
[joonwoop@codeaurora.org: stripped out CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:30 -07:00
Joonwoo Park
446beddcd4 sched: account new task load so that governor can apply different policy
Account the amount of load contributed by new tasks within the CPU load
so that the governor can apply a different policy when the CPU is loaded
by new tasks.

To be able to distinguish new task load, a new tunable,
sched_new_task_windows, is also introduced.  The tunable defines tasks as
new when they have been active for less than the configured number of
windows.
Change-Id: I2e2e62e4103882f7362154b792ab978b181b9f59
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
[joonwoop@codeaurora.org: omitted changes for
 drivers/cpufreq/cpufreq_interactive.c.  cpufreq changes need to be
 applied separately later.  fixed conflict in include/linux/sched.h and
 include/linux/sched/sysctl.h.  omitted changes for qhmp_core.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:29 -07:00
Syed Rameez Mustafa
809ea3fd1e sched: Fix frequency change checks when affined tasks are migrating
On the 3.18 kernel version and beyond, when the affinity for an
enqueued task changes such that migration is required, the rq
variable gets updated to the destination rq. This means that
check_for_freq_change() skips the source CPU frequency check and
instead double checks the destination CPU. Fix this by using the
src_cpu variable instead.

Change-Id: I14727a34e22c50c9a839007d474802f96a2f49f6
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in __migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:28 -07:00
Olav Haugan
03a683a55c sched: Add tunables for static cpu and cluster cost
Add a per-CPU tunable to set the extra cost of using a CPU that is idle.
Add the same for a cluster.

Change-Id: I4aa53f3c42c963df7abc7480980f747f0413d389
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop@codeaurora.org: omitted changes for qhmp*.[c,h].  stripped out
 CONFIG_SCHED_QHMP in drivers/base/cpu.c and include/linux/sched.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:27 -07:00
Olav Haugan
4996dafe68 sched/core: Add API to set cluster d-state
Add new API to the scheduler to allow low power mode driver to inform
the scheduler about the d-state of a cluster. This can be leveraged by
the scheduler to make an informed decision about the cost of placing a task
on a cluster.

Change-Id: If0fe0fdba7acad1c2eb73654ebccfdb421225e62
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop@codeaurora.org: omitted fixes for qhmp_core.c and qhmp_core.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:26 -07:00
Joonwoo Park
b4627e0104 sched: take governor's frequency max load into account
At present the HMP scheduler packs tasks onto a busy CPU until the CPU's
load is 100%, to avoid waking up an idle CPU as much as possible.  Such
aggressive packing leads to unintended CPU frequency increases, as the
governor raises the busy CPU's frequency when its load exceeds the
configured frequency max load, which can be less than 100%.

Fix this by taking the governor's frequency max load into account and
packing tasks only when the CPU's projected load is less than that max
load, to avoid unnecessary frequency increases.
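
As a sketch, the packing condition becomes the following; the helper
names are assumed:

    /* Keep a busy CPU as a packing candidate only while the projected
     * load after adding the waking task stays under the governor's
     * frequency max load for that CPU, instead of a flat 100% cap. */
    static bool eligible_for_packing(int cpu, struct task_struct *p)
    {
            u64 projected = cpu_load(cpu) +
                            scale_load_to_cpu(task_load(p), cpu);

            return projected <= sched_freq_max_load(cpu);
    }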

Change-Id: I4447e5e0c2fa5214ae7a9128f04fd7585ed0dcac
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/sched.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:25 -07:00
Joonwoo Park
28f67e5a50 sched: set HMP scheduler's default initial task load to 100%
Set init_task_load to 100% to allow new tasks to wake up on the best
performance CPUs.

Change-Id: Ie762a3f629db554fb5cfa8c1d7b8b2391badf573
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:24 -07:00
Joonwoo Park
44af3b5e03 sched: add preference for prev and sibling CPU in HMP task placement
At present the HMP task placement algorithm places a waking task on any
lowest power cost CPU in the system, even if the task's previous CPU is
also one of the lowest power cost CPUs.  Placing the task on its previous
CPU can reduce cache bouncing.

Add a bias towards the task's previous CPU and the CPUs in the same cache
domain as the previous CPU.

Change-Id: Ieab3840432e277048058da76764b3a3f16e20c56
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:23 -07:00
Olav Haugan
8623286277 sched: Update task->on_rq when tasks are moving between runqueues
Task->on_rq has three states:
	0 - Task is not on runqueue (rq)
	1 (TASK_ON_RQ_QUEUED) - Task is on rq
	2 (TASK_ON_RQ_MIGRATING) - Task is on rq but in the
	process of being migrated to another rq

When a task is moving between rqs, task->on_rq should be
TASK_ON_RQ_MIGRATING.
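
Roughly, the migration path then follows this pattern (cf. the
move_queued_task() path; locking details elided):

    /* Both rq locks are held around the move; the MIGRATING state lets
     * observers distinguish a task in transit from an enqueued one. */
    p->on_rq = TASK_ON_RQ_MIGRATING;
    dequeue_task(src_rq, p, 0);
    set_task_cpu(p, new_cpu);
    enqueue_task(dst_rq, p, 0);
    p->on_rq = TASK_ON_RQ_QUEUED;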

CRs-fixed: 884720
Change-Id: I1572aba00a0273d4ad5bc9a3dd60fb68e2f0b895
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-03-23 20:02:23 -07:00
Syed Rameez Mustafa
425e3f0cc4 sched: remove temporary demand fixups in fixup_busy_time()
On older kernel versions p->on_rq was a binary value that did not
allow distinguishing between enqueued and migrating tasks. As a result
fixup_busy_time would have to do temporary load adjustments to ensure
that update_history does not do incorrect demand adjustments for
migrating tasks. Since p->on_rq can now be used to make a distinction
between migrating and enqueued tasks, there is no need to do these
temporary load calculations. Instead make sure update_history() only
does load adjustments on enqueued tasks.

Change-Id: I1f800ac61a045a66ab44b9219516c39aa08db087
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:02:22 -07:00
Syed Rameez Mustafa
ca42a1bec8 sched: add frequency zone awareness to the load balancer
Add zone awareness to the load balancer. Remove all earlier restrictions
that the load balancer had for inter cluster kicks and migration.

Change-Id: I12ad3d0c2d2e9bb498f49a231810f2ad418b061f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in nohz_kick_needed() due
 to its return type change.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:21 -07:00
Syed Rameez Mustafa
87fe20de7e sched: Update the wakeup placement logic for fair and rt tasks
For the fair sched class, update the select_best_cpu() policy to do
power based placement. The hope is to minimize the voltage at which
the CPU runs.

While RT tasks already do power based placement, their placement
preference has to now take into account the power cost of all tasks
on a given CPU. Also remove the check for sched_boost since
sched_boost no longer intends to elevate all tasks to the highest
capacity cluster.

Change-Id: Ic6a7625c97d567254d93b94cec3174a91727cb87
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:02:20 -07:00
Syed Rameez Mustafa
d590f25153 sched: remove the notion of small tasks and small task packing
Task packing will now be determined solely on the basis of the
power cost of task placement. All tasks are eligible for packing.
Remove the notion of "small" tasks from the scheduler.

Change-Id: I72d52d04b2677c6a8d0bc6aa7d50ff0f1a4f5ebb
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:02:19 -07:00
Syed Rameez Mustafa
f2ea07a155 sched: Rework energy aware scheduling
Energy aware core rotation is not compatible with the power
based task placement being introduced in subsequent patches.
Remove all existing EA based task placement/migration logic.
power_cost() is the only function remaining. This function has
been modified to return the total power cost associated with a
task on a given CPU taking existing load on that CPU into
account.

Change-Id: Ia00501e3cbfc6e11446a9a2e93e318c4c42bdab4
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed multiple conflicts in fair.c and minor
 conflict in features.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:18 -07:00
Joonwoo Park
d5bf122db1 sched: encourage idle load balance and discourage active load balance
Encourage IDLE and NEWLY_IDLE load balance by ignoring cache hotness and
discourage active load balance by increasing busy balancing failure
threshold.  These changes allow idle CPUs to help out busy CPUs more
aggressively and reduce unnecessary active load balance within the
same CPU domain.

Change-Id: I22f6aba11932ccbb82a436c0532589c46f9148ed
[joonwoop@codeaurora.org: fixed conflict in need_active_balance() and
 can_migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:17 -07:00
Joonwoo Park
b5a9a7b1c7 sched: avoid stale cumulative_runnable_avg HMP statistics
When a new window starts for a task and the task is on a rq, the
scheduler momentarily decreases the rq's cumulative_runnable_avg,
re-accounts the task's demand, and increases the rq's
cumulative_runnable_avg with the newly accounted demand.  There is
therefore a short period during which the rq's cumulative_runnable_avg
is less than what it's supposed to be.  Meanwhile, there is a chance
that another CPU is searching for the best CPU to place a task on and
makes a suboptimal decision based on the momentarily stale
cumulative_runnable_avg.

Fix this issue by adding or subtracting the delta between the task's old
and new demand instead of decrementing and incrementing by the entire
task's load.
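
A sketch of the delta-based fix-up; the exact field names are
assumptions:

    static void fixup_cumulative_runnable_avg(struct rq *rq,
                                              struct task_struct *p,
                                              u64 new_demand)
    {
            /* One signed adjustment: the statistic never transiently
             * drops by the task's entire load. */
            s64 delta = (s64)new_demand - (s64)p->ravg.demand;

            rq->cumulative_runnable_avg += delta;
    }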

Change-Id: I3c9329961e6f96e269fa13359e7d1c39c4973ff2
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:16 -07:00
Syed Rameez Mustafa
d109fbbf71 sched: Add load based placement for RT tasks
Currently RT tasks prefer to go to the lowest power CPU in the
system. This can end up causing contention on the lowest power
CPU. Instead ensure that RT tasks end up on the lowest power
cluster and the least loaded CPU within that cluster.

Change-Id: I363b3d43236924962c67d2fb5d3d2d09800cd994
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:02:15 -07:00
Syed Rameez Mustafa
17bb9bcd54 sched: Avoid running idle_balance() consecutively
With the introduction of "6dd123a sched: update ld_moved for active
balance from the load balancer" the function load_balance() returns a
non-zero number of migrated tasks in anticipation of tasks that will
end up on that CPU via active migration. Unfortunately, on kernel
versions 3.14 and beyond, this ends up breaking pick_next_task_fair(),
which assumes that the load balancer only returns non-zero numbers for
tasks already migrated onto the destination CPU. A non-zero number
then triggers a rerun of the pick_next_task_fair() logic so that it
can return one of the migrated tasks as the next task. When the load
balancer returns a non-zero number for tasks that will be moved via
active migration, the rerun of pick_next_task_fair() finds the CPU to
still have no runnable tasks. This in turn causes a rerun of
idle_balance() and possibly migrating another task. Hence the
destination CPU can unintentionally end up pulling several tasks.

The intent of the change above still stands, though: load balance must
terminate at higher scheduling domains when active migration occurs.
Achieve the same effect by using continue_balancing instead of faking
the number of pulled tasks. This way
pick_next_task_fair() stays happy and load balance stops at higher
scheduling domains.

Change-Id: Id223a3287e5d401e10fbc67316f8551303c7ff96
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:02:14 -07:00
Joonwoo Park
a509c84de7 sched: inline function scale_load_to_cpu()
Inline the relatively small and frequently used function
scale_load_to_cpu().

CRs-fixed: 849655
Change-Id: Id5f60595c394959d78e6da4cc4c18c338fec285b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:14 -07:00
Joonwoo Park
024505821e sched: look for least busy and fallback CPU only when it's needed
Function best_small_task_cpu() is biased towards mostly idle CPUs and
shallow C-state CPUs.  Thus the chance of needing to find the least busy
or the least power cost fallback CPU is typically quite rare.  At
present, however, the function always looks for those two CPUs, which is
unnecessary most of the time.

Optimize the function by amending it to look for the least busy CPU and
the least power cost fallback CPU only when they are needed.  This change
is solely an optimization and makes no functional difference.

CRs-fixed: 849655
Change-Id: I5eca11436e85b448142a7a7644f422c71eb25e8e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:13 -07:00
Joonwoo Park
82cf54e7d0 sched: iterate search CPUs starting from prev_cpu for optimization
Function best_small_task_cpu() looks for a mostly idle CPU and returns
it as the best CPU for a given small task.  At present, however, it
cannot break out of the CPU search loop upon finding a mostly idle CPU;
it keeps iterating because it needs to find and return the given task's
previous CPU as the best CPU, to avoid an unnecessary task migration
when the previous CPU is mostly idle.

Optimize best_small_task_cpu() to iterate over the search CPUs starting
from the given task's previous CPU so that it can break out of the loop
as soon as a mostly idle CPU is found.  This optimization saves a few
hundred nanoseconds of time spent in the function and doesn't make any
functional change.
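
The wraparound walk could look like this sketch (mask handling
assumed):

    /* Start scanning at the task's previous CPU; the first mostly idle
     * CPU found is then either the previous CPU itself or an equally
     * good candidate, so the loop can break immediately. */
    int i = task_cpu(p);

    do {
            if (cpumask_test_cpu(i, &search_cpus) && mostly_idle_cpu(i)) {
                    best_cpu = i;
                    break;
            }
            i = cpumask_next(i, cpu_online_mask);
            if (i >= nr_cpu_ids)
                    i = cpumask_first(cpu_online_mask);
    } while (i != task_cpu(p));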

CRs-fixed: 849655
Change-Id: I8c540963487f4102dac4d54e9f98e24a4a92a7b3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:12 -07:00
Syed Rameez Mustafa
de52c5fce5 sched: Optimize the select_best_cpu() "for" loop
select_best_cpu() is agnostic of the hardware topology. This means that
certain functions such as task_will_fit() and skip_cpu() are run
unnecessarily for every CPU in a cluster whereas they need to run only
once per cluster. Reduce the execution time of select_best_cpu() by
ensuring these functions run only once per cluster. The frequency domain
mask is used to identify CPUs that fall in the same cluster.

CRs-fixed: 849655
Change-Id: Id24208710a0fc6321e24d9a773f00be9312b75de
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: added continue after clearing search_cpus.
 fixed indentations with space.  fixed skip_cpu() to return true when rq ==
 task_rq.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:11 -07:00
Syed Rameez Mustafa
7ebc066cdb sched: Optimize select_best_cpu() to reduce execution time
select_best_cpu() is a crucial wakeup routine that determines the
time taken by the scheduler to wake up a task. Optimize this routine
to get higher performance. The following changes have been made as
part of the optimization listed in order of how they built on top of
one another:

* Several routines called by select_best_cpu() recalculate task load
  and CPU load even though these are already known quantities. For
  example mostly_idle_cpu_sync() calculates CPU load; task_will_fit()
  calculates task load before spill_threshold_crossed() recalculates
  both. Remove these redundant calculations by moving the task load
  and CPU load computations to the select_best_cpu() 'for' loop and
  passing to any functions that need the information.

* Rewrite best_small_task_cpu() to avoid the existing two pass
  approach. The two pass approach was only in place to find the
  minimum power cluster for small task placement. This information
  can easily be established by looking at runqueue capacities. The
  cluster without the highest capacity constitutes the minimum power
  cluster. A special CPU mask called the mpc_mask is required to
  safeguard against undue side effects on SMP systems. Also terminate
  the function early if the previous CPU is found to be mostly_idle.

* Reorganize code to ensure that no unnecessary computations or
  variable assignments are done. For example there is no need to
  compute CPU load if that information does not end up getting used
  in any iteration of the 'for' loop.

* The tick logic for EA migrations unnecessarily checks for the power
  of all CPUs only for skip_cpu() to throw away the result later.
  Ensure that for EA we only check CPUs within the same cluster
  and avoid running select_best_cpu() whenever possible.

CRs-fixed: 849655
Change-Id: I4e722912fcf3fe4e365a826d4d92a4dd45c05ef3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed cpufreq_notifier_policy() to set mpc_mask.
 added a comment about prerequisite of lower_power_cpu_available().
 s/struct rq * rq/struct rq *rq/. s/TASK_NICE/task_nice/]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:10 -07:00
Matt Wagantall
61c44b5807 sched/debug: Add Kconfig to trigger panics on all 'BUG:' conditions
Introduce CONFIG_PANIC_ON_SCHED_BUG to trigger panics along with all
'BUG:' prints from the scheduler core, even potentially-recoverable
ones such as scheduling while atomic, sleeping from invalid context,
and detection of broken arch topologies.

Change-Id: I5d2f561614604357a2bc7900b047e53b3a0b7c6d
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
[joonwoop@codeaurora.org: fixed trivial merge conflict in
 lib/Kconfig.debug.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:09 -07:00
Joonwoo Park
b1fb594df9 sched: fix incorrect prev_runnable_sum accounting with long ISR run
At present, when an IRQ handler spans multiple scheduler windows, the HMP
scheduler resets the IRQ CPU's prev_runnable_sum to its current max
capacity under the assumption that there is no other possible
contribution to the CPU's prev_runnable_sum.  This isn't correct, as
another CPU can migrate tasks to the IRQ CPU.

Furthermore, such incorrectness can trigger a BUG_ON() if the migrated
task's prev_window is larger than the migrating CPU's current capacity,
as in the following scenario.

1. An ISR on the power efficient CPU has been running for multiple
   windows.
2. A task which has a prev_window higher than the IRQ CPU's current
   capacity migrates to the IRQ CPU.
3. Servicing the IRQ is done and the IRQ CPU resets its prev_runnable_sum
   to the CPU's current capacity.
4. Before window rollover, the task on the IRQ CPU migrates to another
   CPU and fixes up the source and destination CPUs' busy time.
5. BUG_ON(src_rq->prev_runnable_sum < 0) triggers as p->ravg.prev_window
   is larger than src_rq->prev_runnable_sum.

Fix such incorrectness by preserving prev_runnable_sum when an ISR spans
multiple scheduler windows.  There is no need to reset it.

CRs-fixed: 828055
Change-Id: I1f95ece026493e49d3810f9c940ec5f698cc0b81
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:08 -07:00
Joonwoo Park
a4c475e43d sched: prevent task migration while governor queries CPUs' load
At present, the governor retrieves each CPU's load sequentially.  In
this way, there is a chance of a race between the governor's CPU load
query and a task migration, which would result in reporting less CPU
load than is actually present.

For example,
CPU0 load = 30%.  CPU1 load = 50%.
Governor                               Load balancer
- sched_get_busy(cpu 0) = 30%.
                                       - A task 'p' migrated from CPU 1 to
                                         CPU 0.  p->ravg->prev_window = 50.
                                         Now CPU 0's load = 80%,
                                         CPU 1's load = 0%.
- sched_get_busy(cpu 1) = 0%
  The 50% of load that moved from
  CPU 1 to CPU 0 is never accounted.

Fix such issues by introducing a new API, sched_get_cpus_busy(), which
lets the governor retrieve the load of a set of CPUs in one call.  The
load set is constructed internally while the load balancer is blocked,
to ensure migration cannot occur in the meantime.
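
One plausible shape for the API; the signature and the locking scheme
are assumptions:

    /* Read every requested CPU's busy time inside a single region in
     * which all of their rq locks are held, so a migration cannot move
     * load between two reads. Locks are taken in ascending CPU order. */
    void sched_get_cpus_busy(u64 *busy, const struct cpumask *query_cpus)
    {
            int cpu;

            local_irq_disable();
            for_each_cpu(cpu, query_cpus)
                    raw_spin_lock(&cpu_rq(cpu)->lock);

            for_each_cpu(cpu, query_cpus)
                    *busy++ = cpu_rq(cpu)->prev_runnable_sum;

            for_each_cpu(cpu, query_cpus)
                    raw_spin_unlock(&cpu_rq(cpu)->lock);
            local_irq_enable();
    }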

Change-Id: I4fa4dd1195eff26aa603829aca2054871521495e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:07 -07:00
Srivatsa Vaddagiri
eca78aaf84 sched: report loads greater than 100% only during load alert notifications
The busy time of CPUs is adjusted during task migrations. This can
result in reporting the load greater than 100% to the governor and
causes direct jumps to the higher frequencies during the intra cluster
migrations. Hence clip the load to 100% during the load reporting at
the end of the window. The load is not clipped for load alert notifications
which allows ramping up the frequency faster for inter cluster migrations
and heavy task wakeup scenarios.
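
A sketch of the two reporting paths; the flag name is assumed, and
sched_ravg_window stands for 100% of a window:

    u64 load = rq->prev_runnable_sum;

    /* End-of-window reporting clips to 100% so intra-cluster
     * migrations don't cause direct jumps to higher frequencies ... */
    if (!load_alert_notification)
            load = min(load, (u64)sched_ravg_window);
    /* ... while load alert notifications report the raw value so
     * inter-cluster migrations and heavy wakeups ramp up faster. */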

Change-Id: I7347260aa476287ecfc706d4dd0877f4b75a1089
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:07 -07:00
Syed Rameez Mustafa
371435451a sched: turn off the TTWU_QUEUE feature
While the feature TTWU_QUEUE has the advantage of reducing cache
bouncing of runqueue locks, it has the side effect that runqueue
statistics are not updated until the remote CPU has a chance to
enqueue the task. Since there is no upper bound on the amount of
time it can take the remote CPU to enqueue the task, several
sequential wakeups can result in suboptimal task placement based
on the stale statistics. Turn off the feature as the cost of
suboptimal placement is much higher than the cost of cache bouncing
spinlocks for MSM-based systems.

Change-Id: I0b85c0225237b2bc44f54934769f5e3750c0f3d6
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:02:06 -07:00
Joonwoo Park
c459d15628 sched: avoid unnecessary HMP scheduler stat re-accounting
When a sched_entity's runnable average changes, we decrease and then
increase the HMP scheduler's statistics for the sched_entity, before and
after the update, to account for the updated runnable average.  During
that period, however, other CPUs would see the updating CPU's load as
less than actual.  This is suboptimal and can lead to improper task
placement and load balance decisions.

We can avoid such a situation, at least with window based load tracking,
as a sched_entity's load average, which is for PELT, doesn't affect the
HMP scheduler's load tracking statistics.  Thus, fix to update the HMP
statistics only when the HMP scheduler uses PELT based load statistics.

Change-Id: I9eb615c248c79daab5d22cbb4a994f94be6a968d
[joonwoop@codeaurora.org: applied fix into __update_load_avg() instead of
 update_entity_load_avg().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:05 -07:00
Syed Rameez Mustafa
e9c6508168 sched/fair: Fix capacity and nr_run comparisons in can_migrate_task()
Kernel versions 3.18 and beyond alter the definition of sgs->group_capacity
whereby it reflects the load a group is capable of taking. In previous
kernel versions the term used to refer to the number of effective CPUs
available. This change breaks the comparison of capacity with the number
of running tasks on a group. To fix this convert the capacity metric
before doing the comparison.
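
The conversion amounts to something like this sketch (rounding choice
assumed), recovering an effective CPU count from the 3.18-style
capacity metric before comparing it with nr_running:

    /* sgc->capacity is in SCHED_CAPACITY_SCALE units per CPU on 3.18
     * and beyond, so dividing by the scale yields the old "number of
     * effective CPUs" the comparison was originally written against. */
    unsigned int capacity_factor =
            DIV_ROUND_CLOSEST(group->sgc->capacity, SCHED_CAPACITY_SCALE);
    bool group_overloaded = rq->nr_running > capacity_factor;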

Change-Id: I3ebd941273edbcc903a611d9c883773172e86c8e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in can_migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:04 -07:00
Joonwoo Park
e61ddbb14c Revert "sched: Use only partial wait time as task demand"
This reverts commit 0e2092e47488 ("sched: Use only partial wait time as
task demand") as it causes a performance regression.

Change-Id: I3917858be98530807c479fc31eb76c0f22b4ea89
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:03 -07:00
Syed Rameez Mustafa
1e920c5995 sched/deadline: Add basic HMP extensions
Some HMP extensions have to be supported by all scheduling classes
irrespective of them using HMP task placement or not. Add these
basic extensions to make deadline scheduling work.

Also during the tick, if a deadline task gets throttled, its HMP
stats get decremented as part of the dequeue. However, the throttled
task does not update its on_rq flag, causing HMP stats to be double
decremented when update_history() is called as part of a window rollover.
Avoid this by checking for throttled deadline tasks before subtracting
and adding the deadline tasks load from the rq cumulative runnable avg.

Change-Id: I9e2ed6675a730f2ec830f764f911e71c00a7d87a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:02:02 -07:00
Vikram Mulukutla
09417ad30e sched: Fix racy invocation of fixup_busy_time via move_queued_task
set_task_cpu uses fixup_busy_time to redistribute a task's load
information between source and destination runqueues. fixup_busy_time
assumes that both source and destination runqueue locks have been
acquired if the task is not being concurrently woken up. However
this is no longer true, since move_queued_task does not acquire the
destination CPU's runqueue lock due to optimizations brought in by
recent kernels.

Acquire both source and destination runqueue locks before invoking
set_task_cpu in move_queued_task().

Change-Id: I39fadf0508ad42e511db43428e52c8aa8bf9baf6
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in move_queued_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:01 -07:00
Pavankumar Kondeti
8d4dce6c80 sched: don't inflate the task load when the CPU max freq is restricted
When the CPU max freq is restricted and the CPU is running at the
max freq, the task load is inflated by the max_possible_freq/max_freq
factor. This results in tasks migrating early to the better capacity
CPUs, which makes things worse if the frequency restriction is due
to thermal conditions.

Change-Id: Ie0ea405d7005764a6fb852914e88cf97102c138a
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23 20:02:00 -07:00
Pavankumar Kondeti
c17d7d3c40 sched: auto adjust the upmigrate and downmigrate thresholds
The load scale factor of a CPU gets boosted when its max freq is
restricted. A task's load at the same frequency is scaled higher than
normal under this scenario. This results in tasks migrating early to the
better capacity CPUs, and their residency there also increases, as their
inflated load would be relatively higher than the downmigrate threshold.

Auto adjust the upmigrate and downmigrate thresholds by a factor equal
to rq->max_possible_freq/rq->max_freq of a lower capacity CPU.  If the
adjusted upmigrate threshold exceeds the window size, it is clipped to
the window size. If the adjustment shrinks the difference between the
upmigrate and downmigrate thresholds, the downmigrate threshold is
clipped so that the difference between the modified thresholds equals
the original difference.
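
A worked sketch of the adjustment and clipping rules, under one reading
of the description above (variable names and units assumed; thresholds
and the window are in the same units):

    u64 up   = mult_frac(orig_up, rq->max_possible_freq, rq->max_freq);
    u64 down = mult_frac(orig_down, rq->max_possible_freq, rq->max_freq);

    /* Proportional scaling preserves or grows the up/down gap; only
     * clipping the upmigrate threshold can shrink it. */
    if (up > window_size) {
            up = window_size;                  /* clip to the window */
            down = up - (orig_up - orig_down); /* keep the original gap */
    }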

Change-Id: Ifa70ee5d4ca5fe02789093c7f070c77629907f04
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23 20:01:59 -07:00
Pavankumar Kondeti
37921ca6be sched: don't inherit initial task load from the parent
A child task is not supposed to inherit the initial task load attribute
from its parent. Reset the child's init_load_pct attribute during
fork.

Change-Id: I458b121f10f996fda364e97b51aaaf6c345c1dbb
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-03-23 20:01:58 -07:00
Olav Haugan
6e08a8c77a sched/fair: Add irq load awareness to the tick CPU selection logic
IRQ load is not taken into account when determining whether a task
should be migrated to a different CPU.  A task that runs for a long time
could get stuck on a CPU with high IRQ load, causing degraded performance.

Add irq load awareness to the tick CPU selection logic.

CRs-fixed: 809119
Change-Id: I7969f7dd947fb5d66fce0bedbc212bfb2d42c8c1
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-03-23 20:01:57 -07:00
Steve Muckle
652e8bc905 sched: disable IRQs in update_min_max_capacity
IRQs must be disabled while locking runqueues since an
interrupt may cause a runqueue lock to be acquired.
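
The required pattern, sketched; update_min_max_capacity() is assumed to
lock every runqueue, with iteration details elided:

    unsigned long flags;
    int cpu;

    local_irq_save(flags);                      /* IRQs off before ... */
    for_each_possible_cpu(cpu)
            raw_spin_lock(&cpu_rq(cpu)->lock);  /* ... any rq lock */

    /* ... recompute min/max capacity across all CPUs ... */

    for_each_possible_cpu(cpu)
            raw_spin_unlock(&cpu_rq(cpu)->lock);
    local_irq_restore(flags);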

CRs-fixed: 828598
Change-Id: Id66f2e25ed067fc4af028482db8c3abd3d10c20f
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:56 -07:00
Syed Rameez Mustafa
38f3da47d7 sched: Use only partial wait time as task demand
The scheduler currently either considers a task's entire wait time as
task demand or completely ignores wait time, based on the tunable
sched_account_wait_time. Both approaches have their limitations,
however. The former artificially boosts task demand when it may not
actually be justified. With the latter, the scheduler runs the risk
of never being able to recognize true load (consider two CPU hogs on
a single little CPU). To achieve a compromise between these two
extremes, change the load tracking algorithm to only consider part of
a task's wait time as its demand. The portion of wait time accounted
as demand is determined by each task's percent load, i.e. if a task
waits for 10 ms and has 60% task load, only 6 ms of the wait will
contribute to task demand. This approach is more fair as the scheduler
now tries to determine how much of its wait time a task would actually
have spent using the CPU had it been executing. It ensures that tasks
with high demand continue to see most of the benefits of accounting
wait time as busy time, while lower demand tasks don't experience a
disproportionately high boost to demand that triggers unjustified big
CPU usage. Note that this new approach applies only to wait time
considered as task demand and not to wait time considered as CPU
busy time.

To achieve the above effect, ensure that anytime a task is waiting, its
runtime in every relevant window segment is appropriately adjusted using
its pct load.
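
The 10 ms / 60% example above reduces to this sketch; the helper name
is assumed:

    /* Portion of a wait interval charged as demand, scaled by the
     * task's percent load: 10 ms waited at 60% load yields 6 ms. */
    static inline u64 wait_time_as_demand(u64 wait_ns, u32 pct_load)
    {
            return div64_u64(wait_ns * pct_load, 100);
    }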

Change-Id: I6a698d6cb1adeca49113c3499029b422daf7871f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:55 -07:00
Joonwoo Park
1cac3260d4 sched: fix race conditions where HMP tunables change
When multiple threads race to update HMP scheduler tunables, at present,
a tunable which requires a big/small task count fix-up can be updated
without the fix-up, and this can trigger a BUG_ON().
This happens because sched_hmp_proc_update_handler() acquires the rq
locks and does the fix-up only when a tunable affecting the big/small
task count is updated, even though the function calls set_hmp_defaults(),
which re-calculates all sysctl input data at that point.  Consequently,
a thread that is updating a tunable which does not affect the big/small
task count can call set_hmp_defaults() and thereby apply a
count-affecting tunable without fix-up, if another thread has just set a
sysctl value that needs fix-up.

Example of the problem scenario:
thread 0                               thread 1
Set sched_small_task – needs fix up.
                                       Set sched_init_task_load – no fix
                                       up needed.
proc_dointvec_minmax() completed,
which means sysctl_sched_small_task
has the new value.
                                       Call set_hmp_defaults() without
                                       lock/fixup. set_hmp_defaults() still
                                       updates sched_small_tasks with the
                                       new sysctl_sched_small_task value
                                       set by thread 0.

Fix such issues by wrapping the proc update handler in the already
existing policy mutex.
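
Sketch of the fix; the handler signature follows the 3.x sysctl
convention, with the fix-up body elided:

    static int sched_hmp_proc_update_handler(struct ctl_table *table,
                                             int write,
                                             void __user *buffer,
                                             size_t *lenp, loff_t *ppos)
    {
            int ret;

            /* The pre-existing policy mutex now covers both the sysctl
             * write and the set_hmp_defaults() recalculation, so no
             * thread can recalculate against a half-updated tunable. */
            mutex_lock(&policy_mutex);
            ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
            if (!ret && write)
                    set_hmp_defaults();
            mutex_unlock(&policy_mutex);

            return ret;
    }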

CRs-fixed: 812443
Change-Id: I7aa4c0efc1ca56e28dc0513480aca3264786d4f7
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:54 -07:00
Joonwoo Park
81280a6963 sched: check HMP scheduler tunables validity
Check the tunables' validity so that only valid values are accepted.

CRs-fixed: 812443
Change-Id: Ibb9ec0d6946247068174ab7abe775a6389412d5b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:53 -07:00
Syed Rameez Mustafa
f0ddb64b10 sched: Update max_capacity when an entire cluster is hotplugged
When an entire cluster is hotplugged, the scheduler's notion of
max_capacity can get outdated. This introduces the following
inefficiencies in behavior:

* task_will_fit() does not return true on all tasks. Consequently
  all big tasks go through fallback CPU selection logic skipping
  C-state and power checks in select_best_cpu().

* During boost, migration_needed() returns true unnecessarily
  causing an avoidable rerun of select_best_cpu().

* An unnecessary kick is sent to all little CPUs when boost is set.

* An opportunity for early bailout from nohz_kick_needed() is lost.

Start handling CPUFREQ_REMOVE_POLICY in the policy notifier callback
which indicates the last CPU in a cluster being hotplugged out. Also
modify update_min_max_capacity() to only iterate through online CPUs
instead of possible CPUs. While we can't guarantee the integrity of
the cpu_online_mask in the notifier callback, the scheduler will fix
up all state soon after any changes to the online mask.

The change does have one side effect: early termination from the
notifier callback when min_max_freq or max_possible_freq remain
unchanged is no longer possible. This is because when the last CPU
in a cluster is hot removed, only max_capacity is updated without
affecting min_max_freq or max_possible_freq. Therefore, when the
first CPU in the same cluster gets hot added at a later point
max_capacity must once again be recomputed despite there being no
change in min_max_freq or max_possible_freq.

Change-Id: I9a1256b5c2cd6fcddd85b069faf5e2ace177e122
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:52 -07:00