Commit graph

21903 commits

Author SHA1 Message Date
Steve Muckle
c4a015173e sched: balance power inefficient CPUs with one task
Normally the load balancer does not pay attention to CPUs with one
task since it is not possible to subdivide that load any further
to achieve better balance. With power aware scheduling however it
may be desirable to relocate that one task if the CPU it is currently
executing on is less power efficient than other CPUs in the system.

Change-Id: Idf3f5e22b88048184323513f0052827b884526b6
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
[joonwoop@codeaurora.org: group_capacity_factor changed to
 group_no_capacity.  group_classify is called by update_sg_lb_stats() too.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:25 -07:00
Steve Muckle
5b7076b4e5 sched: check for power inefficient task placement in tick
Although tasks are routed to the most power-efficient CPUs during
task wakeup, a CPU-bound task will not go through this decision point.
Load balancing can help if it is modified to dislodge a single task
from an inefficient CPU. The situation can be further improved if
during the tick, the task placement is checked to see if it is
optimal.

This sort of checking is already being done to ensure proper
task placement in heterogneous CPU topologies, so checking for
power efficient task placement fits pretty well.

Change-Id: I71e56d406d314702bc26dee1438c0eeda7699027
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:25 -07:00
Steve Muckle
9931863046 sched: do nohz load balancing in order of power efficiency
The nohz load balancer CPU does load balancing on behalf of all
idle tickless CPUs. In the interest of power efficiency though, we
should do load balancing on the most power efficient idle tickless
CPU first, and then work our way towards the least power efficient
idle tickless CPU. This will help load find its way to the most
power efficient CPUs in the system.

Since when selecting the CPU to balance next it is unknown what
task load would be pulled, a frequency must be assumed in order
to do a comparison of CPU power consumption. The maximum
freqeuncy supported by all CPUs is used for this.

Change-Id: I96c7f4300fde2c677c068dc10fc0e57f763eb9b2
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in nohz_idle_balance().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:24 -07:00
Steve Muckle
a31debf1c9 sched: run idle_balance() on most power-efficient CPU
When a CPU goes idle, it checks to see whether it can pull any load
from other busy CPUs. The CPU going idle may not be the most
power-efficient idle CPU in the system however.

This patch causes the CPU going idle to check to see whether
there is a more power-efficient idle CPU within the same
lowest sched domain. If there is, then it runs the load balancer
on behalf of that CPU instead of itself.

Since it is unknown at this point what task load would be pulled,
a frequency must be assumed for this in order to do a comparison
of CPU power consumption. The maximum freqeuncy supported by all
CPUs is used for this.

Change-Id: I5eedddc1f7d10df58ecd358f37dba563eeecf4fc
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
[joonwoop@codeaurora.org: fixed minor conflict around comment.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:23 -07:00
Steve Muckle
1e19d2f48f sched: add hook for platform-specific CPU power information
To enable power-aware scheduling, provide a hook/infrastructure
for platforms to communicate CPU power requirements for each
supported CPU frequency. This information is then used to estimate
the cost of running a task on a given CPU.

Currently, an assumption is made that the task will be running
by itself on the CPU. Given the current policy tries to spread
tasks as much as possible this assumption should not be too
far off.

Change-Id: I19f1fa760a0d43222d2880f8aec0508c468b39bb
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: return rq->capacity as power cost with
 sched_use_pelt=1.  se.avg.runnable_avg_period is deprecated and
 power_cost() will be changed by subsequent change anyway./
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:22 -07:00
Steve Muckle
ec7d8cc076 sched: add power aware scheduling sysctl
The sched_enable_power_aware sysctl will control whether
or not scheduling decisions are influenced by the power
consumption of individual CPUs.

Change-Id: I312f892cf76a3fccc4ecc8aa6703908b205267f0
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:21 -07:00
Srivatsa Vaddagiri
6e8842f8be sched: Extend update_task_ravg() to accept wallclock as argument
This will make it easier to account interrupt time on a cpu,
introduced in a subsequent patch.

Change-Id: I0e1fb5255c280ca374fd255e7fc19d5de9f8b045
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
2016-03-23 19:59:20 -07:00
Srivatsa Vaddagiri
188a6bc174 sched: add sched_get_busy, sched_set_window APIs
sched_get_busy() returns the busy time of a cpu during the most
recent completed window.
sched_set_window() will set window size and aligns windows across
all CPUs.

Change-Id: Ic53e27f43fd4600109b7b6db979e1c52c7aca103
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in include/linux/sched.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:19 -07:00
Steve Muckle
cebda4b7a3 sched: window-stats: adjust RQ curr, prev sums on task migration
Adjust cpu's busy time in its recent and previous window upon task
migration. This would enable scheduler to provide better inputs to
cpufreq governor on a cpu's busy time in a given window.

Change-Id: Idec2ca459382e9f46d882da3af53148412d631c6
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:18 -07:00
Steve Muckle
bb0b8e9859 sched: window-stats: Add aggregated runqueue windowed stats
Add counters per-cpu to track its busy time in the latest window and
one previous to that. This would be needed to track accurate busy time
per-cpu that accounts for migrations. Basically once a task migrates,
its execution time in current window is migrated as well to new cpu.

The idle task's runtime is not accounted since it should not count
towards runqueue busy time.

Change-Id: I4014dd686f95dbbfaa4274269bc36ed716573421
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:17 -07:00
Srivatsa Vaddagiri
9427c55650 sched: window-stats: add prev_window counter per-task
Currently windows where tasks had no execution time are ignored.
However accurate accounting of cpu busy time that factors in migration
would need to know actual utilization of a task in the window previous
to the latest one. This would help scheduler guide cpufreq governor on
busy time per-cpu that is not subject to migration induced errors.

Change-Id: I5841b1732c83e83d69002139de3bdb93333ce347
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:17 -07:00
Srivatsa Vaddagiri
33e7100103 sched: window-stats: synchronize windows across cpus
Synchronizing windows across cpus for task load measurements
simplifies cpu busy time accounting during migrations. For task
migrations, its usage in current window can be carried over to its new
cpu. This lets cpufreq governor see a correct picture of cpu busy time
that is not affected by migrations.

This patch lines up windows across cpus. One of the cpu, sync_cpu,
serves as a reference for all others. During bootup sync_cpu would
initialize its window_start (from its sched_clock()). Other cpus will
synchronize their window_start in reference to sync_cpu. This patch
assumes synchronous sched_clock() across cpus and may need some change
to address architectures which do not provide such synchronized
sched_clock().

Change-Id: I13381389a72f5f9f85cc2446401d493a55c78ab7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:16 -07:00
Srivatsa Vaddagiri
490b1d23cc sched: window-stats: Do not account wait time
Task load statistics are used for two purposes : cpu frequency
management and placement. Task's load can't be accurately judged by
its wait time. For ex: a task could have waited for 10ms and when given
opportunity to run, could just execute for 1ms. Accounting for 11ms as
task's demand could be over-stating its needs in this example. For
now, remove wait time from task demand and instead let task load be
derived from its actual exec time. This may need to become a tunable
feature.

Change-Id: I47e94c444c6b44c3b0810347287d50f1ee685038
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:15 -07:00
Srivatsa Vaddagiri
8319c3a580 sched: window-stats: update during migration and earlier at wakeup
During migrations accounting needs to be done in set_task_cpu() to
subtract the task activity from the source CPU and add it to the
destination CPU. This accounting will require that the task's window
based load statistics be up to date.

Unfortunately, the window-based statistics cannot always be updated in
set_task_cpu() because they are already being updated in the wakeup
path. We cannot update the statistics solely in the wakeup path
because not all wakeups are migrations. Those non-migrating wakeups
will not enter set_task_cpu().

To ensure the window-based stats are always updated for both wakeup
migrations and regular migrations, they are updated earlier in the
wakeup path, and also updated in set_task_cpu if the task is already
runnable (this ensures it is not a wakeup migration, but a regular
migration).

Change-Id: Ib246028741d0be9bb38ce93679d6e6ba25b10756
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in fair.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:14 -07:00
Srivatsa Vaddagiri
29cf4585f5 sched: move definition of update_task_ravg()
set_task_cpu() will need to call update_task_ravg(). Move up
definition to make it easy.

Change-Id: I95c1c9e009bd1805f28708e8d6fd3b7b2166410e
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:59:13 -07:00
Srivatsa Vaddagiri
005dd9b9c3 sched: Switch to windows based load stats by default
Set window-based load stats to be the default mechanism under which
tasks get classified (as big/small) and which will drive frequency
demand for tasks.  sched_ravg_window kernel parameter can be used to
change this default setting to use PELT (per-entity load tracking)
scheme instead.

Change-Id: I626110daa0bb2b53172bedea829d31877255ceaa
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:12 -07:00
Srivatsa Vaddagiri
fb9ab2a720 sched: Provide tunable to switch between PELT and window-based stats
Provide a runtime tunable to switch between using PELT-based load
stats and window-based load stats. This will be needed for runtime
analysis of the two load tracking schemes.

Change-Id: I018f6a90b49844bf2c4e5666912621d87acc7217
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:11 -07:00
Srivatsa Vaddagiri
bf863e333f sched: Provide scaled load information for tasks in /proc
Extend "sched" file in /proc for every task to provide information on
scaled load statistics and percentage-scaled based load (load_avg) for
a task. This will be valuable debug aid.

Change-Id: I6ee0394b409c77c7f79f5b9ac560da03dc879758
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:10 -07:00
Srivatsa Vaddagiri
1bea4eae33 sched: Add additional ftrace events
This patch adds two ftrace events:

sched_task_load -> records information of a task, such as scaled demand
sched_cpu_load  -> records information of a cpu, such as nr_running,
		   nr_big_tasks etc

This will be useful to debug HMP related task placement decisions by
scheduler.

Change-Id: If91587149bcd9bed157b5d2bfdecc3c3bf6652ff
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:09 -07:00
Srivatsa Vaddagiri
47d2c533b2 sched: Extend /proc/sched_debug with additional information
Provide additional information in /proc/sched_debug for every cpu.
This will be a valuable debug aid.

Change-Id: If22ee530e880cd21719242be7bc2c41308ad4186
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:59:08 -07:00
Srivatsa Vaddagiri
7379f7f110 sched: Tighten controls for tasks spillover to idle cluster
Several conditions can cause an idle cluster to pick up load from a
busy cluster. One such condition is when busy cluster has number of
tasks that exceeds its capacity (or number of cpus). This patch
extends that condition to consider small and big tasks on a cluster.
Too many "small" tasks should not cause them to spill over to another
idle cluster. Like-wise presence of big tasks should be considered
by a cluster to pick up load from another another cluster with lower
capacity.

Change-Id: I0545bf2989c37217d84ed18756c6f5c8946d5ae5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minior conflict in fair.c.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:07 -07:00
Srivatsa Vaddagiri
e1448aaf48 sched: Track number of big and small tasks on a cpu
This patch adds 'nr_big_tasks' and 'nr_small_tasks' per-cpu counters
that tracks number of big and small tasks on a cpu respectively. This
will be used in load balance decisions introduced in a subsequent
patch.

Change-Id: Ia174904140f81dd6d1946286889a50be3f16ea83
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fix conflicts in fair.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:07 -07:00
Srivatsa Vaddagiri
06a5fb422d sched: Handle cpu-bound tasks stuck on wrong cpu
CPU-bound tasks that don't sleep for long intervals can stay stuck on
the wrong cpu, as the selection of "ideal" cpu for tasks largely
happens during task wakeup time. This patch adds a check in the
scheduler tick for task/cpu mismatch (big task on little cpu OR
little task on big cpu) and forces migration of such tasks to their
ideal cpu (via select_best_cpu()).

Change-Id: Icac3485b6aa4b558c4ed9df23c2e81fb8f4bb9d9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:06 -07:00
Srivatsa Vaddagiri
45acc2457b sched: Extend active balance to accept 'push_task' argument
Active balance currently picks one task to migrate from busy cpu to
a chosen cpu (push_cpu). This patch extends active load balance to
recognize a particular task ('push_task') that needs to be migrated to
'push_cpu'. This capability will be leveraged by HMP-aware task
placement in a subsequent patch.

Change-Id: If31320111e6cc7044e617b5c3fd6d8e0c0e16952
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:59:05 -07:00
Srivatsa Vaddagiri
2a17b80545 sched: Send NOHZ kick to idle cpu in same cluster
A busy cpu will kick (via IPI) one of the idle cpus in tickless state
to run load balance and help move tasks off itself. The cpu chosen to
receive kick is simply the "first" idle cpu found in
nohz.idle_cpus_mask. This could cause unnecessary wakeups of a
cluster. A better choice would be to look for an idle cpu that is in
the same cluster as busy cpu, which should minimize cluster wakeups.

Change-Id: Ia63038d7c34b416b53c8feef3c3b31dab5200e42
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict about return value.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:04 -07:00
Srivatsa Vaddagiri
7cec8569e3 sched: Basic task placement support for HMP systems
HMP systems have cpus with different power and performance
characteristics. Some cpus could offer better power at cost of lower
performance while other cpus could offer better performance at cost of
higher power. As a result, bandwidth consumed by a task to do some
"fixed" amount of work could vary across cpus.

Optimal task placement on HMP would involve placing a task on a cpu
where it can meet its performance goals at lowest power cost. Since
kernel has little to no awareness of performance goals of
applications, we guestimate whether task is meeting its performance
goals or not by looking at its cpu bandwidth consumption. High
bandwidth consumption could imply that task's performance can improve
by running on cpus with better capacity/performance-characterisitcs.

This patch makes the basic changes to support HMP. It provides a
configurable threshold and any task consuming bandwidth in excess of
threshold will be placed on a cpu with better capacity.

Change-Id: I3fd98edd430f73342fbef06411e8b2d1cf2f56fa
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict about members of p->se which
 are not available anymore.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:03 -07:00
Srivatsa Vaddagiri
0b8de33b59 sched: Use rq->efficiency in scaling load stats
Extend task load scaling function to account for cpu efficiency
factor. Task load is scaled in reference to "most" efficient cpu.

Change-Id: I7bf829211a6e1293076e8ba0f93b4f6abcf20d92
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:02 -07:00
Srivatsa Vaddagiri
f1018e8b36 sched: Introduce efficiency, load_scale_factor and capacity
Efficiency reflects instructions per cycle capability of a cpu.

load_scale_factor reflects magnification factor that is applied for
task load when estimating bandwidth it will consume on a cpu. It
accounts for the fact that task load is scaled in reference to "best"
cpu that has best efficiency factor and also best possible max_freq.
Note that there may be no single CPU in the system that has both the
best efficiency and best possible max_freq, but that is still the
combination that all task load in the system is scaled against.

capacity reflects max_freq and efficiency metric of a cpu. It is
defined such that the "least" performing cpu (one with lowest
efficiency factor and max_freq) gets capacity of 1024. Again, there
may not be a CPU in the system that has both the lowest efficiency
and lowest max_freq. This is still the combination that is assigned
a capacity of 1024 however, other CPU capacities are relative to this.

Change-Id: I4a853f1f0f90020721d2a4ee8b10db3d226b287c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:59:01 -07:00
Srivatsa Vaddagiri
551f83f5d6 sched: Add CONFIG_SCHED_HMP Kconfig option
Add a compile-time flag to enable or disable scheduler features for
HMP (heterogenous multi-processor) systems. Main feature deals with
optimizing task placement for best power/performance tradeoff.

Also extend features currently dependent on CONFIG_SCHED_FREQ_INPUT to
be enabled for CONFIG_HMP as well.

Change-Id: I03b3942709a80cc19f7b934a8089e1d84c14d72d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor ifdefry conflict.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:00 -07:00
Srivatsa Vaddagiri
025dedac36 sched: Add scaled task load statistics
Scheduler guided frequency selection as well as task placement on
heterogeneous systems require scaled task load statistics. This patch
adds a 'runnable_avg_sum_scaled' metric per task that is a scaled
derivative of 'runnable_avg_sum'. Load is scaled in reference to
"best" cpu, i.e one with best possible max_freq

Change-Id: Ie8ae450d0b02753e9927fb769aee734c6d33190f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: incoporated with change 9d89c257df
 (" sched/fair: Rewrite runnable load and utilization average
 tracking").  Used container_of() to get sched_entity.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:59 -07:00
Srivatsa Vaddagiri
77fe8dd14d sched: Introduce CONFIG_SCHED_FREQ_INPUT
Introduce a compile time flag to enable scheduler guidance of
frequency selection. This flag is also used to turn on or off
window-based load stats feature.

Having a compile time flag will let some platforms avoid any
overhead that may be present with this scheduler feature.

Change-Id: Id8dec9839f90dcac82f58ef7e2bd0ccd0b6bd16c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict around
 sysctl_timer_migration.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:59 -07:00
Srivatsa Vaddagiri
a25a5c1c30 sched: window-based load stats improvements
Following cleanups and improvements are made to window-based load
stats feature:

* Add sysctl to pick max, avg or most recent samples as task's
  demand.

* Fix overflow possibility in calculation of sum for average policy.

* Use unscaled statistics when a task is running on a CPU which is
thermally throttled.

Change-Id: I8293565ca0c2a785dadf8adb6c67f579a445ed29
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:58:58 -07:00
Srivatsa Vaddagiri
a7103955a7 sched: Add min_max_freq and rq->max_possible_freq
rq->max_possible_freq represents the maximum frequency a cpu is
capable of attaining, while rq->max_freq represents the maximum
frequency a cpu can attain at a given instant. rq->max_freq includes
constraints imposed by user or thermal driver.
rq->max_freq <= rq->max_possible_freq.

max_possible_freq is derived as max(rq->max_possible_freq) and
represents the "best" cpu that can attain best possible frequency.

min_max_freq is derived as min(rq->max_possible_freq). For homogeneous
systems, max_possible_freq and min_max_freq will be same, while they
could be different on heterogeneous systems.

Change-Id: Iec485fde35cfd33f55ebf2c2dce4864faa2083c5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict around max_possible_freq.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:57 -07:00
Steve Muckle
1559954841 sched: move task load based functions
The task load based functions will need to make use of LOAD_AVG_MAX
in a subsequent patch, so move them below the definition of that
macro.

Change-Id: I02f18ba069b81033e611f8f8bba6dccd7cd81252
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 19:58:56 -07:00
Steve Muckle
f8da269c3c sched: fix race between try_to_wake_up() and move_task()
Until a task's state has been seen as interruptible/uninterruptible
and it is no longer on_cpu, it is possible that the task may move
to another CPU (load balancing may cause this). Here is an example
where the race condition results in incorrect operation:

- cpu 0 calls put_prev_task on task A, task A's state is TASK_RUNNING
- cpu 0 runs task B, which attempts to wake up A
- cpu 0 begins try_to_wake_up(), recording src_cpu for task A as cpu 0
- cpu 1 then pulls task A (perhaps due to idle balance)
- cpu 1 runs task A, which then sleeps, becoming INTERRUPTIBLE
- cpu 0 continues in try_to_wake_up(), thinking task A's previous
  cpu is 0, where it is actually 1
- if select_task_rq returns cpu 0, task A will be woken up on cpu 0
  without properly updating its cpu to 0 in set_task_cpu()

CRs-Fixed: 665958
Change-Id: Icee004cb320bd8edfc772d9f74e670a9d4978a99
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 19:58:55 -07:00
Srivatsa Vaddagiri
c413707037 sched: Skip load update for idle task
Load statistics for idle tasks is not useful in any manner. Skip load
update for such idle tasks.

CRs-Fixed: 665706
Change-Id: If3a908bad7fbb42dcb3d0a1d073a3750cf32fcf9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:58:54 -07:00
Srivatsa Vaddagiri
3967da2dd1 sched: Window-based load stat improvements
Some tasks can have a sporadic load pattern such that they can suddenly
start running for longer intervals of time after running for shorter
durations. To recognize such sharp increase in tasks' demands, max
between the average of 5 window load samples and the most recent sample
is chosen as the task demand.

Make the window size (sched_ravg_window) configurable at boot up
time. To prevent users from setting inappropriate values for window
size, min and max limits are defined. As 'ravg' struct tracks load for
both real-time and non real-time tasks it is moved out of sched_entity
struct.

In order to prevent changing function signatures for move_tasks() and
move_one_task() per-cpu variables are defined to track the total load
moved. In case multiple tasks are selected to migrate in one load
balance operation, loads > 100 could be sent through migration notifiers.
Prevent this scenario by setting mnd.load to 100 in such cases.

Define wrapper functions to compute cpu demands for tasks and to change
rq->cumulative_runnable_avg.

Change-Id: I9abfbf3b5fe23ae615a6acd3db9580cfdeb515b4
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18 and squash "dcf7256 sched:
			window-stats: Fix overflow bug" into this patch.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in __migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:53 -07:00
Rohit Gupta
e3fe80da05 sched: Call the notify_on_migrate notifier chain for wakeups as well
Add a change to send notify_on_migrate hints on wakeups of
foreground tasks from scheduler if their load is above
wakeup_load_thresholds (default value is 60).
These hints can be used to choose an appropriate CPU frequency
corresponding to the load of the task being woken up.

By default sched_wakeup_load_threshold is set to 60 and therefore
wakeup hints are sent out for those tasks whose loads are higher
that value. This might cause unnecessary wakeup boosts to happen
when load based syncing is turned ON for cpu-boost.
Disable the wake up hints by setting the sched_wakeup_load_threshold
to a value higher than 100 so that wakeup boost doesnt happen unless
it is explicitly turned ON from adb shell.

Change-Id: Ieca413c1a8bd2b14a15a7591e8e15d22925c42ca
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
[rameezmustafa@codeaurora.org: Squash "a26fcce sched: Disable wakeup
			hints for foreground tasks by default" into
			this patch and update commit text.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:52 -07:00
Rohit Gupta
48056d2399 cpufreq: cpu-boost: Introduce scheduler assisted load based syncs
Previously, on getting a migration notification cpu-boost changed
the scaling min of the destination frequency to match that of the
source frequency or sync_threshold whichever was minimum.

If the scheduler migration notification is extended with task load
(cpu demand) information, the cpu boost driver can use this load to
compute a suitable frequency for the migrating task. The required
frequency for the task is calculated by taking the load percentage
of the max frequency and no sync is performed if the load is less
than a particular value (migration_load_threshold).This change is
beneficial for both perf and power as demand of a task is taken into
consideration while making cpufreq decisions and unnecessary syncs
for lightweight tasks are avoided.

The task load information provided by scheduler comes from a
window-based load collection mechanism which also normalizes the
load collected by the scheduler to the max possible frequency
across all CPUs.

Change-Id: Id2ba91cc4139c90602557f9b3801fb06b3c38992
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in __migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:46 -07:00
Srivatsa Vaddagiri
74463329e4 sched: window-based load stats for tasks
Provide a metric per task that specifies how cpu bound a task is. Task
execution is monitored over several time windows and the fraction of
the window for which task was found to be executing or wanting to run
is recorded as task's demand. Windows over which task was sleeping are
ignored. We track last 5 recent windows for every task and the maximum
demand seen in any of the previous 5 windows (where task had some
activity) drives freq demand for every task.

A per-cpu metric (rq->cumulative_runnable_avg) is also provided which
is an aggregation of cpu demand of all tasks currently enqueued on it.
rq->cumulative_runnable_avg will be useful to know if cpu frequency
will need to be changed to match task demand.

Change-Id: Ib83207b9ba8683cd3304ee8a2290695c34f08fe2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in ttwu_do_wakeup() to
 incorporate with changed trace_sched_wakeup() location.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:39 -07:00
Srivatsa Vaddagiri
97ae7bae2c sched: Make scheduler aware of cpu frequency state
Capacity of a cpu (how much performance it can deliver) is partly
determined by its frequency (P) state, both current frequency as well
as max frequency it can reach.  Knowing frequency state of cpus will
help scheduler optimize various functions such as tracking every
task's cpu demand and placing tasks on various cpus.

This patch has scheduler registering for cpufreq notifications to
become aware of cpu's frequency state. Subsequent patches will make
use of derived information for various purposes, such as task's scaled
load (cpu demand) accounting and task placement.

Change-Id: I376dffa1e7f3f47d0496cd7e6ef8b5642ab79016
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/core.c.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:38 -07:00
Matt Wagantall
5f10bd75eb sched/debug: Make sysrq prints of sched debug data optional
Calls to sysrq_sched_debug_show() can yield rather verbose output
which contributes to log spew and, under heavy load, may increase
the chances of a watchdog bark.

Make printing of this data optional with the introduction of a
new Kconfig, CONFIG_SYSRQ_SCHED_DEBUG.

Change-Id: I5f54d901d0dea403109f7ac33b8881d967a899ed
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
2016-03-23 19:58:37 -07:00
Steve Muckle
521a572def tracing/sched: add load balancer tracepoint
When doing performance analysis it can be useful to see exactly
what is going on with the load balancer - when it runs and why
exactly it may not be redistributing load.

This additional tracepoint will show the idle context of the
load balance operation (idle, not idle, newly idle), various
values from the load balancing operation, the final result,
and the new balance interval.

Change-Id: I1538c411c5f9d17d7d37d84ead6210756be2d884
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[rameezmustafa@codeaurora.org: Initialize variables in load_balance() to
				avoid crashes and inaccurate tracing.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in
 include/trace/events/sched.h.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:36 -07:00
Steve Muckle
387dcd0663 sched: change WARN_ON_ONCE to printk_deferred() in try_to_wake_up_local()
try_to_wake_up_local() is called with the rq lock held. Printing to
console in this context can result in a deadlock if klogd needs to
be woken up. Print to the kernel log buffer via printk_sched()
instead which avoids the wakeup.

Change-Id: Ia07baea3cb7e0b158545207fdbbb866203256d3c
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:35 -07:00
Arun Bharadwaj
9f6eb26ae8 tracing/sched: Track per-cpu rt and non-rt cpu_load.
Add a new tracepoint trace_sched_enq_deq_task to track
per-cpu rt and non-rt cpu_load during task enqueue
and dequeue.

This is useful to visualize and compare the load on
different cpus and also to understand how balanced
the load is at any point of time.

Note: We only print cpu_load[0] because we only care about
the most recent load history for tracking load balancer
effectiveness.

Change-Id: I46f0bb84e81652099ed5edf8c2686c70c8b8330c
Signed-off-by: Arun Bharadwaj <abharadw@codeaurora.org>
2016-03-23 19:58:34 -07:00
Srivatsa Vaddagiri
0ec4cf2484 sched: re-calculate a cpu's next_balance point upon sched domain changes
A cpus next_balance point could be stale when its being attached to a
sched domain hierarchy. That would lead to undesirable delay in cpu
doing a load balance and hence potentially affect scheduling
latencies for tasks. Fix that by initializing cpu's next_balance point
when its being attached to a sched domain hierarchy.

Change-Id: I855cff8da5ca28d278596c3bb0163b839d4704bc
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Modify commit text to reflect dropped patches]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:34 -07:00
Steve Muckle
63249df6b2 sched: provide per cpu-cgroup option to notify on migrations
On systems where CPUs may run asynchronously, task migrations
between CPUs running at grossly different speeds can cause
problems.

This change provides a mechanism to notify a subsystem
in the kernel if a task in a particular cgroup migrates to a
different CPU. Other subsystems (such as cpufreq) may then
register for this notifier to take appropriate action when
such a task is migrated.

The cgroup attribute to set for this behavior is
"notify_on_migrate" .

Change-Id: Ie1868249e53ef901b89c837fdc33b0ad0c0a4590
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[rameezmustafa@codeaurora.org: Use new cgroup APIs, fix 64-bit
			compilation issues and resolve some merge
			conflicts. Also squash "2bd8075 sched:
			remove migration notification from RT class"
			into this patch.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: Incorporated with new __migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:33 -07:00
Srivatsa Vaddagiri
82f6fb4f5c sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
(just when they would have exceeded their ideal_runtime). It makes use
of a a per-cpu hrtimer resource and hence alarming that hrtimer should
be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
rather than being based on number of tasks in a particular cfs_rq (as
implemented currently). As a result, with current code, its possible for
a running task (which is the sole task in its cfs_rq) to be preempted
much after its ideal_runtime has elapsed, resulting in increased latency
for tasks in other cfs_rq on same cpu.

Fix this by alarming sched hrtimer based on total number of SCHED_FAIR
tasks a CPU has across its various cfs_rqs.

Change-Id: I1f23680a64872f8ce0f451ac4bcae28e8967918f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Squash "c24fb502 sched: fix reference to
				wrong cfs_rq" into this patch]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:32 -07:00
Steve Muckle
2c04026b17 kernel: reduce sleep duration in wait_task_inactive
Sleeping for an entire tick adds unnecessary latency to
hotplugging a cpu (cpu_up).

Change-Id: Iab323a79f4048bc9101ecfd368e0f275827ed4ab
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:31 -07:00
Steve Muckle
03ab01a318 sched: add sysctl for controlling task migrations on wake
The PF_WAKE_UP_IDLE per-task flag made it impossible to enable
the old behavior of SD_SHARE_PKG_RESOURCES, where every task
migrates to an idle CPU on wakeup.

The sched_wake_to_idle sysctl value, when made nonzero, will cause
all tasks to migrate to an idle CPU if one is available when the
task is woken up. This is regardless of how PF_WAKE_UP_IDLE is
configured for tasks in the system. Similar to PF_WAKE_UP_IDLE,
the SD_SHARE_PKG_RESOURCES scheduler domain flag must be enabled
for the sysctl value to have an effect.

Change-Id: I23bed846d26502c7aed600bfcf1c13053a7e5f61
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
(cherry picked from commit 9d5b38dc0025d19df5b756b16024b4269e73f282)
2016-03-23 19:58:30 -07:00