Commit graph

564557 commits

Author SHA1 Message Date
Srivatsa Vaddagiri
47d2c533b2 sched: Extend /proc/sched_debug with additional information
Provide additional information in /proc/sched_debug for every cpu.
This will be a valuable debug aid.

Change-Id: If22ee530e880cd21719242be7bc2c41308ad4186
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:59:08 -07:00
Srivatsa Vaddagiri
7379f7f110 sched: Tighten controls for tasks spillover to idle cluster
Several conditions can cause an idle cluster to pick up load from a
busy cluster. One such condition is when busy cluster has number of
tasks that exceeds its capacity (or number of cpus). This patch
extends that condition to consider small and big tasks on a cluster.
Too many "small" tasks should not cause them to spill over to another
idle cluster. Likewise, the presence of big tasks should be considered
by a cluster to pick up load from another cluster with lower
capacity.

Change-Id: I0545bf2989c37217d84ed18756c6f5c8946d5ae5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in fair.c.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:07 -07:00
Srivatsa Vaddagiri
e1448aaf48 sched: Track number of big and small tasks on a cpu
This patch adds 'nr_big_tasks' and 'nr_small_tasks' per-cpu counters
that track the number of big and small tasks on a cpu, respectively.
These will be used in load balance decisions introduced in a
subsequent patch.

Change-Id: Ia174904140f81dd6d1946286889a50be3f16ea83
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fix conflicts in fair.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:07 -07:00
Srivatsa Vaddagiri
06a5fb422d sched: Handle cpu-bound tasks stuck on wrong cpu
CPU-bound tasks that don't sleep for long intervals can stay stuck on
the wrong cpu, as the selection of "ideal" cpu for tasks largely
happens during task wakeup time. This patch adds a check in the
scheduler tick for task/cpu mismatch (big task on little cpu OR
little task on big cpu) and forces migration of such tasks to their
ideal cpu (via select_best_cpu()).

Change-Id: Icac3485b6aa4b558c4ed9df23c2e81fb8f4bb9d9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:06 -07:00
Srivatsa Vaddagiri
45acc2457b sched: Extend active balance to accept 'push_task' argument
Active balance currently picks one task to migrate from busy cpu to
a chosen cpu (push_cpu). This patch extends active load balance to
recognize a particular task ('push_task') that needs to be migrated to
'push_cpu'. This capability will be leveraged by HMP-aware task
placement in a subsequent patch.

Change-Id: If31320111e6cc7044e617b5c3fd6d8e0c0e16952
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:59:05 -07:00
Srivatsa Vaddagiri
2a17b80545 sched: Send NOHZ kick to idle cpu in same cluster
A busy cpu will kick (via IPI) one of the idle cpus in tickless state
to run load balance and help move tasks off itself. The cpu chosen to
receive kick is simply the "first" idle cpu found in
nohz.idle_cpus_mask. This could cause unnecessary wakeups of a
cluster. A better choice would be to look for an idle cpu that is in
the same cluster as busy cpu, which should minimize cluster wakeups.
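
The selection described above can be sketched in plain C as follows.
This is only an illustration, not the patch itself: the bitmask
arguments stand in for the kernel's cpumask types, and
find_idle_cpu_in_cluster() and NR_CPUS_SKETCH are hypothetical names.
Returning -1 when the cluster has no idle cpu is also an assumption;
the commit text does not say what the fallback is.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS_SKETCH 8	/* hypothetical cpu count for this sketch */

/*
 * Return the lowest-numbered idle cpu that shares a cluster with
 * busy_cpu, or -1 if that cluster has no idle cpu. Bit n of each mask
 * represents cpu n.
 */
static int find_idle_cpu_in_cluster(uint32_t idle_mask,
				    uint32_t cluster_mask, int busy_cpu)
{
	uint32_t candidates;
	int cpu;

	assert(busy_cpu >= 0 && busy_cpu < NR_CPUS_SKETCH);

	/* Idle cpus, restricted to the busy cpu's cluster, minus itself */
	candidates = idle_mask & cluster_mask & ~(1u << busy_cpu);

	for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		if (candidates & (1u << cpu))
			return cpu;

	return -1;
}
```

With cpus 1, 4 and 5 idle and cpu 6 busy in a cluster made of cpus
4-7, the sketch picks cpu 4, whereas a plain "first idle cpu" search
would pick cpu 1 and wake the other cluster.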

Change-Id: Ia63038d7c34b416b53c8feef3c3b31dab5200e42
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict about return value.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:04 -07:00
Srivatsa Vaddagiri
7cec8569e3 sched: Basic task placement support for HMP systems
HMP systems have cpus with different power and performance
characteristics. Some cpus could offer better power at cost of lower
performance while other cpus could offer better performance at cost of
higher power. As a result, bandwidth consumed by a task to do some
"fixed" amount of work could vary across cpus.

Optimal task placement on HMP would involve placing a task on a cpu
where it can meet its performance goals at lowest power cost. Since
kernel has little to no awareness of performance goals of
applications, we guesstimate whether a task is meeting its performance
goals or not by looking at its cpu bandwidth consumption. High
bandwidth consumption could imply that the task's performance can improve
by running on cpus with better capacity/performance characteristics.

This patch makes the basic changes to support HMP. It provides a
configurable threshold and any task consuming bandwidth in excess of
threshold will be placed on a cpu with better capacity.
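
A minimal sketch of the threshold rule, under stated assumptions:
struct cpu_sketch and place_task() are hypothetical, bandwidth is
expressed as a percentage, and a task below the threshold is assumed
to prefer the lowest-capacity cpu (the commit text only states the
"above threshold" half of the rule).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-cpu record; only the capacity matters here */
struct cpu_sketch {
	unsigned int capacity;
};

/*
 * Return the index of the cpu a task should be placed on. A task whose
 * bandwidth consumption exceeds threshold_pct goes to the cpu with the
 * highest capacity; any other task goes to the cpu with the lowest
 * capacity (an assumption of this sketch).
 */
static int place_task(const struct cpu_sketch *cpus, size_t n,
		      unsigned int task_bw_pct, unsigned int threshold_pct)
{
	int want_big = task_bw_pct > threshold_pct;
	size_t best = 0;
	size_t i;

	assert(n > 0);

	for (i = 1; i < n; i++) {
		if (want_big ? cpus[i].capacity > cpus[best].capacity
			     : cpus[i].capacity < cpus[best].capacity)
			best = i;
	}

	return (int)best;
}
```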

Change-Id: I3fd98edd430f73342fbef06411e8b2d1cf2f56fa
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict about members of p->se which
 are not available anymore.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:03 -07:00
Srivatsa Vaddagiri
0b8de33b59 sched: Use rq->efficiency in scaling load stats
Extend task load scaling function to account for cpu efficiency
factor. Task load is scaled in reference to "most" efficient cpu.
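
One way such scaling could look is sketched below. The formula is an
assumption rather than the code in this patch: execution time is first
scaled by the ratio of current frequency to the best possible
frequency, then by the ratio of the cpu's efficiency to the best
efficiency in the system. scale_task_load() and its parameters are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale delta_ns of execution time to what it would have been on the
 * "best" cpu: one running at max_possible_freq with max_efficiency.
 * The two divisions are done separately to keep intermediate products
 * small.
 */
static uint64_t scale_task_load(uint64_t delta_ns,
				uint32_t cur_freq, uint32_t max_possible_freq,
				uint32_t efficiency, uint32_t max_efficiency)
{
	uint64_t scaled;

	assert(max_possible_freq > 0 && max_efficiency > 0);
	assert(cur_freq <= max_possible_freq);
	assert(efficiency <= max_efficiency);

	scaled = delta_ns * cur_freq / max_possible_freq;
	return scaled * efficiency / max_efficiency;
}
```

For example, 1 ms of execution at half the best frequency on a cpu
with half the best efficiency counts as 0.25 ms of scaled load.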

Change-Id: I7bf829211a6e1293076e8ba0f93b4f6abcf20d92
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:59:02 -07:00
Srivatsa Vaddagiri
f1018e8b36 sched: Introduce efficiency, load_scale_factor and capacity
Efficiency reflects instructions per cycle capability of a cpu.

load_scale_factor reflects magnification factor that is applied for
task load when estimating bandwidth it will consume on a cpu. It
accounts for the fact that task load is scaled in reference to "best"
cpu that has best efficiency factor and also best possible max_freq.
Note that there may be no single CPU in the system that has both the
best efficiency and best possible max_freq, but that is still the
combination that all task load in the system is scaled against.

capacity reflects max_freq and efficiency metric of a cpu. It is
defined such that the "least" performing cpu (one with lowest
efficiency factor and max_freq) gets capacity of 1024. Again, there
may not be a CPU in the system that has both the lowest efficiency
and lowest max_freq. This is still the combination that is assigned
a capacity of 1024 however, other CPU capacities are relative to this.
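
The definitions above can be illustrated with the following sketch.
The exact formulas are an assumption inferred from the text (1024 is
assigned to the reference combination and everything else is scaled
relative to it); cpu_capacity() and cpu_load_scale_factor() are
hypothetical helpers, not functions from this patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Capacity relative to the "least" performing combination
 * (min_efficiency, min_max_freq), which is defined to be 1024.
 */
static uint64_t cpu_capacity(uint64_t efficiency, uint64_t min_efficiency,
			     uint64_t max_freq, uint64_t min_max_freq)
{
	assert(min_efficiency > 0 && min_max_freq > 0);
	return 1024 * efficiency * max_freq / (min_efficiency * min_max_freq);
}

/*
 * Magnification applied to task load on this cpu, relative to the
 * "best" combination (max_efficiency, max_possible_freq), for which
 * the factor is 1024 (that is, no magnification).
 */
static uint64_t cpu_load_scale_factor(uint64_t efficiency,
				      uint64_t max_efficiency,
				      uint64_t max_freq,
				      uint64_t max_possible_freq)
{
	assert(efficiency > 0 && max_freq > 0);
	return 1024 * max_efficiency * max_possible_freq /
	       (efficiency * max_freq);
}
```

Take a system with cpu A (efficiency 2048, max_freq 1500000) and cpu B
(efficiency 1024, max_freq 2000000). Under these formulas A gets
capacity 2048 and B gets 1365; neither gets 1024, because no cpu
combines the lowest efficiency with the lowest max_freq.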

Change-Id: I4a853f1f0f90020721d2a4ee8b10db3d226b287c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:59:01 -07:00
Srivatsa Vaddagiri
551f83f5d6 sched: Add CONFIG_SCHED_HMP Kconfig option
Add a compile-time flag to enable or disable scheduler features for
HMP (heterogeneous multi-processor) systems. Main feature deals with
optimizing task placement for best power/performance tradeoff.

Also extend features currently dependent on CONFIG_SCHED_FREQ_INPUT to
be enabled for CONFIG_SCHED_HMP as well.

Change-Id: I03b3942709a80cc19f7b934a8089e1d84c14d72d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor ifdefry conflict.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:59:00 -07:00
Srivatsa Vaddagiri
025dedac36 sched: Add scaled task load statistics
Scheduler guided frequency selection as well as task placement on
heterogeneous systems require scaled task load statistics. This patch
adds a 'runnable_avg_sum_scaled' metric per task that is a scaled
derivative of 'runnable_avg_sum'. Load is scaled in reference to
"best" cpu, i.e. one with best possible max_freq.

Change-Id: Ie8ae450d0b02753e9927fb769aee734c6d33190f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: incorporated with change 9d89c257df
 ("sched/fair: Rewrite runnable load and utilization average
 tracking").  Used container_of() to get sched_entity.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:59 -07:00
Srivatsa Vaddagiri
77fe8dd14d sched: Introduce CONFIG_SCHED_FREQ_INPUT
Introduce a compile time flag to enable scheduler guidance of
frequency selection. This flag is also used to turn on or off
window-based load stats feature.

Having a compile time flag will let some platforms avoid any
overhead that may be present with this scheduler feature.

Change-Id: Id8dec9839f90dcac82f58ef7e2bd0ccd0b6bd16c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict around
 sysctl_timer_migration.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:59 -07:00
Srivatsa Vaddagiri
a25a5c1c30 sched: window-based load stats improvements
The following cleanups and improvements are made to the window-based
load stats feature:

* Add sysctl to pick max, avg or most recent samples as task's
  demand.

* Fix overflow possibility in calculation of sum for average policy.

* Use unscaled statistics when a task is running on a CPU which is
  thermally throttled.
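
The first two items can be sketched as below. This is an illustration
only: enum demand_policy and task_demand() are hypothetical names, the
sysctl plumbing is omitted, and hist[0] is assumed to hold the most
recent sample. The 64-bit accumulator shows one way to avoid overflow
when summing 32-bit samples for the average policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy values; the real sysctl encoding may differ */
enum demand_policy {
	DEMAND_RECENT,
	DEMAND_MAX,
	DEMAND_AVG,
};

/* Derive a task's demand from its n most recent window samples */
static uint32_t task_demand(const uint32_t *hist, size_t n,
			    enum demand_policy policy)
{
	uint64_t sum = 0;	/* wider than the samples, cannot overflow */
	uint32_t max = 0;
	size_t i;

	assert(n > 0);

	for (i = 0; i < n; i++) {
		sum += hist[i];
		if (hist[i] > max)
			max = hist[i];
	}

	switch (policy) {
	case DEMAND_MAX:
		return max;
	case DEMAND_AVG:
		return (uint32_t)(sum / n);
	default:
		return hist[0];
	}
}
```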

Change-Id: I8293565ca0c2a785dadf8adb6c67f579a445ed29
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:58:58 -07:00
Srivatsa Vaddagiri
a7103955a7 sched: Add min_max_freq and rq->max_possible_freq
rq->max_possible_freq represents the maximum frequency a cpu is
capable of attaining, while rq->max_freq represents the maximum
frequency a cpu can attain at a given instant. rq->max_freq includes
constraints imposed by user or thermal driver.
rq->max_freq <= rq->max_possible_freq.

max_possible_freq is derived as max(rq->max_possible_freq) and
represents the "best" cpu that can attain best possible frequency.

min_max_freq is derived as min(rq->max_possible_freq). For homogeneous
systems, max_possible_freq and min_max_freq will be same, while they
could be different on heterogeneous systems.
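
The two system-wide values can be derived from the per-cpu ones as in
this sketch. struct rq_sketch and derive_freq_limits() are hypothetical
stand-ins for the runqueue fields and code in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of the per-cpu runqueue fields named above */
struct rq_sketch {
	uint32_t max_freq;		/* current limit (user/thermal) */
	uint32_t max_possible_freq;	/* hardware limit */
};

/*
 * max_possible_freq = max over cpus of rq->max_possible_freq
 * min_max_freq      = min over cpus of rq->max_possible_freq
 */
static void derive_freq_limits(const struct rq_sketch *rqs, size_t n,
			       uint32_t *max_possible_freq,
			       uint32_t *min_max_freq)
{
	size_t i;

	assert(n > 0);

	*max_possible_freq = rqs[0].max_possible_freq;
	*min_max_freq = rqs[0].max_possible_freq;

	for (i = 1; i < n; i++) {
		/* Each cpu's current limit never exceeds its hardware limit */
		assert(rqs[i].max_freq <= rqs[i].max_possible_freq);

		if (rqs[i].max_possible_freq > *max_possible_freq)
			*max_possible_freq = rqs[i].max_possible_freq;
		if (rqs[i].max_possible_freq < *min_max_freq)
			*min_max_freq = rqs[i].max_possible_freq;
	}
}
```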

Change-Id: Iec485fde35cfd33f55ebf2c2dce4864faa2083c5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict around max_possible_freq.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:57 -07:00
Steve Muckle
1559954841 sched: move task load based functions
The task load based functions will need to make use of LOAD_AVG_MAX
in a subsequent patch, so move them below the definition of that
macro.

Change-Id: I02f18ba069b81033e611f8f8bba6dccd7cd81252
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 19:58:56 -07:00
Steve Muckle
f8da269c3c sched: fix race between try_to_wake_up() and move_task()
Until a task's state has been seen as interruptible/uninterruptible
and it is no longer on_cpu, it is possible that the task may move
to another CPU (load balancing may cause this). Here is an example
where the race condition results in incorrect operation:

- cpu 0 calls put_prev_task on task A, task A's state is TASK_RUNNING
- cpu 0 runs task B, which attempts to wake up A
- cpu 0 begins try_to_wake_up(), recording src_cpu for task A as cpu 0
- cpu 1 then pulls task A (perhaps due to idle balance)
- cpu 1 runs task A, which then sleeps, becoming INTERRUPTIBLE
- cpu 0 continues in try_to_wake_up(), thinking task A's previous
  cpu is 0, where it is actually 1
- if select_task_rq returns cpu 0, task A will be woken up on cpu 0
  without properly updating its cpu to 0 in set_task_cpu()

CRs-Fixed: 665958
Change-Id: Icee004cb320bd8edfc772d9f74e670a9d4978a99
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 19:58:55 -07:00
Srivatsa Vaddagiri
c413707037 sched: Skip load update for idle task
Load statistics for idle tasks are not useful in any manner. Skip load
updates for such idle tasks.

CRs-Fixed: 665706
Change-Id: If3a908bad7fbb42dcb3d0a1d073a3750cf32fcf9
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:58:54 -07:00
Srivatsa Vaddagiri
3967da2dd1 sched: Window-based load stat improvements
Some tasks can have a sporadic load pattern such that they can suddenly
start running for longer intervals of time after running for shorter
durations. To recognize such sharp increase in tasks' demands, max
between the average of 5 window load samples and the most recent sample
is chosen as the task demand.
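
A minimal sketch of that rule, assuming hist[0] is the most recent
window sample. demand_from_history() and RAVG_HIST_SKETCH are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define RAVG_HIST_SKETCH 5	/* number of window samples kept per task */

/*
 * Task demand = max(average of the 5 samples, most recent sample), so
 * that a sudden increase in the latest window is reflected at once.
 */
static uint32_t demand_from_history(const uint32_t hist[RAVG_HIST_SKETCH])
{
	uint64_t sum = 0;
	uint32_t avg;
	int i;

	assert(hist != 0);

	for (i = 0; i < RAVG_HIST_SKETCH; i++)
		sum += hist[i];
	avg = (uint32_t)(sum / RAVG_HIST_SKETCH);

	return hist[0] > avg ? hist[0] : avg;
}
```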

Make the window size (sched_ravg_window) configurable at boot up
time. To prevent users from setting inappropriate values for window
size, min and max limits are defined. As 'ravg' struct tracks load for
both real-time and non real-time tasks it is moved out of sched_entity
struct.

In order to prevent changing function signatures for move_tasks() and
move_one_task() per-cpu variables are defined to track the total load
moved. In case multiple tasks are selected to migrate in one load
balance operation, loads > 100 could be sent through migration notifiers.
Prevent this scenario by setting mnd.load to 100 in such cases.

Define wrapper functions to compute cpu demands for tasks and to change
rq->cumulative_runnable_avg.

Change-Id: I9abfbf3b5fe23ae615a6acd3db9580cfdeb515b4
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18 and squash "dcf7256 sched:
			window-stats: Fix overflow bug" into this patch.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in __migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:53 -07:00
Rohit Gupta
e3fe80da05 sched: Call the notify_on_migrate notifier chain for wakeups as well
Add a change to send notify_on_migrate hints on wakeups of
foreground tasks from scheduler if their load is above
sched_wakeup_load_threshold (default value is 60).
These hints can be used to choose an appropriate CPU frequency
corresponding to the load of the task being woken up.

By default sched_wakeup_load_threshold is set to 60 and therefore
wakeup hints are sent out for those tasks whose loads are higher
than that value. This might cause unnecessary wakeup boosts to happen
when load based syncing is turned ON for cpu-boost.
Disable the wakeup hints by setting sched_wakeup_load_threshold
to a value higher than 100 so that wakeup boost doesn't happen unless
it is explicitly turned ON from adb shell.

Change-Id: Ieca413c1a8bd2b14a15a7591e8e15d22925c42ca
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
[rameezmustafa@codeaurora.org: Squash "a26fcce sched: Disable wakeup
			hints for foreground tasks by default" into
			this patch and update commit text.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:52 -07:00
Rohit Gupta
624f7a0869 cpufreq: cpu-boost: Use one work to remove input boost for all CPUs
Currently each CPU queues up its own work for removing input boost.
This is not really required as boost removal for all the CPUs can
be done at the same time. So this patch uses a single work to
remove input boost for all the CPUs and updates the policy for
the online ones.

Change-Id: I37c809f2f155548b1d8c1b9aa7626c8852b3acc6
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
2016-03-23 19:58:52 -07:00
Junjie Wu
1bf8600f7c cpufreq: cpu-boost: Support separate input_boost_freq for different CPUs
Different types of CPUs could need different frequencies to satisfy the
same input workload. Add support for using different input_boost_freq on
different CPUs.

input_boost_freq now either takes a single number which applies to all
CPUs, or cpuid:freq pairs separated by space for different CPUs.
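
A userspace sketch of parsing the two accepted forms follows. It is
not the driver's parser: parse_input_boost_freq() and NR_CPUS_SKETCH
are hypothetical, and leaving unlisted cpus unchanged in the pair form
is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NR_CPUS_SKETCH 4	/* hypothetical cpu count for this sketch */

/*
 * Parse either "freq" (applies to all cpus) or "cpu:freq cpu:freq ..."
 * into freqs[]. Returns 0 on success, -1 on malformed input.
 */
static int parse_input_boost_freq(const char *buf,
				  unsigned int freqs[NR_CPUS_SKETCH])
{
	unsigned int cpu, freq;
	int consumed;

	assert(buf != NULL && freqs != NULL);

	/* Single number: one frequency for every cpu */
	if (!strchr(buf, ':')) {
		if (sscanf(buf, "%u", &freq) != 1)
			return -1;
		for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
			freqs[cpu] = freq;
		return 0;
	}

	/* Space-separated cpu:freq pairs; %n records how far we parsed */
	while (sscanf(buf, " %u:%u%n", &cpu, &freq, &consumed) == 2) {
		if (cpu >= NR_CPUS_SKETCH)
			return -1;
		freqs[cpu] = freq;
		buf += consumed;
	}

	/* Anything left over besides whitespace is malformed */
	while (*buf == ' ' || *buf == '\n')
		buf++;
	return *buf == '\0' ? 0 : -1;
}
```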

Change-Id: I20506a9fbdb4d532d94168bbd61744595bebc8e5
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
2016-03-23 19:58:51 -07:00
Rohit Gupta
e1d864dc94 cpufreq: cpu-boost: Make the code 64 bit compatible
As pointer size changes to 64 bits in the 64-bit kernel, the int
casts of pointers in the legacy code give compilation warnings,
which get flagged as errors.
Replace int casting of pointers with long to get rid of these
warnings.

Change-Id: I96b6cf342c2bf110220eac0addfb72fbdd969c6e
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
2016-03-23 19:58:50 -07:00
Swetha Chikkaboraiah
d705a44178 cpufreq: cpu-boost: Use interruptible wait to not affect load average
Using wait_event() in cpu_boost puts the boost_sync process
into the 'D' (uninterruptible sleep) state, which contributes
to a high load average. This change puts the boost_sync
process into the 'S' (interruptible sleep) state instead.

Change-Id: Ie121adbe1fac1d2862ac5342bb97c7c926f7d7a8
CRs-Fixed: 655484
Signed-off-by: Swetha Chikkaboraiah <schikk@codeaurora.org>
Signed-off-by: Raghavendra Ambadas <rambad@codeaurora.org>
2016-03-23 19:58:49 -07:00
Girish S Ghongdemath
e6d5539f2e cpufreq: cpu-boost: Consider only task load to decide on sync frequency
Currently we take the maximum between the source CPU frequency and
the calculated frequency based on the migrating task load to decide on
the frequency to sync the destination CPU to. This was done to handle
short bursts in workloads of tasks which migrated immediately after
causing a ramp up on the source CPU. Since their load history wasn't
high enough, the destination CPU synced to a lower frequency which
wasn't sufficient for the spike in workload.

But as such cases are rare, taking the higher of source and calculated
frequency can lead to destination CPU unnecessarily spending a
considerable amount of time at higher frequencies which in turn can
hurt power.

With this change we make sure only the migrating task load is used
to calculate the sync frequency for destination CPU when load based
syncing is enabled.

Change-Id: Ib1489d256c42ea7712aad2179aebffc87c549836
Signed-off-by: Girish Ghongdemath <girishsg@codeaurora.org>
2016-03-23 19:58:48 -07:00
Rohit Gupta
f117dea08f cpufreq: cpu-boost: Handle wakeup hints received for foreground tasks
A previous change modifies the notification conditions in the scheduler
to call the notifier chain even on foreground_thread wakeups for tasks
having load more than a threshold value.

If load_based_syncs is turned OFF then we do not need to perform
cpu boost for wakeup hints for foreground tasks from the scheduler.

Change-Id: Ifbdd3eccac5c9892dfc3a3c3edbfc0df766478ed
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
2016-03-23 19:58:47 -07:00
Rohit Gupta
48056d2399 cpufreq: cpu-boost: Introduce scheduler assisted load based syncs
Previously, on getting a migration notification cpu-boost changed
the scaling min of the destination frequency to match that of the
source frequency or sync_threshold whichever was minimum.

If the scheduler migration notification is extended with task load
(cpu demand) information, the cpu boost driver can use this load to
compute a suitable frequency for the migrating task. The required
frequency for the task is calculated by taking the load percentage
of the max frequency and no sync is performed if the load is less
than a particular value (migration_load_threshold). This change is
beneficial for both perf and power as demand of a task is taken into
consideration while making cpufreq decisions and unnecessary syncs
for lightweight tasks are avoided.

The task load information provided by scheduler comes from a
window-based load collection mechanism which also normalizes the
load collected by the scheduler to the max possible frequency
across all CPUs.
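
The frequency calculation described above can be sketched as follows.
sync_freq_for_load() is a hypothetical helper; returning 0 to mean
"no sync" is a convention of this sketch, not of the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Frequency needed by a migrating task: its load percentage applied to
 * the max frequency. Loads below migration_load_threshold trigger no
 * sync at all, reported here as 0.
 */
static unsigned int sync_freq_for_load(unsigned int load_pct,
				       unsigned int max_freq_khz,
				       unsigned int migration_load_threshold)
{
	assert(load_pct <= 100);

	if (load_pct < migration_load_threshold)
		return 0;

	/* 64-bit intermediate so max_freq_khz * load_pct cannot overflow */
	return (unsigned int)((uint64_t)max_freq_khz * load_pct / 100);
}
```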

Change-Id: Id2ba91cc4139c90602557f9b3801fb06b3c38992
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in __migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:46 -07:00
Patrick Cain
158eff54ea cpufreq: cpu-boost: Re-issue boosts above minimum frequency
Frequency boosts where the source CPU frequency is greater than CPU's
minimum frequency should always go through regardless of the destination
CPU's current frequency. This fixes a performance issue where the governor
lowers the CPU frequency shortly after a thread is migrated to it because
the boost wasn't re-issued.

Change-Id: I449545a688d84b0a6e834f5a51dcb499caa84d29
Signed-off-by: Patrick Cain <pcain@codeaurora.org>
2016-03-23 19:58:46 -07:00
Saravana Kannan
2eb27275fc cpufreq: cpu-boost: Don't register for cpufreq notifiers too early
The cpufreq notifiers should be registered only after all the data
structures used in the notifier callbacks have been initialized. So, move
the cpufreq notifier registration to a later point in the init function.

Change-Id: I043ab5bc0ebb98164c40549fe151a8d801c8c186
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
2016-03-23 19:58:45 -07:00
Saravana Kannan
2f97ec6d6e cpufreq: cpu-boost: Fix deadlock in wake_up of sync threads
If wake_up() is called on the current task on a CPU, the call will wait
until the current task is switched out before it wakes it up again and
returns.

The sync notifier for a CPU always runs on that CPU.

These two together can result in a deadlock if the sync notifier on CPU A
tries to wake up the sync thread of CPU A as it goes to sleep (is the
current task). A previous commit fixed this by adding a check to the sync
notifier to not wake up the sync thread of CPU A if it's the current task.

But this is still not sufficient to prevent deadlocks.

Sync thread of CPU A could be the current task on CPU B and sync thread of
CPU B could be the current task on CPU A.  At this point, if sync notifier
of CPU A and B try to wake up the sync threads of CPU A and B, it will
result in CPU A waiting for the current task in CPU B to get switched out
and CPU B waiting for the current task in CPU A to get switched out.  This
will result in a deadlock.

Prevent this scenario from happening by pinning the sync threads of each
CPU to run on that CPU. By doing this, we guarantee that sync notifiers
will only try to wake up sync threads running on that CPU. The fix added by
"cpufreq: cpu-boost: Resolve deadlock when waking up sync thread" ensures a
deadlock doesn't happen when a sync notifier tries to wake up a sync thread
running on that CPU.

Change-Id: I864e545529722a23886dd5a82f66089155d2d193
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
2016-03-23 19:58:44 -07:00
Saravana Kannan
6f13a3351a cpufreq: cpu-boost: Fix queue_delayed_work_on() race with hotplug
Calling queue_delayed_work_on() on a CPU that's in the process of getting
hotplugged out can result in that CPU infinitely looping in
msm_pm_wait_cpu_shutdown(). If queue_delayed_work_on() is called after the
CPU is hotplugged out, it could wake up the CPU without going through the
hotplug path and cause instability. To avoid this, make sure the CPU is and
stays online while queuing a work on it.

Change-Id: I1b4aae3db803e476b1a7676d08f495c1f38bb154
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
2016-03-23 19:58:43 -07:00
Srivatsa Vaddagiri
df74672faf cpufreq: cpu-boost: Resolve deadlock when waking up sync thread
CPU boost driver receives notification from scheduler when threads
migrate towards a cpu and in turn wakes up a sync thread associated
with that cpu to handle any required frequency transitions. The wakeup
call however can lead to a deadlock inside scheduler under some
circumstance. The deadlock is seen when sync thread is the only thread
running on a cpu and goes to sleep (say by calling wait_event() ->
schedule()). Midway through this sleep (schedule()) call, while cpu is
still running in context of sync thread, scheduler attempts a load
balance (realizing that cpu is about to become idle) which can result
in tasks being migrated towards the cpu going idle. This will cause
migration notification to be issued and in turn a wakeup on sync
thread. The wakeup call however gets stuck in below while() loop
inside scheduler:

try_to_wake_up(struct task_struct *p, ...)
{

	/*
	 * If the owning (remote) cpu is still in the middle of
	 * schedule() with this task as prev, wait until it's done
	 * referencing the task.
	 */
	while (p->on_cpu)
		cpu_relax();

}

A possible fix could be to teach try_to_wake_up() about this
special case. Another fix, implemented in this patch and that helps
minimize scheduler changes, is to have cpu boost driver not issue a
wakeup under this special circumstance, which was found to occur very
infrequently.

Change-Id: I92bc68a22d51595a208673fe2a1eedfa97004f9e
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 19:58:42 -07:00
Rohit Gupta
e80ff9a87c cpufreq: Add Input Boost feature to the cpu-boost driver
On incoming input events boost the frequency of all online cpus
for at least input_boost_ms ms. This is accomplished by changing
the policy->min of all the online cpus to input_boost_freq.

Change-Id: Idb0ab75d68ae4ceff259cbbaaec1a9bb3bc871d3
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
2016-03-23 19:58:41 -07:00
Rohit Gupta
43cc939d93 cpufreq: Add a sync limit to cpu-boost
Sync the destination CPU to the source CPU's frequency only when that
frequency is less than sync_threshold; otherwise sync to sync_threshold.
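
In other words, the sync target is capped at sync_threshold. A minimal
sketch, assuming a non-zero threshold (sync_target_freq() is a
hypothetical name, and how the driver treats a zero threshold is not
stated here):

```c
#include <assert.h>

/* Frequency the destination CPU is synced to after a migration */
static unsigned int sync_target_freq(unsigned int src_freq_khz,
				     unsigned int sync_threshold_khz)
{
	assert(sync_threshold_khz > 0);

	return src_freq_khz < sync_threshold_khz ? src_freq_khz
						 : sync_threshold_khz;
}
```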

Change-Id: I544c414568d4e015b80ce5891dd215275bac95da
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
2016-03-23 19:58:41 -07:00
Saravana Kannan
2014424f4a cpufreq: cpu-boost: Add cpu-boost driver
When certain bursty and important events take place, it might take a while
for the current cpufreq governor to notice the new load and react to it.
That would result in poor user experience. To alleviate this, the cpu-boost
driver boosts the frequency of a CPU for a short duration to maintain good
user experience while the governor catches up.

Specifically, this commit deals with ensuring that when "important" tasks
migrate from a fast CPU to a slow CPU, the frequency of the slow CPU is
boosted to be at least as high as the fast CPU for a short duration.

Since this driver enforces the boost by hooking into standard cpufreq
ADJUST notifiers, it has several advantages:
- More portable across kernel versions where the cpufreq internals might
  have been rewritten.
- Governor agnostic and hence works with multiple governors like
  conservative, ondemand, interactive, etc.
- Does not affect the sampling period/logic of existing governors.
- Can have the boost period adjusted independent of governor sampling
  period.

Change-Id: Ibd814a20043d0aba64ee7637a4a79b9ffa1b0991
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in drivers/cpufreq/Kconfig.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:40 -07:00
Srivatsa Vaddagiri
74463329e4 sched: window-based load stats for tasks
Provide a metric per task that specifies how cpu bound a task is. Task
execution is monitored over several time windows and the fraction of
the window for which task was found to be executing or wanting to run
is recorded as task's demand. Windows over which task was sleeping are
ignored. We track the 5 most recent windows for every task, and the
maximum demand seen in any of those 5 windows (where the task had some
activity) drives the frequency demand for that task.
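
The bookkeeping described above could look roughly like this. It is a
userspace sketch, not the patch: struct ravg_sketch, ravg_window_end()
and ravg_demand() are hypothetical names, and busy time is kept in
nanoseconds rather than as a fraction of the window.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RAVG_HIST_SKETCH 5	/* windows remembered per task */

struct ravg_sketch {
	uint32_t hist[RAVG_HIST_SKETCH];	/* hist[0] is most recent */
};

/* Record the busy time of a window that has just ended */
static void ravg_window_end(struct ravg_sketch *r, uint32_t busy_ns)
{
	assert(r != NULL);

	/* Windows over which the task was sleeping are ignored */
	if (busy_ns == 0)
		return;

	/* Drop the oldest sample and shift the rest down by one slot */
	memmove(&r->hist[1], &r->hist[0],
		(RAVG_HIST_SKETCH - 1) * sizeof(r->hist[0]));
	r->hist[0] = busy_ns;
}

/* Task demand is the maximum over the remembered windows */
static uint32_t ravg_demand(const struct ravg_sketch *r)
{
	uint32_t max = 0;
	int i;

	for (i = 0; i < RAVG_HIST_SKETCH; i++)
		if (r->hist[i] > max)
			max = r->hist[i];

	return max;
}
```

Because idle windows are skipped, a task that sleeps for a long time
keeps the demand it had when it last ran.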

A per-cpu metric (rq->cumulative_runnable_avg) is also provided which
is an aggregation of cpu demand of all tasks currently enqueued on it.
rq->cumulative_runnable_avg will be useful to know if cpu frequency
will need to be changed to match task demand.

Change-Id: Ib83207b9ba8683cd3304ee8a2290695c34f08fe2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in ttwu_do_wakeup() to
 incorporate with changed trace_sched_wakeup() location.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:39 -07:00
Srivatsa Vaddagiri
97ae7bae2c sched: Make scheduler aware of cpu frequency state
Capacity of a cpu (how much performance it can deliver) is partly
determined by its frequency (P) state, both current frequency as well
as max frequency it can reach.  Knowing frequency state of cpus will
help scheduler optimize various functions such as tracking every
task's cpu demand and placing tasks on various cpus.

This patch has scheduler registering for cpufreq notifications to
become aware of cpu's frequency state. Subsequent patches will make
use of derived information for various purposes, such as task's scaled
load (cpu demand) accounting and task placement.

Change-Id: I376dffa1e7f3f47d0496cd7e6ef8b5642ab79016
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/core.c.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:38 -07:00
Matt Wagantall
5f10bd75eb sched/debug: Make sysrq prints of sched debug data optional
Calls to sysrq_sched_debug_show() can yield rather verbose output
which contributes to log spew and, under heavy load, may increase
the chances of a watchdog bark.

Make printing of this data optional with the introduction of a
new Kconfig, CONFIG_SYSRQ_SCHED_DEBUG.

Change-Id: I5f54d901d0dea403109f7ac33b8881d967a899ed
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
2016-03-23 19:58:37 -07:00
Steve Muckle
521a572def tracing/sched: add load balancer tracepoint
When doing performance analysis it can be useful to see exactly
what is going on with the load balancer - when it runs and why
exactly it may not be redistributing load.

This additional tracepoint will show the idle context of the
load balance operation (idle, not idle, newly idle), various
values from the load balancing operation, the final result,
and the new balance interval.

Change-Id: I1538c411c5f9d17d7d37d84ead6210756be2d884
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[rameezmustafa@codeaurora.org: Initialize variables in load_balance() to
				avoid crashes and inaccurate tracing.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in
 include/trace/events/sched.h.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:36 -07:00
Steve Muckle
387dcd0663 sched: change WARN_ON_ONCE to printk_deferred() in try_to_wake_up_local()
try_to_wake_up_local() is called with the rq lock held. Printing to
console in this context can result in a deadlock if klogd needs to
be woken up. Print to the kernel log buffer via printk_deferred()
instead, which avoids the wakeup.

Change-Id: Ia07baea3cb7e0b158545207fdbbb866203256d3c
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:35 -07:00
Arun Bharadwaj
9f6eb26ae8 tracing/sched: Track per-cpu rt and non-rt cpu_load.
Add a new tracepoint trace_sched_enq_deq_task to track
per-cpu rt and non-rt cpu_load during task enqueue
and dequeue.

This is useful to visualize and compare the load on
different cpus and also to understand how balanced
the load is at any point of time.

Note: We only print cpu_load[0] because we only care about
the most recent load history for tracking load balancer
effectiveness.

Change-Id: I46f0bb84e81652099ed5edf8c2686c70c8b8330c
Signed-off-by: Arun Bharadwaj <abharadw@codeaurora.org>
2016-03-23 19:58:34 -07:00
Srivatsa Vaddagiri
0ec4cf2484 sched: re-calculate a cpu's next_balance point upon sched domain changes
A cpu's next_balance point could be stale when it is being attached to a
sched domain hierarchy. That would lead to an undesirable delay in the cpu
doing a load balance and hence potentially affect scheduling
latencies for tasks. Fix that by initializing the cpu's next_balance point
when it is being attached to a sched domain hierarchy.

Change-Id: I855cff8da5ca28d278596c3bb0163b839d4704bc
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Modify commit text to reflect dropped patches]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:34 -07:00
Steve Muckle
63249df6b2 sched: provide per cpu-cgroup option to notify on migrations
On systems where CPUs may run asynchronously, task migrations
between CPUs running at grossly different speeds can cause
problems.

This change provides a mechanism to notify a subsystem
in the kernel if a task in a particular cgroup migrates to a
different CPU. Other subsystems (such as cpufreq) may then
register for this notifier to take appropriate action when
such a task is migrated.

The cgroup attribute to set for this behavior is
"notify_on_migrate".
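
The notification flow described above can be sketched as a small
user-space model. This is only an illustration, not the patch's code:
every name below (model_cgroup, model_migration,
model_register_migration_listener, model_task_migrated) is hypothetical,
and the payload (pid, source cpu, destination cpu) is an assumption about
what a listener such as cpufreq might want to see.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_LISTENERS 4

/* Hypothetical payload handed to listeners; not the kernel's structure. */
struct model_migration {
    int pid;
    int src_cpu;
    int dst_cpu;
};

typedef void (*model_migration_cb)(const struct model_migration *m, void *ctx);

struct model_listener {
    model_migration_cb cb;
    void *ctx;
};

static struct model_listener model_listeners[MODEL_MAX_LISTENERS];
static size_t model_nr_listeners;

/* Stand-in for the per-cgroup "notify_on_migrate" attribute. */
struct model_cgroup {
    bool notify_on_migrate;
};

/* A subsystem registers a callback; returns false if the table is full. */
static bool model_register_migration_listener(model_migration_cb cb, void *ctx)
{
    if (cb == NULL || model_nr_listeners == MODEL_MAX_LISTENERS)
        return false;
    model_listeners[model_nr_listeners].cb = cb;
    model_listeners[model_nr_listeners].ctx = ctx;
    model_nr_listeners++;
    return true;
}

/*
 * Called when a task changes cpu. Listeners are notified only if the
 * task's cgroup opted in. Returns the number of listeners notified.
 */
static size_t model_task_migrated(const struct model_cgroup *cg, int pid,
                                  int src_cpu, int dst_cpu)
{
    if (!cg->notify_on_migrate || src_cpu == dst_cpu)
        return 0;

    struct model_migration m = { pid, src_cpu, dst_cpu };
    for (size_t i = 0; i < model_nr_listeners; i++)
        model_listeners[i].cb(&m, model_listeners[i].ctx);
    return model_nr_listeners;
}

/* Example listener: counts migrations in an int supplied through ctx. */
static void model_count_migrations(const struct model_migration *m, void *ctx)
{
    (void)m;
    ++*(int *)ctx;
}
```

In this model a migration in a cgroup with notify_on_migrate cleared
reaches no listener, so only opted-in groups pay the callback cost.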

Change-Id: Ie1868249e53ef901b89c837fdc33b0ad0c0a4590
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[rameezmustafa@codeaurora.org: Use new cgroup APIs, fix 64-bit
			compilation issues and resolve some merge
			conflicts. Also squash "2bd8075 sched:
			remove migration notification from RT class"
			into this patch.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: Incorporated with new __migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:33 -07:00
Srivatsa Vaddagiri
82f6fb4f5c sched: Fix SCHED_HRTICK bug leading to late preemption of tasks
SCHED_HRTICK feature is useful to preempt SCHED_FAIR tasks on-the-dot
(just when they would have exceeded their ideal_runtime). It makes use
of a per-cpu hrtimer resource and hence arming that hrtimer should
be based on total SCHED_FAIR tasks a cpu has across its various cfs_rqs,
rather than being based on number of tasks in a particular cfs_rq (as
implemented currently). As a result, with current code, it's possible for
a running task (which is the sole task in its cfs_rq) to be preempted
long after its ideal_runtime has elapsed, resulting in increased latency
for tasks in other cfs_rqs on the same cpu.

Fix this by arming the sched hrtimer based on the total number of
SCHED_FAIR tasks a CPU has across its various cfs_rqs.
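
The difference between the two checks can be sketched with a user-space
model. This is not the kernel code: the structures and function names
below are hypothetical, and the "more than one task" threshold is an
assumption about when a preemption timer is worth programming.

```c
#include <stdbool.h>

/* Hypothetical model types; these are not the kernel's structures. */
struct model_cfs_rq {
    unsigned int nr_running;        /* tasks queued directly on this cfs_rq */
};

struct model_rq {
    struct model_cfs_rq *cfs_rqs;   /* every cfs_rq on this cpu */
    unsigned int nr_cfs_rqs;
};

/* Total SCHED_FAIR tasks on the cpu, summed over all of its cfs_rqs. */
static unsigned int model_total_fair_tasks(const struct model_rq *rq)
{
    unsigned int total = 0;

    for (unsigned int i = 0; i < rq->nr_cfs_rqs; i++)
        total += rq->cfs_rqs[i].nr_running;
    return total;
}

/* Old behaviour: look only at the running task's own cfs_rq. */
static bool model_arm_hrtick_per_cfs_rq(const struct model_cfs_rq *cfs_rq)
{
    return cfs_rq->nr_running > 1;
}

/* Fixed behaviour: look at every SCHED_FAIR task on the cpu. */
static bool model_arm_hrtick_per_cpu(const struct model_rq *rq)
{
    return model_total_fair_tasks(rq) > 1;
}
```

With two cfs_rqs holding one task each, the per-cfs_rq check returns false
for either queue while the per-cpu check returns true, which is the late
preemption case described above.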

Change-Id: I1f23680a64872f8ce0f451ac4bcae28e8967918f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Squash "c24fb502 sched: fix reference to
				wrong cfs_rq" into this patch]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:32 -07:00
Steve Muckle
2c04026b17 kernel: reduce sleep duration in wait_task_inactive
Sleeping for an entire tick adds unnecessary latency to
hotplugging a cpu (cpu_up).

Change-Id: Iab323a79f4048bc9101ecfd368e0f275827ed4ab
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:31 -07:00
Steve Muckle
03ab01a318 sched: add sysctl for controlling task migrations on wake
The PF_WAKE_UP_IDLE per-task flag made it impossible to enable
the old behavior of SD_SHARE_PKG_RESOURCES, where every task
migrates to an idle CPU on wakeup.

The sched_wake_to_idle sysctl value, when made nonzero, will cause
all tasks to migrate to an idle CPU if one is available when the
task is woken up. This is regardless of how PF_WAKE_UP_IDLE is
configured for tasks in the system. Similar to PF_WAKE_UP_IDLE,
the SD_SHARE_PKG_RESOURCES scheduler domain flag must be enabled
for the sysctl value to have an effect.
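
The resulting decision can be sketched as a single predicate. This is a
user-space model under stated assumptions, not the scheduler's code: the
MODEL_ constants, struct model_task and model_wake_to_idle() are
hypothetical names, and the waker/wakee rule follows the PF_WAKE_UP_IDLE
description (a task with the flag set, or a task woken by one).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the real flag values. */
#define MODEL_PF_WAKE_UP_IDLE        0x1u
#define MODEL_SD_SHARE_PKG_RESOURCES 0x1u

struct model_task {
    unsigned int flags;
};

/* Stand-in for the sched_wake_to_idle sysctl; 0 means disabled. */
static unsigned int model_sysctl_sched_wake_to_idle;

/*
 * Should the wakee be placed on an idle cpu, if one is available?
 * The sched domain flag gates both the sysctl and the per-task flag.
 */
static bool model_wake_to_idle(const struct model_task *waker,
                               const struct model_task *wakee,
                               unsigned int sd_flags)
{
    if (!(sd_flags & MODEL_SD_SHARE_PKG_RESOURCES))
        return false;

    /* A nonzero sysctl applies to all tasks, whatever their flags. */
    if (model_sysctl_sched_wake_to_idle != 0)
        return true;

    return (waker->flags & MODEL_PF_WAKE_UP_IDLE) ||
           (wakee->flags & MODEL_PF_WAKE_UP_IDLE);
}
```

Checking the domain flag first mirrors the requirement that neither the
sysctl nor the per-task flag has any effect without it.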

Change-Id: I23bed846d26502c7aed600bfcf1c13053a7e5f61
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
(cherry picked from commit 9d5b38dc0025d19df5b756b16024b4269e73f282)
2016-03-23 19:58:30 -07:00
Matt Wagantall
cc06d4a91d sched/rt: Add Kconfig option to enable panicking for RT throttling
This may be useful for detecting and debugging RT throttling issues.

Change-Id: I5807a897d11997d76421c1fcaa2918aad988c6c9
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in lib/Kconfig.debug]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:29 -07:00
Matt Wagantall
841af4dbae sched/rt: print RT tasks when RT throttling is activated
Existing debug prints do not provide any clues about which tasks
may have triggered RT throttling. Print the names and PIDs of
all tasks on the throttled rt_rq to help narrow down the source
of the problem.

Change-Id: I180534c8a647254ed38e89d0c981a8f8bccd741c
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 19:58:28 -07:00
Steve Muckle
74b3b06c52 sched: add PF_WAKE_UP_IDLE
Certain workloads may benefit from the SD_SHARE_PKG_RESOURCES behavior
of waking their tasks up on idle CPUs. However, the feature has too much
of a negative impact on other workloads to apply globally. The
PF_WAKE_UP_IDLE flag tells the scheduler to wake up tasks that have this
flag set, or tasks woken by tasks with this flag set, on an idle CPU
if one is available.

Change-Id: I20b28faf35029f9395e9d9f5ddd57ce2de795039
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict around set_wake_up_idle() in
 include/linux/sched.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 19:58:27 -07:00
Srivatsa Vaddagiri
8da8122d5f sched: Make the scheduler aware of C-state for cpus
C-state represents a power-state of a cpu. A cpu could have one or
more C-states associated with it. C-state transitions are based on
various factors (expected sleep time for example). "Deeper" C-states
imply longer wakeup latencies.

Scheduler needs to know wakeup latency associated with various C-states.
Having this information allows the scheduler to make better decisions
during task placement. For example:

- Prefer an idle cpu that is in the shallowest C-state
- Avoid waking up small tasks on an idle cpu unless it is in the
  shallowest C-state

This patch introduces APIs in the scheduler that can be used by the
architecture specific power-management driver to inform the scheduler
about C-states for cpus.
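
As an illustration of the first placement preference, read here as
favoring the idle cpu whose C-state has the lowest wakeup latency, a
minimal user-space sketch follows. It does not show the APIs this patch
introduces: struct model_cpu and model_pick_shallowest_idle_cpu() are
hypothetical, and a per-cpu latency in microseconds is an assumption
about how the power-management driver's information might be
represented.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cpu view of what the power driver reported. */
struct model_cpu {
    bool idle;
    unsigned int wakeup_latency_us;   /* latency of the current C-state */
};

/*
 * Return the index of the idle cpu with the lowest wakeup latency,
 * or -1 if no cpu is idle. Ties go to the lowest-numbered cpu.
 */
static int model_pick_shallowest_idle_cpu(const struct model_cpu *cpus,
                                          size_t nr_cpus)
{
    int best = -1;

    for (size_t i = 0; i < nr_cpus; i++) {
        if (!cpus[i].idle)
            continue;
        if (best < 0 ||
            cpus[i].wakeup_latency_us < cpus[best].wakeup_latency_us)
            best = (int)i;
    }
    return best;
}
```

Busy cpus are skipped entirely, so the latency comparison only ever ranks
cpus that would actually have to be woken.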

Change-Id: I39c5ae6dbace4f8bd96e88f75cd2d72620436dd1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
2016-03-23 19:58:27 -07:00
Sreelakshmi Gownipalli
fc116784df diag: Add snap shot of diag driver
Add snap shot of diag driver

Signed-off-by: Sreelakshmi Gownipalli <sgownipa@codeaurora.org>
2016-03-23 19:58:26 -07:00