Commit graph

21901 commits

Syed Rameez Mustafa
b55f87849b sched: Ensure attempting load balance when HMP active balance flags are set
find_busiest_group() can end up returning a NULL group due to load-based
checks even though there are tasks that can be migrated to higher-capacity
CPUs (LBF_BIG_TASK_ACTIVE_BALANCE) or EA core rotation is possible
(LBF_EA_ACTIVE_BALANCE). To get the best power and performance, ensure
that load balance does attempt to pull tasks when an HMP_ACTIVE_BALANCE
flag is set. Since sched boost falls under the same category, fold it
into the same generic condition.
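
A minimal user-space sketch of the resulting check; the LBF_* flag names
come from this message, while the struct, flag values and helper are
hypothetical:

  #include <stdbool.h>
  #include <stddef.h>

  #define LBF_BIG_TASK_ACTIVE_BALANCE 0x01
  #define LBF_EA_ACTIVE_BALANCE       0x02
  #define LBF_SCHED_BOOST             0x04  /* sched boost, same category */
  #define LBF_HMP_ACTIVE_BALANCE (LBF_BIG_TASK_ACTIVE_BALANCE | \
                                  LBF_EA_ACTIVE_BALANCE | \
                                  LBF_SCHED_BOOST)

  struct lb_env { unsigned int flags; };

  /* Keep balancing when find_busiest_group() returned NULL on load
   * grounds but an HMP active-balance reason is set. */
  static bool should_attempt_pull(const struct lb_env *env, const void *group)
  {
          return group != NULL || (env->flags & LBF_HMP_ACTIVE_BALANCE);
  }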

Change-Id: I3db7ec200d2a038917b1f2341602eb87b5aed289
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:51 -07:00
Joonwoo Park
b40bf941f6 sched: add scheduling latency tracking procfs node
Add a new procfs node, /proc/sys/kernel/sched_max_latency_us, to track
the worst scheduling latency.  It provides an easier way to identify the
maximum scheduling latency seen across the CPUs.
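
As a usage illustration only (the node path is from this message; the
plain-integer format is an assumption), the value can be read back from
user space:

  #include <stdio.h>

  int main(void)
  {
          FILE *f = fopen("/proc/sys/kernel/sched_max_latency_us", "r");
          long max_latency_us;

          if (!f) {
                  perror("sched_max_latency_us");
                  return 1;
          }
          if (fscanf(f, "%ld", &max_latency_us) == 1)
                  printf("worst scheduling latency: %ld us\n", max_latency_us);
          fclose(f);
          return 0;
  }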

Change-Id: I6e435bbf825c0a4dff2eded4a1256fb93f108d0e
[joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:50 -07:00
Joonwoo Park
8f90803a45 sched: warn/panic upon excessive scheduling latency
Add new tunables /proc/sys/kernel/sched_latency_warn_threshold_us and
/proc/sys/kernel/sched_latency_panic_threshold_us to warn or panic when
tasks are runnable but not scheduled for more than the configured time.

This helps to identify unacceptably high scheduling latency more easily.
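
A rough user-space sketch of the intended wait-end check; the tunable
names come from this message, while the zero-means-disabled convention
and the exact comparison point are assumptions:

  #include <stdio.h>
  #include <stdlib.h>

  static unsigned long sched_latency_warn_threshold_us;
  static unsigned long sched_latency_panic_threshold_us;

  /* Called with the time a task sat runnable without being scheduled. */
  static void check_latency(unsigned long wait_us)
  {
          if (sched_latency_panic_threshold_us &&
              wait_us > sched_latency_panic_threshold_us)
                  abort();                      /* stands in for panic() */
          if (sched_latency_warn_threshold_us &&
              wait_us > sched_latency_warn_threshold_us)
                  fprintf(stderr, "sched: runnable for %lu us\n", wait_us);
  }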

Change-Id: If077aba6211062cf26ee289970c5abcd1c218c82
[joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:49 -07:00
Joonwoo Park
fa8dd7068a sched/core: Fix incorrect wait time and wait count statistics
At present the scheduler resets a task's wait-start timestamp when the
task migrates to another rq.  This misleads the scheduler into reporting
less wait time than actual, by omitting the time spent waiting prior to
migration, and a higher wait count than actual, by counting migration as
a wait-end event; both can be seen by trace or /proc/<pid>/sched with
CONFIG_SCHEDSTATS=y.

Carry forward the migrating task's wait time accrued prior to migration
and don't count migration as a wait-end event to fix these statistics
errors.

In order to determine whether a task is migrating, mark task->on_rq with
TASK_ON_RQ_MIGRATING while dequeuing and enqueuing due to migration.
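
In outline, the already-accumulated wait is carried across the move; a
simplified sketch (field and function names are illustrative, not the
kernel's):

  /* 'now' is the rq clock.  On dequeue for migration, turn the absolute
   * wait_start into the wait accrued so far; on enqueue at the new rq,
   * turn it back into an absolute start.  No wait-end is recorded. */
  struct task_stats { unsigned long long wait_start; };

  static void dequeue_for_migration(struct task_stats *ts,
                                    unsigned long long now)
  {
          ts->wait_start = now - ts->wait_start;  /* wait accrued so far */
  }

  static void enqueue_after_migration(struct task_stats *ts,
                                      unsigned long long now)
  {
          ts->wait_start = now - ts->wait_start;  /* resume from accrued */
  }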

Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ohaugan@codeaurora.org
Link: http://lkml.kernel.org/r/20151113033854.GA4247@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[joonwoop@codeaurora.org: fixed minor conflict in detach_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>

Change-Id: I2d7f7d9895815430ad61383e62d28d889cce66c3
2016-03-23 20:01:48 -07:00
Syed Rameez Mustafa
6832b1d70e sched: Update cur_freq in the cpufreq policy notifier callback
At boot, the cpufreq framework sends transition notifiers before sending
out the policy notifier. Since the scheduler relies on the policy
notifier to build up the frequency-domain masks, the scheduler has no
frequency domains when the initial set of transition notifiers is sent.
As a result, the scheduler fails to update the cur_freq information.
Update cur_freq as part of the policy notifier so that the scheduler
always has the current frequency information.

Change-Id: I7bd2958dfeb064dd20b9ccebafd372436484e5d6
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:47 -07:00
Joonwoo Park
3fe87bc057 sched: avoid CPUs with high irq activity for non-small tasks
The irq-aware scheduler is meant to achieve better performance by
avoiding task placement on CPUs which have high irq activity.  However,
the current scheduler preferably places non-small tasks on CPUs which
are loaded with irq activity, the opposite of what is intended.
This is suboptimal for both power and performance.
Fix the task placement algorithm to avoid CPUs with significant irq
activity.

Change-Id: Ifa5a6ac186241bd58fa614e93e3d873a5f5ad4ca
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:46 -07:00
Joonwoo Park
ce35afd096 sched: actively migrate big tasks on power CPU to idle performance CPU
When a performance CPU runs the idle or newly-idle load balancer to pull
a task from a power-efficient CPU, the load balancer always fails and
the CPU enters idle if the big task on the power-efficient CPU is
running.  This is suboptimal when the running task doesn't fit on the
power-efficient CPU, as it's quite possible that the big task will stay
on the power-efficient CPU until it's preempted while a performance CPU
sits idle.

Revise the load balancer algorithm to actively migrate big tasks from a
power-efficient CPU to a performance CPU when the performance CPU runs
the idle or newly-idle load balancer.

Change-Id: Iaf05e0236955fdcc7ded0ff09af0880050a2be32
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in group_classify().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:45 -07:00
Srivatsa Vaddagiri
73b7708de7 sched: Add cgroup-based criteria for upmigration
It may be desirable to discourage upmigration of tasks belonging to
some cgroups. Add a per-cgroup flag (upmigrate_discourage) that
discourages upmigration of tasks of a cgroup. Tasks of the cgroup are
allowed to upmigrate only under overcommitted scenarios.

Change-Id: I1780e420af1b6865c5332fb55ee1ee408b74d8ce
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Use new cgroup APIs]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:44 -07:00
Joonwoo Park
a5cb71df22 sched: avoid running idle_balance() on behalf of wrong CPU
With EA (Energy Awareness), idle_balance() on a CPU runs on behalf of
the most power-efficient idle CPU among the CPUs in its sched domain
level, under the condition that the substitute idle CPU is limited to a
CPU which has the same capacity as the original idle CPU.
It is found that at present idle_balance() spans all the CPUs in its
sched domain and runs the idle balancer on behalf of any CPU within the
domain, which could be any CPU in the system.  Consequently the idle
balancer on a performance CPU always runs on behalf of a power-efficient
idle CPU.  This causes idle performance CPUs to always fail to pull
tasks from power-efficient CPUs when there is only one online
performance CPU.

Fix this issue by limiting the search to CPUs that share a cache with
the original idle CPU, so that the idle balancer runs on behalf of a
more power-efficient CPU which still has the same capacity as the
original CPU.

Change-Id: I0575290c24f28db011d9353915186e64df7e57fe
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:43 -07:00
Srivatsa Vaddagiri
c41a54cb8d sched: Keep track of average nr_big_tasks
Extend sched_get_nr_running_avg() API to return average nr_big_tasks,
in addition to average nr_running and average nr_io_wait tasks. Also
add a new trace point to record values returned by
sched_get_nr_running_avg() API.

Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:42 -07:00
Srivatsa Vaddagiri
44d892787e sched: Fix bug in average nr_running and nr_iowait calculation
sched_get_nr_running_avg() returns average nr_running and nr_iowait
task count since it was last invoked. Fix several bugs in their
calculation.

* sched_update_nr_prod() needs to consider that nr_running count can
  change by more than 1 when CFS_BANDWIDTH feature is used

* sched_get_nr_running_avg() needs to sum up nr_iowait count across
  all cpus, rather than just one

* sched_get_nr_running_avg() could race with sched_update_nr_prod(),
  as a result of which it could use curr_time which is behind a cpu's
  'last_time' value. That would lead to erroneous calculation of
  average nr_running or nr_iowait.

While at it, also fix a bug in the BUG_ON() check in the
sched_update_nr_prod() function and remove the unnecessary nr_running
argument of sched_update_nr_prod().

Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:41 -07:00
Syed Rameez Mustafa
b3f9e5ac26 sched: Avoid pulling all tasks from a CPU during load balance
When running load balance, the destination CPU checks the number of
running tasks on the busiest CPU without holding the busiest CPU's
runqueue lock. This opens the load balancer to a race whereby a third
CPU, running load balance at the same time and having found the same
busiest group and queue, may have already pulled one of the waiting
tasks from the busiest CPU. Under scenarios where the source CPU is
running the idle task and only a single task remains waiting on the
busiest runqueue (nr_running = 1), the destination CPU will end up
pulling the only enqueued task from that CPU, leaving the source CPU
with nothing left to run. Fix this race by reconfirming nr_running for
the busiest CPU after its runqueue lock has been obtained.
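
The shape of the fix, sketched with a pthread mutex standing in for the
runqueue lock (all names are illustrative):

  #include <pthread.h>
  #include <stdbool.h>

  struct rq {
          pthread_mutex_t lock;
          unsigned int nr_running;
  };

  /* Reconfirm, with the lock held, that the busiest rq still has a
   * task to spare; the earlier lockless read may be stale because a
   * third CPU can have pulled the waiting task in the meantime. */
  static bool confirm_and_lock_busiest(struct rq *busiest)
  {
          pthread_mutex_lock(&busiest->lock);
          if (busiest->nr_running <= 1) {
                  pthread_mutex_unlock(&busiest->lock);
                  return false;   /* abort the pull */
          }
          return true;            /* caller detaches tasks, then unlocks */
  }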

Change-Id: I42e132b15f96d9d5d7b32ef4de3fb92d2f837e63
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:40 -07:00
Syed Rameez Mustafa
fffa33d56a sched: Avoid pulling big tasks to the little cluster during load balance
When a lower-capacity CPU attempts to pull work from a higher-capacity
CPU during load balance, it does not distinguish between tasks that will
fit on the destination CPU and tasks that will not. This causes
suboptimal load balancing decisions whereby big tasks end up on the
lower-capacity CPUs and little tasks remain on higher-capacity CPUs.
Avoid this behavior by first restricting the search to only include
tasks that fit on the destination CPU. If no such task can be found,
remove this restriction so that any task can be pulled over to the
destination CPU. This behavior is not applicable during sched_boost,
however, as none of the tasks will fit on a lower-capacity CPU.

Change-Id: I1093420a629a0886fc3375849372ab7cf42e928e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in can_migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:38 -07:00
Joonwoo Park
dbd548aed7 sched: fix rounding error on scaled execution time calculation
It's found that the scaled execution time can be less than the actual
time due to rounding errors.  The HMP scheduler accumulates the scaled
execution time of tasks to determine if tasks are in need of
up-migration, but the rounding error prevents the HMP scheduler from
ever accumulating 100% load, so an up-migration threshold of 100% can
never be reached.
Fix the rounding error by rounding the quotient up.
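
The change boils down to a round-up integer division when scaling
execution time by frequency; a self-contained illustration with made-up
numbers (the exact kernel formula is assumed):

  #include <stdio.h>
  #include <stdint.h>

  /* Scale 'delta' time units run at cur_freq into max_freq terms,
   * rounding the quotient up instead of truncating. */
  static uint64_t scale_exec_time(uint64_t delta, uint64_t cur_freq,
                                  uint64_t max_freq)
  {
          return (delta * cur_freq + max_freq - 1) / max_freq;
  }

  int main(void)
  {
          /* Truncation: 100 * 2 / 3 = 66; accumulated over many windows
           * the lost fractions keep the tracked load below 100%. */
          printf("truncated:  %llu\n", (unsigned long long)(100ULL * 2 / 3));
          printf("rounded up: %llu\n",
                 (unsigned long long)scale_exec_time(100, 2, 3));
          return 0;
  }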

CRs-fixed: 759041
Change-Id: Ie4d9693593cc3053a292a29078aa56e6de8a2d52
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:37 -07:00
Olav Haugan
e1e5891538 sched/fair: Respect wake to idle over sync wakeup
Sync wakeup currently takes precedence over the wake-to-idle flag. A
sync wakeup causes a task to be placed on a non-idle CPU because we
expect this CPU to become idle very shortly. However, even though the
sync flag is set, there is no guarantee that the task will go to sleep
right away. As a consequence, performance suffers.

Fix this by preferring an idle CPU over a potentially busy CPU when both
wake-to-idle and sync wakeup are set.

Change-Id: I6b40a44e2b4d5b5fa6088e4f16428f9867bd928d
CRs-fixed: 794424
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-03-23 20:01:36 -07:00
Srivatsa Vaddagiri
a6c5eb13da sched: Support CFS_BANDWIDTH feature in HMP scheduler
CFS_BANDWIDTH feature is not currently well-supported by the HMP
scheduler. Issues encountered include a kernel panic when the
rq->nr_big_tasks count becomes negative. This patch fixes HMP
scheduler code to better handle CFS_BANDWIDTH feature. The most
prominent change introduced is maintenance of HMP stats (nr_big_tasks,
nr_small_tasks, cumulative_runnable_avg) per 'struct cfs_rq' in
addition to being maintained in each 'struct rq'. This allows HMP
stats to be updated easily when a group is throttled on a cpu.

Change-Id: Iad9f378b79ab5d9d76f86d1775913cc1941e266a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in dequeue_task_fair().]
2016-03-23 20:01:35 -07:00
Srivatsa Vaddagiri
0a33ec2ea9 sched: Consolidate hmp stats into their own struct
Key hmp stats (nr_big_tasks, nr_small_tasks and
cumulative_runnable_avg) are currently maintained per-cpu in
'struct rq'. Merge those stats into their own structure (struct
hmp_sched_stats) and modify the impacted functions to deal with the
newly introduced structure. This cleanup is required for a subsequent
patch which fixes various issues with the use of the CFS_BANDWIDTH
feature in the HMP scheduler.
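
A sketch of the consolidated container; the field names follow these
messages, while the types are assumptions:

  typedef unsigned long long u64;

  struct hmp_sched_stats {
          int nr_big_tasks;
          int nr_small_tasks;
          u64 cumulative_runnable_avg;
  };
  /* Embedded in 'struct rq' here, and additionally per 'struct cfs_rq'
   * by the follow-up CFS_BANDWIDTH patch. */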

Change-Id: Ieffc10a3b82a102f561331bc385d042c15a33998
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in __update_load_avg().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:34 -07:00
Srivatsa Vaddagiri
207d78dd26 sched: Add userspace interface to set PF_WAKE_UP_IDLE
The sched_prefer_idle flag controls whether tasks can be woken on any
available idle cpu. It may be desirable to set sched_prefer_idle to 0,
so that most tasks wake up to non-idle cpus under the mostly_idle
threshold, and have specialized tasks override this behavior through
other means. The per-task PF_WAKE_UP_IDLE flag provides exactly that:
it lets tasks with PF_WAKE_UP_IDLE set be woken up on any available
idle cpu independent of the sched_prefer_idle setting. Currently only a
kernel-space API exists to set PF_WAKE_UP_IDLE for a task. This patch
adds a user-space API (in the /proc filesystem) to set PF_WAKE_UP_IDLE
for a given task: the /proc/[pid]/sched_wake_up_idle file can be
written to set or clear the flag.
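
For example, a user-space helper could toggle the flag like this (the
path is from this message; accepted values of 0 and 1 are an
assumption):

  #include <stdio.h>

  static int set_wake_up_idle(int pid, int enable)
  {
          char path[64];
          FILE *f;

          snprintf(path, sizeof(path), "/proc/%d/sched_wake_up_idle", pid);
          f = fopen(path, "w");
          if (!f)
                  return -1;
          fprintf(f, "%d", enable ? 1 : 0);
          return fclose(f);
  }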

Change-Id: I13a37e740195e503f457ebe291d54e83b230fbeb
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/fair.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:33 -07:00
Jeff Ohlstein
b0ccf5db31 sched_avg: add run queue averaging
Add code to calculate the run queue depth of a cpu and the iowait depth
of the cpu.

The scheduler calls into sched_update_nr_prod() whenever there is a
runqueue change. This function maintains the runqueue average and the
iowait of that cpu over that time interval.

Whoever wants to know the runqueue average is expected to call
sched_get_nr_running_avg() periodically to get the accumulated runqueue
and iowait averages for all the cpus.
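
In outline, this is a time-weighted running sum; a simplified
single-cpu sketch (names and units are illustrative):

  #include <stdint.h>

  struct rq_avg {
          uint64_t last_time;      /* last runqueue change, in ns */
          uint64_t prod_sum;       /* sum of nr_running * interval */
          unsigned int nr_running;
  };

  /* Called on every runqueue change. */
  static void update_nr_prod(struct rq_avg *a, uint64_t now,
                             unsigned int nr_running)
  {
          a->prod_sum += (uint64_t)a->nr_running * (now - a->last_time);
          a->last_time = now;
          a->nr_running = nr_running;
  }

  /* Called periodically by the consumer; returns the average depth
   * since window_start and resets the accumulation window. */
  static unsigned int get_nr_running_avg(struct rq_avg *a, uint64_t now,
                                         uint64_t window_start)
  {
          unsigned int avg;

          update_nr_prod(a, now, a->nr_running);   /* close the interval */
          avg = now > window_start ?
                (unsigned int)(a->prod_sum / (now - window_start)) : 0;
          a->prod_sum = 0;
          return avg;
  }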

Change-Id: Id8cb2ecf0ed479f090a83ccb72dd59c53fa73e0c
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
(cherry picked from commit 0299fcaaad80e2c0ac9aa583c95107f6edc27750)
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:32 -07:00
Joonwoo Park
f4a6c4e327 sched: add sched feature FORCE_CPU_THROTTLING_IMMINENT
Add a new sched feature FORCE_CPU_THROTTLING_IMMINENT to perform
migration due to EA without checking frequency throttling.  This option
can give us better debugging and verification capability.

Change-Id: Iba445961a7f9812528b4e3aa9c6ddf47a3aad583
[joonwoop@codeaurora.org: fixed trivial conflict in
 kernel/sched/features.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:31 -07:00
Joonwoo Park
6939c5ae7e sched: continue to search less power efficient cpu for load balancer
When choosing a CPU to do power-aware active balance from, the load
balancer currently selects the first eligible CPU it finds, even if
there is another eligible CPU which draws higher power. This can lead
to suboptimal load balancing behavior and extra migrations. Both power
and performance will be impacted.

Achieve better power and performance by continuing to search for the
least power-efficient cpu as long as that cpu's load average is higher
than or equal to that of the busiest cpu found so far.

CRs-fixed: 777341
Change-Id: I14eb21ab725bf7dab88b2e1e169aced6f2d712ca
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:30 -07:00
Syed Rameez Mustafa
2be279419f sched: Update cur_freq for offline CPUs in notifier callback
cpufreq governor does not send frequency change notifications for
offline CPUs. This means that a hot-removed CPU's cur_freq information
can become stale if there is a frequency change while that CPU is
offline. When the offline CPU is hotplugged back in, all subsequent
load calculations are based on the stale information until another
frequency change occurs and the corresponding set of notifications is
sent out. Avoid this incorrect load tracking by updating cur_freq for
all CPUs in the same frequency domain.

Change-Id: Ie11ad9a64e7c9b115d01a7c065f22d386eb431d5
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:29 -07:00
Olav Haugan
f916962758 sched: Fix overflow in max possible capacity calculation
The max possible capacity calculation might overflow given a large
enough max possible frequency and capacity. Fix the potential for
overflow.
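
The usual remedy, sketched below, is to widen to 64 bits before
multiplying (variable names and units are illustrative):

  #include <stdint.h>

  /* A 32-bit product of max frequency (kHz) and capacity can exceed
   * UINT32_MAX; cast one operand up before the multiply. */
  static uint64_t max_possible_capacity(uint32_t max_possible_freq,
                                        uint32_t capacity)
  {
          return (uint64_t)max_possible_freq * capacity;
  }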

Change-Id: Ie9345bc657988845aeb450d922052550cca48a5f
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-03-23 20:01:28 -07:00
Steve Muckle
61add5eb96 sched: add preference for prev_cpu in HMP task placement
At present the HMP task placement algorithm scans CPUs in numerical
order and if two identical options are found, the first one
encountered is chosen, even if it is different from the task's
previous CPU.

Add a bias towards the task's previous CPU in such situations. Any
time two or more CPUs are considered equivalent (load, C-state, power
cost), if one of them is the task's previous CPU, bias towards that
CPU. The algorithm is otherwise unchanged.

CRs-Fixed: 772033
Change-Id: I511f5b929c2bfa6fdea9e7433893c27b29ed8026
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:27 -07:00
Srivatsa Vaddagiri
5b45dc56e5 sched: Per-cpu prefer_idle flag
Remove the global sysctl_sched_prefer_idle flag and replace it with a
per-cpu prefer_idle flag. The per-cpu flag is expected to be the same
for all cpus in a cluster. It thus provides a convenient means to
disable packing in one cluster while allowing packing in another.

Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:01:26 -07:00
Srivatsa Vaddagiri
ee87e1d7c4 sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()
sysctl_sched_prefer_idle controls the selection of idle cpus for waking
tasks. In some cases waking to idle cpus helps performance, while in
other cases it hurts (as tasks incur the latency associated with
C-state wakeup). Ideally the scheduler would adapt prefer_idle behavior
based on the task that is waking up, but that is hard for the scheduler
to figure out by itself. The PF_WAKE_UP_IDLE hint can be provided by an
external module/driver in such cases to guide the scheduler in
preferring an idle cpu for select tasks irrespective of the
sysctl_sched_prefer_idle flag.

This patch enhances select_best_cpu() to consider the PF_WAKE_UP_IDLE
hint. A wakeup posted from any task that has PF_WAKE_UP_IDLE set is a
hint for the scheduler to prefer an idle cpu for the waking task.
Similarly, the scheduler will attempt to place any task with
PF_WAKE_UP_IDLE set on an idle cpu when it wakes up.

CRs-Fixed: 773101
Change-Id: Ia8bf334d98fd9fd2ff9eda875430497d55d64ce6
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:01:25 -07:00
Olav Haugan
3f947e7ba7 sched: Add sysctl to enable power aware scheduling
Add sysctl to enable energy awareness at runtime. This is useful for
performance/power tuning/measurements and debugging. In addition this
will match up with the Documentation/scheduler/sched-hmp.txt documentation.

Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:24 -07:00
Olav Haugan
1bc662a791 sched: Ensure no active EA migration occurs when EA is disabled
There exists a flag called "sched_enable_power_aware" that is not honored
everywhere. Fix this.

Change-Id: I62225939b71b25970115565b4e9ccb450e252d7c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:24 -07:00
Joonwoo Park
a4ca8c9b56 sched: take account of irq preemption when calculating irqload delta
If irq raises while sched_irqload() is calculating irqload delta,
sched_account_irqtime() can update rq's irqload_ts which can be greater
than the jiffies stored in sched_irqload()'s context so delta can be
negative.  This negative delta means there was recent irq occurence.
So remove improper BUG_ON().

CRs-fixed: 771894
Change-Id: I5bb01b50ec84c14bf9f26dd9c95de82ec2cd19b5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:23 -07:00
Joonwoo Park
790d5d8a4a sched: Prevent race conditions where upmigrate_min_nice changes
When upmigrate_min_nice is changed, dec_nr_big_small_task() can trigger
BUG_ON(rq->nr_big_tasks < 0).  This happens when there is a task which
was considered a non-big task due to its nice > upmigrate_min_nice, and
upmigrate_min_nice is later changed to a higher value so the task
becomes a big task.  In this case the runqueue incorrectly still has
nr_big_tasks = 0 with the current implementation.  Consequently the
next scheduler tick sees a big task to schedule and tries to decrease
nr_big_tasks, which is already 0.

Introduce sched_upmigrate_min_nice, which is updated atomically, and
re-count the number of big and small tasks to fix the BUG_ON()
triggering.

Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:22 -07:00
Olav Haugan
90dc3fa9a7 sched: Avoid frequent task migration due to EA in lb
A new tunable exists that allows task migration to be throttled when
the scheduler tries to do task migrations due to Energy Awareness (EA).
This tunable is only taken into account when migrations occur in the
tick path. Extend the usage of the tunable to also take the load
balancer (lb) path into account.

In addition, ensure that the start of task execution on a CPU is
updated correctly. If a task is preempted but still runnable on the
same CPU, the start of execution should not be updated. Only update the
start of execution when a task wakes up after sleep or moves to a new
CPU.

Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in group_classify() and
 set_task_cpu().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:21 -07:00
Olav Haugan
c7b587d9aa sched: Avoid migrating tasks to little cores due to EA
If, during the check of whether migration is needed, we find that there
is a lower-power CPU available, we proceed to find a new CPU for this
task. However, by the time we search for a new CPU, the lower-power CPU
might no longer be available. We should abort the attempt to migrate a
task in this case.

CRs-Fixed: 764788
Change-Id: I867923a82b95c599278b81cd73bb102b6aff4d03
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-03-23 20:01:20 -07:00
Olav Haugan
5a48aeb06c sched: Add temperature to cpu_load trace point
Add the current CPU temperature to the sched_cpu_load trace point.
This will allow us to track the CPU temperature.

CRs-Fixed: 764788
Change-Id: Ib2e3559bbbe3fe07a6b7c8115db606828bc36254
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-03-23 20:01:19 -07:00
Olav Haugan
72fa561b0d sched: Only do EA migration when CPU throttling is imminent
We do not want to migrate tasks unnecessarily, in order to avoid cache
and other migration latencies that could affect the performance of the
system. Add a check to only try EA migration when CPU frequency
throttling is imminent.

CRs-Fixed: 764788
Change-Id: I92e86e62da10ce15f1e76a980df3545e93d76348
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:18 -07:00
Srivatsa Vaddagiri
29a412dffa sched: Avoid frequent migration of running task
Power values for cpus can drop quite considerably when they go idle.
As a result, the best choice for running a single task in a cluster
can vary quite rapidly. As the task keeps hopping cpus, other cpus go
idle and start being seen as more favorable targets for running a task,
leading to the task migrating almost every scheduler tick!

Prevent this by keeping track of when a task started running on a cpu
and allowing task migration in the tick path (migration_needed()) on
account of energy efficiency reasons only if the task has run
sufficiently long (as determined by the sysctl_sched_min_runtime
variable).

Note that currently the sysctl_sched_min_runtime setting is considered
only in the scheduler_tick()->migration_needed() path and not in the
idle_balance() path. In other words, a task could be migrated to
another cpu which did an idle_balance(). This limitation should not
affect the high-frequency migrations typically seen (when a single
high-demand task runs on a high-performance cpu).
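
The gate itself is small; a sketch assuming a per-task run-start
timestamp (the sysctl name is from this message, everything else is
illustrative):

  #include <stdbool.h>
  #include <stdint.h>

  static uint64_t sysctl_sched_min_runtime;   /* ns */

  /* Tick path only: allow an EA migration just when the task has run
   * on its current cpu for at least the configured minimum. */
  static bool ea_migration_allowed(uint64_t now, uint64_t run_start)
  {
          return now - run_start >= sysctl_sched_min_runtime;
  }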

CRs-Fixed: 756570
Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in set_task_cpu() and
 __schedule().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:17 -07:00
Steve Muckle
d1b240ccc7 sched: treat sync waker CPUs with 1 task as idle
When a CPU with one task performs a sync wakeup, its
one task is expected to sleep immediately so this CPU
should be treated as idle for the purposes of CPU selection
for the waking task.

This is only done when idle CPUs are the preferred targets
for non-small task wakeups. When prefer_idle is 0, the
CPU is left as non-idle in the selection logic so it is still
a preferred candidate for the sync wakeup.

Change-Id: I65c6535169293e8ba0c37fb5e88aec336338f7d7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:16 -07:00
Syed Rameez Mustafa
b9c3d7384d sched: extend sched_task_load tracepoint to indicate prefer_idle
prefer_idle determines whether the scheduler prefers an idle CPU over a
busy CPU when waking up a task. Knowing the correct value of this
tunable is essential to understanding the placement decisions made in
select_best_cpu().

Change-Id: I955d7577061abccb65d01f560e1911d9db70298a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:15 -07:00
Steve Muckle
636a5749c8 sched: extend sched_task_load tracepoint to indicate sync wakeup
Sync wakeups provide a hint to the scheduler about upcoming task
activity. Knowing which wakeups are sync wakeups from logs will
assist in workload analysis.

Change-Id: I6ffe73f2337e56b8234d4097069d5d70ab045eda
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:14 -07:00
Steve Muckle
bd4d0eade7 sched: add sync wakeup recognition in select_best_cpu
If a wakeup is a sync wakeup, we need to discount the currently
running task's load from the waker's CPU as we calculate the best
CPU for the waking task to land on.

Change-Id: I00c5df626d17868323d60fb90b4513c0dd314825
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:13 -07:00
Srivatsa Vaddagiri
72b7c5d36c sched: Provide knob to prefer mostly_idle over idle cpus
sysctl_sched_prefer_idle lets the scheduler bias selection of
idle cpus over mostly idle cpus for tasks. This knob could be
useful to control the balance between power and performance.

Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:01:12 -07:00
Steve Muckle
588055e8c7 sched: make sched_cpu_high_irqload a runtime tunable
It may be desirable to be able to alter the sched_cpu_high_irqload
setting easily, so make it a runtime-tunable value.

Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:11 -07:00
Steve Muckle
6bde9f65b3 sched: trace: extend sched_cpu_load to print irqload
The irqload is used in determining whether CPUs are mostly idle
so it is useful to know this value while viewing scheduler traces.

Change-Id: Icbb74fc1285be878f254ae54886bdb161b14a270
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:10 -07:00
Steve Muckle
d3abb1dd6b sched: avoid CPUs with high irq activity
CPUs with significant IRQ activity will not be able to serve tasks
quickly. Avoid them if possible by disqualifying such CPUs from
being recognized as mostly idle.

Change-Id: I2c09272a4f259f0283b272455147d288fce11982
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:09 -07:00
Steve Muckle
4006da6ec4 sched: refresh sched_clock() after acquiring rq lock in irq path
The wallclock time passed to sched_account_irqtime() may be stale
after we wait to acquire the runqueue lock. This could cause problems
in update_task_ravg because a different CPU may have advanced
this CPU's window_start based on a more up-to-date wallclock value,
triggering a BUG_ON(window_start > wallclock).

Change-Id: I316af62d1716e9b59c4a2898a2d9b44d6c7a75d8
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:08 -07:00
Steve Muckle
3b5eac8886 sched: track soft/hard irqload per-RQ with decaying avg
The scheduler currently ignores irq activity when deciding which
CPUs to place tasks on. If a CPU is getting hammered with IRQ activity
but has no tasks it will look attractive to the scheduler as it will
not be in a low power mode.

Track irqload with a decaying average. This quantity can be used
in the task placement logic to avoid CPUs which are under high
irqload. The decay factor is 3/4. Note that with this algorithm the
tracked irqload quantity will be higher than the actual irq time
observed in any single window. Some sample outcomes with steady
irqloads per 10ms window and the 3/4 decay factor (irqload of 10 is
used as a threshold in a subsequent patch):

irqload per window        load value asymptote      # windows to > 10
2ms			  8			    n/a
3ms			  12			    7
4ms			  16			    4
5ms			  20			    3

Of course irqload will not be constant in each window; these are just
given as simple examples.
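
The numbers above can be checked with a few lines of C; a standalone
sketch of the 3/4-decay arithmetic (the threshold of 10 is from the
text above):

  #include <stdio.h>

  int main(void)
  {
          int samples[] = { 2, 3, 4, 5 };   /* ms of irq per 10ms window */

          for (int i = 0; i < 4; i++) {
                  double load = 0.0;
                  int crossed = 0;

                  for (int w = 1; w <= 64; w++) {
                          load = load * 3.0 / 4.0 + samples[i];
                          if (!crossed && load > 10.0)
                                  crossed = w;
                  }
                  /* Asymptote of x * (1 + 3/4 + (3/4)^2 + ...) = 4x */
                  if (crossed)
                          printf("%dms: asymptote %d, >10 after %d windows\n",
                                 samples[i], 4 * samples[i], crossed);
                  else
                          printf("%dms: asymptote %d, never exceeds 10\n",
                                 samples[i], 4 * samples[i]);
          }
          return 0;
  }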

Change-Id: I9dba049f5dfdcecc04339f727c8dd4ff554e01a5
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:07 -07:00
Steve Muckle
6370716bc3 sched: do not set window until sched_clock is fully initialized
The system initially uses a jiffy-based sched clock. When the platform
registers a new timer for sched_clock, sched_clock can jump backwards.
Once sched_clock_postinit() runs it should be safe to rely on it.

Also, sched_clock_cpu() relies on the completion of sched_clock_init(),
and until that happens sched_clock_cpu() returns zero. This is used in
the irq accounting path, which window-based stats rely upon. So do not
set window_start until sched_clock_cpu() is working.

Change-Id: Ided349de8f8554f80a027ace0f63ea52b1c38c68
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:06 -07:00
Syed Rameez Mustafa
57ee8ef06e sched: Make RT tasks eligible for boost
During sched boost, RT tasks currently end up going to the lowest-power
cluster. This can be a performance bottleneck, especially if the
frequency and IPC differences between clusters are high.
Furthermore, when RT tasks go over to the little cluster during
boost, the load balancer keeps attempting to pull work over to the
big cluster. This results in pre-emption of the executing RT task
causing more delays. Finally, containing more work on a single
cluster during boost might help save some power if the little
cluster can then enter deeper low power modes.

Change-Id: I177b2e81be5657c23e7ac43889472561ce9993a9
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:01:05 -07:00
Srivatsa Vaddagiri
dc66ef50f5 sched: Limit LBF_PWR_ACTIVE_BALANCE to within cluster
When the higher-power (performance) cluster has only one online cpu, we
currently let an idle cpu in the lower-power cluster pull a running task
from the performance cluster via active balance. Active balance for
power-aware reasons is supposed to be restricted to balancing within a
cluster, but the check for this is not correctly implemented.

Change-Id: I5fba7f01ad80c082a9b27e89b7f6b17a6d9cde14
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:01:04 -07:00
Srivatsa Vaddagiri
8e3aa6790c sched: Packing support until a frequency threshold
Add another dimension for task packing, based on frequency. This patch
adds a per-cpu tunable, rq->mostly_idle_freq, which when set will
result in tasks being packed on a single cpu in the cluster as long as
the cluster frequency is less than the set threshold.

Change-Id: I318e9af6c8788ddf5dfcda407d621449ea5343c0
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:01:03 -07:00
Steve Muckle
2365b0cbd6 sched: tighten up jiffy to sched_clock mapping
The tick code already tracks the exact time a tick is expected
to arrive. This can be used to eliminate slack in the jiffy-to-
sched_clock mapping that aligns windows between a caller of
sched_set_window() and the scheduler itself.

Change-Id: I9d47466658d01e6857d7457405459436d504a2ca
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in include/linux/tick.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:02 -07:00