find_busiest_group() can end up returning a NULL group due to load-based
checks even though there are tasks that can be migrated to higher capacity
CPUs (LBF_BIG_TASK_ACTIVE_BALANCE) or EA core rotation is possible
(LBF_EA_ACTIVE_BALANCE). To get the best power and performance, ensure that
load balance does attempt to pull tasks when the HMP_ACTIVE_BALANCE flag is
set. Since sched boost also falls under the same category, fold it into the
same generic condition.
Change-Id: I3db7ec200d2a038917b1f2341602eb87b5aed289
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Add a new procfs node /proc/sys/kernel/sched_max_latency_us to track the
worst scheduling latency. It provides an easier way to identify the maximum
scheduling latency seen across the CPUs.
Change-Id: I6e435bbf825c0a4dff2eded4a1256fb93f108d0e
[joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Add new tunables /proc/sys/kernel/sched_latency_warn_threshold_us and
/proc/sys/kernel/sched_latency_panic_threshold_us to warn or panic when
tasks are runnable but not scheduled for more than the configured time.
This makes it easier to find unacceptably high scheduling latency.
Change-Id: If077aba6211062cf26ee289970c5abcd1c218c82
[joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
At present the scheduler resets a task's wait start timestamp when the task
migrates to another rq. This misleads the scheduler into reporting less wait
time than actual by omitting the time spent waiting prior to migration, and
more wait count than actual by counting the migration as a wait end event;
this can be seen in trace output or /proc/<pid>/sched with
CONFIG_SCHEDSTATS=y.
Carry forward the migrating task's wait time accrued prior to migration and
don't count the migration as a wait end event, to fix such statistics errors.
In order to determine whether a task is migrating, mark task->on_rq with
TASK_ON_RQ_MIGRATING while it is dequeued and enqueued due to migration.
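A minimal sketch of the accounting change in kernel/sched/fair.c (shown for
illustration only, not the literal diff):
  static void update_stats_wait_start(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
          u64 wait_start = rq_clock(rq_of(cfs_rq));

          /* Migrating task: carry forward the wait accrued on the old rq. */
          if (entity_is_task(se) && task_on_rq_migrating(task_of(se)) &&
              likely(wait_start > se->statistics.wait_start))
                  wait_start -= se->statistics.wait_start;

          se->statistics.wait_start = wait_start;
  }

  static void update_stats_wait_end(struct cfs_rq *cfs_rq, struct sched_entity *se)
  {
          u64 delta = rq_clock(rq_of(cfs_rq)) - se->statistics.wait_start;

          if (entity_is_task(se) && task_on_rq_migrating(task_of(se))) {
                  /* Stash the accrued wait; migration is not a wait end event. */
                  se->statistics.wait_start = delta;
                  return;
          }

          se->statistics.wait_max = max(se->statistics.wait_max, delta);
          se->statistics.wait_count++;
          se->statistics.wait_sum += delta;
          se->statistics.wait_start = 0;
  }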
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: ohaugan@codeaurora.org
Link: http://lkml.kernel.org/r/20151113033854.GA4247@codeaurora.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
[joonwoop@codeaurora.org: fixed minor conflict in detach_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Change-Id: I2d7f7d9895815430ad61383e62d28d889cce66c3
At boot, the cpufreq framework sends transition notifiers before
sending out the policy notifier. Since the scheduler relies on the
policy notifier to build up the frequency domain masks, when the
initial set of transition notifiers are sent, the scheduler has no
frequency domains. As a result the scheduler fails to update the
cur_freq information. Update cur_freq as part of the policy notifier
so that the scheduler always has the current frequency information.
Change-Id: I7bd2958dfeb064dd20b9ccebafd372436484e5d6
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
The irq-aware scheduler is meant to achieve better performance by avoiding
task placement on CPUs which have high irq activity. However, the current
scheduler does the opposite for non-small tasks and preferentially places
them on CPUs that are loaded by irq activity. This is suboptimal for both
power and performance.
Fix the task placement algorithm to avoid CPUs with significant irq
activity.
Change-Id: Ifa5a6ac186241bd58fa614e93e3d873a5f5ad4ca
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
When a performance CPU runs the idle or newly idle load balancer to pull a
task from a power efficient CPU, the load balance always fails and the CPU
enters idle if the big task on the power efficient CPU is currently running.
This is suboptimal when the running task doesn't fit on the power efficient
CPU, as it's quite possible that the big task will stay on the power
efficient CPU until it's preempted while a performance CPU sits idle.
Revise the load balancer algorithm to actively migrate big tasks from a
power efficient CPU to a performance CPU when the performance CPU runs the
idle or newly idle load balancer.
Change-Id: Iaf05e0236955fdcc7ded0ff09af0880050a2be32
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in group_classify().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
It may be desirable to discourage upmigration of tasks belonging to
some cgroups. Add a per-cgroup flag (upmigrate_discourage) that
discourages upmigration of the cgroup's tasks. Tasks of such a cgroup are
allowed to upmigrate only in overcommitted scenarios.
Change-Id: I1780e420af1b6865c5332fb55ee1ee408b74d8ce
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Use new cgroup APIs]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
With EA (Energy Awareness), idle_balance() on a CPU runs on behalf of the
most power efficient idle CPU among the CPUs in its sched domain level,
under the condition that the substitute idle CPU is limited to a CPU which
has the same capacity as the original idle CPU.
It is found that at present idle_balance() spans all the CPUs in its sched
domain and can run the idle balancer on behalf of any CPU within the domain,
which could be every CPU in the system. Consequently the idle balancer on a
performance CPU always runs on behalf of a power efficient idle CPU. This
causes idle performance CPUs to always fail to pull tasks from power
efficient CPUs when there is only one online performance CPU.
Fix this by limiting the search to CPUs that share a cache with the original
idle CPU, so that the idle balancer still runs on behalf of a more power
efficient CPU which has the same capacity as the original CPU.
Change-Id: I0575290c24f28db011d9353915186e64df7e57fe
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Extend sched_get_nr_running_avg() API to return average nr_big_tasks,
in addition to average nr_running and average nr_io_wait tasks. Also
add a new trace point to record values returned by
sched_get_nr_running_avg() API.
Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
sched_get_nr_running_avg() returns average nr_running and nr_iowait
task count since it was last invoked. Fix several bugs in their
calculation.
* sched_update_nr_prod() needs to consider that nr_running count can
change by more than 1 when CFS_BANDWIDTH feature is used
* sched_get_nr_running_avg() needs to sum up nr_iowait count across
all cpus, rather than just one
* sched_get_nr_running_avg() could race with sched_update_nr_prod(),
as a result of which it could use curr_time which is behind a cpu's
'last_time' value. That would lead to erroneous calculation of
average nr_running or nr_iowait.
While at it, also fix a bug in the BUG_ON() check in sched_update_nr_prod()
and remove the unnecessary nr_running argument from sched_update_nr_prod().
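A condensed sketch of what the corrected update path could look like
(structure and field names here are illustrative, not the actual patch):
  struct nr_stats {
          raw_spinlock_t lock;
          s64 nr;                  /* current nr_running on this cpu */
          u64 last_time;           /* last update timestamp (jiffies) */
          u64 nr_prod_sum;         /* time-weighted sum of nr_running */
          u64 iowait_prod_sum;     /* time-weighted sum of nr_iowait */
  };
  static DEFINE_PER_CPU(struct nr_stats, nr_stats);

  void sched_update_nr_prod(int cpu, long delta, bool inc)
  {
          struct nr_stats *s = &per_cpu(nr_stats, cpu);
          unsigned long flags;
          u64 now, diff;

          raw_spin_lock_irqsave(&s->lock, flags);
          now = jiffies;
          diff = now - s->last_time;
          s->last_time = now;

          s->nr_prod_sum += s->nr * diff;
          s->iowait_prod_sum += nr_iowait_cpu(cpu) * diff;

          /* delta may be > 1 when CFS_BANDWIDTH (un)throttles a whole group */
          s->nr += inc ? delta : -delta;
          BUG_ON(s->nr < 0);       /* check the running count, not the delta */
          raw_spin_unlock_irqrestore(&s->lock, flags);
  }
The reader side then takes each cpu's lock (closing the race with
sched_update_nr_prod()), sums both product sums across all cpus rather than
just one, and never uses a sampled curr_time that is behind a cpu's
last_time.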
Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
When running load balance, the destination CPU checks the number of running
tasks on the busiest CPU without holding the busiest CPU's runqueue lock.
This opens the load balancer to a race whereby a third CPU, running load
balance at the same time and having found the same busiest group and queue,
may have already pulled one of the waiting tasks from the busiest CPU. Under
scenarios where the source CPU is running the idle task and only a single
task remains waiting on the busiest runqueue (nr_running = 1), the
destination CPU will end up pulling the only enqueued task from that CPU,
leaving the destination CPU with nothing left to run. Fix this race by
reconfirming nr_running for the busiest CPU after its runqueue lock has been
obtained.
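The shape of the check in load_balance() might look roughly like this (a
sketch, not the literal diff):
  raw_spin_lock_irqsave(&busiest->lock, flags);

  /*
   * Another CPU running load balance concurrently may already have
   * pulled the waiting task(s) between our unlocked nr_running check
   * and this point, so re-confirm before detaching anything.
   */
  if (busiest->nr_running <= 1) {
          raw_spin_unlock_irqrestore(&busiest->lock, flags);
          goto out_balanced;      /* nothing safe left to pull */
  }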
Change-Id: I42e132b15f96d9d5d7b32ef4de3fb92d2f837e63
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
When a lower capacity CPU attempts to pull work from a higher capacity CPU,
during load balance, it does not distinguish between tasks that will fit
or not fit on the destination CPU. This causes suboptimal load balancing
decisions whereby big tasks end up on the lower capacity CPUs and little
tasks remain on higher capacity CPUs. Avoid this behavior by first
restricting the search to only include tasks that fit on the destination CPU.
If such a task cannot be found, remove this restriction so that any task
can be pulled over to the destination CPU. This behavior is not applicable
during sched_boost, however, as none of the tasks will fit on a lower
capacity CPU.
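Conceptually the restriction is a flag on the load-balance environment that
is dropped when the first pass finds nothing; flag and helper names below
are illustrative:
  /* In can_migrate_task(): first pass only admits tasks that fit dst_cpu. */
  if ((env->flags & LBF_IGNORE_BIG_TASKS) && !task_will_fit(p, env->dst_cpu))
          return 0;

  /* In the caller, before walking the busiest runqueue: */
  if (capacity(env->dst_rq) < capacity(env->src_rq) && !sched_boost())
          env->flags |= LBF_IGNORE_BIG_TASKS;

  /* ... and if the walk detaches nothing while the flag is set: */
  env->flags &= ~LBF_IGNORE_BIG_TASKS;    /* second pass: allow any task */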
Change-Id: I1093420a629a0886fc3375849372ab7cf42e928e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in can_migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
It's found that the scaled execution time can be less than the actual time
due to rounding errors. The HMP scheduler accumulates the scaled execution
time of tasks to determine if tasks are in need of up-migration, but the
rounding error prevents the HMP scheduler from accumulating 100% load, which
prevents us from ever reaching an up-migrate threshold of 100%.
Fix the rounding error by rounding the quotient up.
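For illustration, the scaling helper could round up along these lines
(names are illustrative and the efficiency term is omitted for brevity):
  /* Scale raw execution time by the cpu's current vs. max possible freq. */
  static u64 scale_exec_time(u64 delta, struct rq *rq)
  {
          /*
           * Round the quotient up rather than truncating, so a task that is
           * busy for a whole window accumulates the full scaled window and
           * can reach a 100% up-migrate threshold.
           */
          return div64_u64(delta * rq->cur_freq + max_possible_freq - 1,
                           max_possible_freq);
  }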
CRs-fixed: 759041
Change-Id: Ie4d9693593cc3053a292a29078aa56e6de8a2d52
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Sync wakeup currently takes precedence over the wake to idle flag. A sync
wakeup causes a task to be placed on a non-idle CPU because we expect this
CPU to become idle very shortly. However, even though the sync flag is set,
there is no guarantee that the task will go to sleep right away. As a
consequence performance suffers.
Fix this by preferring an idle CPU over a potentially busy CPU when both
wake to idle and sync wakeup are set.
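A minimal sketch of the intended ordering in the placement path (helper and
variable names are illustrative):
  bool wake_to_idle = (current->flags & PF_WAKE_UP_IDLE) ||
                      (p->flags & PF_WAKE_UP_IDLE);

  if (wake_to_idle && best_idle_cpu >= 0)
          return best_idle_cpu;   /* idle CPU wins even for a sync wakeup */

  if (sync && cpu_rq(cpu)->nr_running == 1)
          return cpu;             /* otherwise fall back to the sync hint */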
Change-Id: I6b40a44e2b4d5b5fa6088e4f16428f9867bd928d
CRs-fixed: 794424
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
CFS_BANDWIDTH feature is not currently well-supported by HMP
scheduler. Issues encountered include a kernel panic when
rq->nr_big_tasks count becomes negative. This patch fixes HMP
scheduler code to better handle CFS_BANDWIDTH feature. The most
prominent change introduced is maintenance of HMP stats (nr_big_tasks,
nr_small_tasks, cumulative_runnable_avg) per 'struct cfs_rq' in
addition to being maintained in each 'struct rq'. This allows HMP
stats to be updated easily when a group is throttled on a cpu.
Change-Id: Iad9f378b79ab5d9d76f86d1775913cc1941e266a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in dequeue_task_fair().]
Key hmp stats (nr_big_tasks, nr_small_tasks and
cumulative_runnable_average) are currently maintained per-cpu in
'struct rq'. Merge those stats in their own structure (struct
hmp_sched_stats) and modify impacted functions to deal with the newly
introduced structure. This cleanup is required for a subsequent patch
which fixes various issues with use of CFS_BANDWIDTH feature in HMP
scheduler.
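The consolidated container could look roughly like this (field list taken
from the stats named above; exact layout is illustrative):
  struct hmp_sched_stats {
          int nr_big_tasks;
          int nr_small_tasks;
          u64 cumulative_runnable_avg;
  };

  struct rq {
          /* ... existing fields ... */
          struct hmp_sched_stats hmp_stats;  /* replaces three loose per-rq fields */
  };
Helpers that previously touched rq->nr_big_tasks and friends can then take a
struct hmp_sched_stats *, so the same code can later update either the
rq-level copy or a per-cfs_rq copy when CFS_BANDWIDTH throttles a group.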
Change-Id: Ieffc10a3b82a102f561331bc385d042c15a33998
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in __update_load_avg().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
The sched_prefer_idle flag controls whether tasks can be woken to any
available idle cpu. It may be desirable to set sched_prefer_idle to 0 so
that most tasks wake up to non-idle cpus under the mostly_idle threshold,
and have specialized tasks override this behavior through other means. The
per-task PF_WAKE_UP_IDLE flag provides exactly that: it lets tasks with
PF_WAKE_UP_IDLE set be woken up to any available idle cpu independent of the
sched_prefer_idle setting. Currently only a kernel-space API exists to set
PF_WAKE_UP_IDLE for a task.
This patch adds a user-space API (in the /proc filesystem) to set
PF_WAKE_UP_IDLE for a given task: the /proc/[pid]/sched_wake_up_idle file
can be written to set or clear the flag.
Change-Id: I13a37e740195e503f457ebe291d54e83b230fbeb
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/fair.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Add code to calculate the run queue depth and the iowait depth of a cpu.
The scheduler calls into sched_update_nr_prod() whenever there is a
runqueue change. This function maintains the runqueue average and the
iowait average of that cpu over the time interval.
Whoever wants to know the runqueue average is expected to call
sched_get_nr_running_avg() periodically to get the accumulated runqueue
and iowait averages for all the cpus.
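In outline, the two halves of the interface are used like this (a sketch;
argument details are illustrative):
  /* Scheduler side, invoked on every enqueue/dequeue: */
  sched_update_nr_prod(cpu_of(rq), 1, true);   /* accumulate nr_running x time */

  /* Consumer side, e.g. a periodic governor or driver callback: */
  int run_avg, iowait_avg;

  sched_get_nr_running_avg(&run_avg, &iowait_avg);
  /* averages are time-weighted over the interval since the previous call */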
Change-Id: Id8cb2ecf0ed479f090a83ccb72dd59c53fa73e0c
Signed-off-by: Jeff Ohlstein <johlstei@codeaurora.org>
(cherry picked from commit 0299fcaaad80e2c0ac9aa583c95107f6edc27750)
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Add a new sched feature FORCE_CPU_THROTTLING_IMMINENT to perform
migration due to EA without checking frequency throttling. This option
can give us better debugging and verification capability.
Change-Id: Iba445961a7f9812528b4e3aa9c6ddf47a3aad583
[joonwoop@codeaurora.org: fixed trivial conflict in
kernel/sched/features.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
When choosing a CPU to do power-aware active balance from, the load
balancer currently selects the first eligible CPU it finds, even if there is
another eligible CPU which is higher-power. This can lead to suboptimal load
balancing behavior and extra migrations, impacting both power and
performance.
Achieve better power and performance by continuing to search for the least
power efficient cpu as long as that cpu's load average is higher than or
equal to that of the busiest cpu found so far.
CRs-fixed: 777341
Change-Id: I14eb21ab725bf7dab88b2e1e169aced6f2d712ca
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
cpufreq governor does not send frequency change notifications for
offline CPUs. This means that a hot removed CPU's cur_freq information
can get stale if there is a frequency change while that CPU is offline.
When the offline CPU is hotplugged back in, all subsequent load
calculations are based on the stale information until another frequency
change occurs and the corresponding set of notifications are sent out.
Avoid this incorrect load tracking by updating the cur_freq for all
CPUs in the same frequency domain.
Change-Id: Ie11ad9a64e7c9b115d01a7c065f22d386eb431d5
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
The max possible capacity calculation might overflow given a large enough
max possible frequency and capacity. Fix the potential for overflow.
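The usual fix is to widen the intermediate product and use the 64-bit
division helpers; a sketch with illustrative names:
  #include <linux/math64.h>

  static unsigned long compute_capacity(unsigned int freq_khz,
                                        unsigned int efficiency,
                                        unsigned int base_freq_khz,
                                        unsigned int base_efficiency)
  {
          /*
           * freq * efficiency can exceed 32 bits for large inputs; keep the
           * intermediate product in u64 and divide with div64_u64().
           */
          u64 prod = (u64)freq_khz * efficiency * 1024;

          return div64_u64(prod, (u64)base_freq_khz * base_efficiency);
  }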
Change-Id: Ie9345bc657988845aeb450d922052550cca48a5f
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
At present the HMP task placement algorithm scans CPUs in numerical
order and if two identical options are found, the first one
encountered is chosen, even if it is different from the task's
previous CPU.
Add a bias towards the task's previous CPU in such situations. Any
time two or more CPUs are considered equivalent (load, C-state, power
cost), if one of them is the task's previous CPU, bias towards that
CPU. The algorithm is otherwise unchanged.
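The tie-break amounts to one extra comparison in the candidate loop of
select_best_cpu(); a sketch with illustrative variable names:
  /* Candidates that tie on power cost, load and C-state keep prev_cpu. */
  if (cost == best_cost && load == best_load && cstate == best_cstate) {
          if (i == task_cpu(p))
                  best_cpu = i;   /* bias toward the task's previous cpu */
          continue;
  }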
CRs-Fixed: 772033
Change-Id: I511f5b929c2bfa6fdea9e7433893c27b29ed8026
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Remove the global sysctl_sched_prefer_idle flag and replace it with a
per-cpu prefer_idle flag. The per-cpu flag is expected to be the same for
all cpus in a cluster. It thus provides a convenient means to disable
packing in one cluster while allowing packing in another cluster.
Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
sysctl_sched_prefer_idle controls the selection of idle cpus for waking
tasks. In some cases waking to idle cpus helps performance, while in other
cases it hurts (as tasks incur the latency associated with C-state wakeup).
Ideally the scheduler would adapt the prefer_idle behavior based on the task
that is waking up, but that's hard for the scheduler to figure out by
itself. The PF_WAKE_UP_IDLE hint can be provided by an external
module/driver in such cases to guide the scheduler in preferring an idle
cpu for select tasks irrespective of the sysctl_sched_prefer_idle flag.
This patch enhances select_best_cpu() to consider the PF_WAKE_UP_IDLE hint.
A wakeup posted from any task that has PF_WAKE_UP_IDLE set is a hint for the
scheduler to prefer an idle cpu for the waking task. Similarly the scheduler
will attempt to place any task with PF_WAKE_UP_IDLE set on an idle cpu when
it wakes up.
CRs-Fixed: 773101
Change-Id: Ia8bf334d98fd9fd2ff9eda875430497d55d64ce6
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Add sysctl to enable energy awareness at runtime. This is useful for
performance/power tuning/measurements and debugging. In addition this
will match up with the Documentation/scheduler/sched-hmp.txt documentation.
Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
There exists a flag called "sched_enable_power_aware" that is not honored
everywhere. Fix this.
Change-Id: I62225939b71b25970115565b4e9ccb450e252d7c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
If an irq is raised while sched_irqload() is calculating the irqload delta,
sched_account_irqtime() can update the rq's irqload_ts to a value greater
than the jiffies stored in sched_irqload()'s context, so the delta can be
negative. A negative delta simply means there was a recent irq occurrence.
So remove the improper BUG_ON().
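A sketch of the resulting check (field and constant names follow the
irqload code but are illustrative):
  static unsigned int sched_irqload(int cpu)
  {
          struct rq *rq = cpu_rq(cpu);
          s64 delta = get_jiffies_64() - rq->irqload_ts;

          /*
           * delta can be negative if an irq fired and sched_account_irqtime()
           * advanced irqload_ts after we sampled jiffies. That only means a
           * very recent irq, so report the tracked irqload as usual instead
           * of hitting a BUG_ON().
           */
          if (delta < SCHED_HIGH_IRQ_TIMEOUT)
                  return rq->avg_irqload;

          return 0;
  }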
CRs-fixed: 771894
Change-Id: I5bb01b50ec84c14bf9f26dd9c95de82ec2cd19b5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
When upmigrate_min_nice is changed, dec_nr_big_small_task() can trigger
BUG_ON(rq->nr_big_tasks < 0). This happens when there is a task which was
considered a non-big task due to its nice > upmigrate_min_nice, and
upmigrate_min_nice is later changed to a higher value so the task becomes a
big task. In this case the runqueue incorrectly still has nr_big_tasks = 0
with the current implementation. Consequently the next scheduler tick sees a
big task to schedule and tries to decrease nr_big_tasks, which is already 0.
Introduce sched_upmigrate_min_nice, which is updated atomically, and
re-count the number of big and small tasks to fix the BUG_ON() triggering.
Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
A new tunable exists that allows task migration to be throttled when the
scheduler tries to do task migrations due to Energy Awareness (EA). This
tunable is only taken into account when migrations occur in the tick path.
Extend the usage of the tunable to take the load balancer (lb) path into
account as well.
In addition, ensure that the start of task execution on a CPU is updated
correctly. If a task is preempted but still runnable on the same CPU, the
start of execution should not be updated. Only update the start of
execution when a task wakes up after sleep or moves to a new CPU.
Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in group_classify() and
set_task_cpu().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
If, during the check of whether migration is needed, we find that there is
a lower power CPU available, we commence the search for a new CPU for this
task. However, by the time we search for a new CPU, the lower power CPU
might no longer be available. We should abort the attempt to migrate a task
in this case.
CRs-Fixed: 764788
Change-Id: I867923a82b95c599278b81cd73bb102b6aff4d03
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Add the current CPU temperature to the sched_cpu_load trace point.
This will allow us to track the CPU temperature.
CRs-Fixed: 764788
Change-Id: Ib2e3559bbbe3fe07a6b7c8115db606828bc36254
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
We do not want to migrate tasks unnecessarily, to avoid cache-related and
other migration latencies that could affect the performance of the system.
Add a check to only try EA migration when CPU frequency throttling is
imminent.
CRs-Fixed: 764788
Change-Id: I92e86e62da10ce15f1e76a980df3545e93d76348
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Power values for cpus can drop quite considerably when they go idle. As a
result, the best choice for running a single task in a cluster can vary
quite rapidly. As the task keeps hopping cpus, other cpus go idle and start
being seen as more favorable targets for running a task, leading to the
task migrating almost every scheduler tick!
Prevent this by keeping track of when a task started running on a cpu and
allowing task migration in the tick path (migration_needed()) for energy
efficiency reasons only if the task has run sufficiently long (as
determined by the sysctl_sched_min_runtime variable).
Note that currently the sysctl_sched_min_runtime setting is considered only
in the scheduler_tick()->migration_needed() path and not in the
idle_balance() path. In other words, a task could be migrated to another
cpu which did an idle_balance(). This limitation should not affect the
high-frequency migrations typically seen (when a single high-demand task
runs on a high-performance cpu).
CRs-Fixed: 756570
Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in set_task_cpu() and
__schedule().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
When a CPU with a single task performs a sync wakeup, that task is expected
to sleep immediately, so this CPU should be treated as idle for the
purposes of CPU selection for the waking task.
This is only done when idle CPUs are the preferred targets
for non-small task wakeups. When prefer_idle is 0, the
CPU is left as non-idle in the selection logic so it is still
a preferred candidate for the sync wakeup.
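Within the per-cpu scan of the placement code this is a single extra
condition (a sketch; variable names are illustrative):
  unsigned int cpu_nr_running = cpu_rq(i)->nr_running;

  /*
   * The waker is this cpu's only task and claims it is about to sleep:
   * treat the cpu as idle for placement, but only when idle cpus are the
   * preferred wakeup targets (prefer_idle).
   */
  if (sync && prefer_idle && i == smp_processor_id() && cpu_nr_running == 1)
          cpu_nr_running = 0;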
Change-Id: I65c6535169293e8ba0c37fb5e88aec336338f7d7
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
Prefer idle determines whether the scheduler prefers an idle CPU
over a busy CPU or not to wake up a task on. Knowing the correct
value of this tunable is essential in understanding placement
decisions made in select_best_cpu().
Change-Id: I955d7577061abccb65d01f560e1911d9db70298a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Sync wakeups provide a hint to the scheduler about upcoming task
activity. Knowing which wakeups are sync wakeups from logs will
assist in workload analysis.
Change-Id: I6ffe73f2337e56b8234d4097069d5d70ab045eda
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
If a wakeup is a sync wakeup, we need to discount the currently
running task's load from the waker's CPU as we calculate the best
CPU for the waking task to land on.
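A sketch of the discount while evaluating the waker's cpu (helper and field
names are illustrative):
  u64 load = cpu_load(i);

  /*
   * For a sync wakeup, the waker on this cpu is expected to sleep shortly;
   * subtract its demand so the cpu is judged by what will remain on it.
   */
  if (sync && i == smp_processor_id() &&
      load >= cpu_rq(i)->curr->ravg.demand)
          load -= cpu_rq(i)->curr->ravg.demand;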
Change-Id: I00c5df626d17868323d60fb90b4513c0dd314825
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
sysctl_sched_prefer_idle lets the scheduler bias the selection of idle cpus
over mostly-idle cpus for waking tasks. This knob can be useful to control
the balance between power and performance.
Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
It may be desirable to be able to alter the sched_cpu_high_irqload setting
easily, so make it a runtime tunable value.
Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
The irqload is used in determining whether CPUs are mostly idle
so it is useful to know this value while viewing scheduler traces.
Change-Id: Icbb74fc1285be878f254ae54886bdb161b14a270
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
CPUs with significant IRQ activity will not be able to serve tasks
quickly. Avoid them if possible by disqualifying such CPUs from
being recognized as mostly idle.
Change-Id: I2c09272a4f259f0283b272455147d288fce11982
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
The wallclock time passed to sched_account_irqtime() may be stale
after we wait to acquire the runqueue lock. This could cause problems
in update_task_ravg because a different CPU may have advanced
this CPU's window_start based on a more up-to-date wallclock value,
triggering a BUG_ON(window_start > wallclock).
Change-Id: I316af62d1716e9b59c4a2898a2d9b44d6c7a75d8
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
The scheduler currently ignores irq activity when deciding which
CPUs to place tasks on. If a CPU is getting hammered with IRQ activity
but has no tasks it will look attractive to the scheduler as it will
not be in a low power mode.
Track irqload with a decaying average. This quantity can be used
in the task placement logic to avoid CPUs which are under high
irqload. The decay factor is 3/4. Note that with this algorithm the
tracked irqload quantity will be higher than the actual irq time
observed in any single window. Some sample outcomes with steady
irqloads per 10ms window and the 3/4 decay factor (irqload of 10 is
used as a threshold in a subsequent patch):
  irqload per window   load value asymptote   # windows to > 10
  2ms                  8                      n/a
  3ms                  12                     7
  4ms                  16                     4
  5ms                  20                     3
Of course irqload will not be constant in each window; these are just
simple examples.
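The update itself is a one-line decay per window; with a steady per-window
irq time x, the tracked value converges to 4x, matching the table above
(a sketch; names are illustrative):
  /* Called once per window with the irq time observed in that window. */
  static inline void decay_irqload(struct rq *rq, u64 irq_time_this_window)
  {
          /*
           * new = 3/4 * old + this window's irq time.
           * Steady state: L = 3L/4 + x  =>  L = 4x (e.g. 3ms/window -> 12ms).
           */
          rq->avg_irqload = rq->avg_irqload * 3 / 4 + irq_time_this_window;
  }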
Change-Id: I9dba049f5dfdcecc04339f727c8dd4ff554e01a5
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
The system initially uses a jiffy-based sched clock. When the platform
registers a new timer for sched_clock, sched_clock can jump backwards.
Once sched_clock_postinit() runs it should be safe to rely on it.
Also, sched_clock_cpu() relies on completion of sched_clock_init(), and
until that happens sched_clock_cpu() returns zero. This is used in the irq
accounting path, which window-based stats rely upon.
So do not set window_start until sched_clock_cpu() is working.
Change-Id: Ided349de8f8554f80a027ace0f63ea52b1c38c68
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
During sched boost RT tasks currently end up going to the lowest
power cluster. This can be a performance bottleneck especially if
the frequency and IPC differences between clusters are high.
Furthermore, when RT tasks go over to the little cluster during
boost, the load balancer keeps attempting to pull work over to the
big cluster. This results in pre-emption of the executing RT task
causing more delays. Finally, containing more work on a single
cluster during boost might help save some power if the little
cluster can then enter deeper low power modes.
Change-Id: I177b2e81be5657c23e7ac43889472561ce9993a9
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
When the higher power (performance) cluster has only one online cpu, we
currently let an idle cpu in the lower power cluster pull a running task
from the performance cluster via active balance. Active balance for
power-aware reasons is supposed to be restricted to balancing within a
cluster, but the check for this is not correctly implemented.
Change-Id: I5fba7f01ad80c082a9b27e89b7f6b17a6d9cde14
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Add another dimension for task packing based on frequency. This patch adds
a per-cpu tunable, rq->mostly_idle_freq, which when set will result in
tasks being packed on a single cpu in the cluster as long as the cluster
frequency is less than the set threshold.
Change-Id: I318e9af6c8788ddf5dfcda407d621449ea5343c0
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
The tick code already tracks the exact time a tick is expected to arrive.
This can be used to eliminate slack in the jiffy-to-sched_clock mapping
that aligns windows between a caller of sched_set_window() and the
scheduler itself.
Change-Id: I9d47466658d01e6857d7457405459436d504a2ca
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in include/linux/tick.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>