Commit graph

581 commits

Author SHA1 Message Date
Pavankumar Kondeti
f1e9995fe4 sched: add a knob to prefer the waker CPU for sync wakeups
The current policy has a preference to select an idle CPU in the waker
cluster compared to the waker CPU running only 1 task. By selecting
an idle CPU, it eliminates the chance of waker migrating to a
different CPU after the wakee preempts it. This policy is also not
susceptible to the incorrect "sync" usage i.e the waker does not
goto sleep after waking up the wakee.

However LPM exit latency associated with an idle CPU outweigh the
above benefits on some targets. So add a knob to prefer the waker
CPU having only 1 runnable task over idle CPUs in the waker cluster.

Change-Id: Id974748c07625c1b19112235f426a5d204dfdb33
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-10-02 10:52:06 +05:30
Joonwoo Park
cc60f0790f sched: constrain HMP scheduler tunable range with in better way
HMP scheduler tunables can be constrained via extra1 and extra2 of
ctl_table.  Having valid range in the sysctl table gives clearer
view of tunable's range.

Also add range for sched_select_prev_cpu_us so we can avoid invalid
value configuration of that tunable.

CRs-fixed: 1056910
Change-Id: I09fcc019133f4d37b7be3287da8e0733e40fc0ac
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-09-21 16:50:42 -07:00
Pavankumar Kondeti
078568e425 sched: Introduce sched_freq_aggregate_threshold tunable
Do the aggregation for frequency only when the total group busy time
is above sched_freq_aggregate_threshold. This filtering is especially
needed for the cases where groups are created by including all threads
of an application process. This knob can be tuned to apply aggregation
only for the heavy workload applications.

When this knob is enabled and load is aggregated, the load is not
clipped to 100% @ current frequency to ramp up the frequency faster.

Change-Id: Icfd91c85938def101a989af3597d3dcaa8026d16
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-08-22 14:06:37 -07:00
Pavankumar Kondeti
5ddfbfec06 sched: inherit the group id from the group leader
When sysctl_sched_enable_thread_grouping is set to 1, any new tasks
created are put in the same group as their group leader.

Change-Id: If1837dd7c8120c8b097cfffa1dc52eb4781f1641
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-08-22 14:06:35 -07:00
Syed Rameez Mustafa
62f2600ce9 sched: Remove all existence of CONFIG_SCHED_FREQ_INPUT
CONFIG_SCHED_FREQ_INPUT was created to keep parts of the scheduler
dealing with frequency separate from other parts of the scheduler
that deal with task placement. However, overtime the two features
have become intricately linked whereby SCHED_FREQ_INPUT cannot be
turned on without having SCHED_HMP turned on as well. Given this
complex inter-dependency and the fact that all old, existing and
future targets use both config options, remove this unnecessary
feature separation. It will aid in making kernel upgrades a lot
simpler and faster.

Change-Id: Ia20e40d8a088d50909cc28f5be758fa3e9a4af6f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 11:37:22 -07:00
Syed Rameez Mustafa
ef1e55638d sched: Remove unused migration notifier code.
Migration notifiers were created to aid the CPU-boost driver manage
CPU frequencies when tasks migrate from one CPU to another. Over time
with the evolution of scheduler guided frequency, the scheduler now
directly manages load when tasks migrate. Consequently the CPU-boost
driver no longer makes use of this information. Remove unused code
pertaining to this feature.

Change-Id: I3529e4356e15e342a5fcfbcf3654396752a1d7cd
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 11:32:19 -07:00
Subash Abhinov Kasiviswanathan
ea60e2fbe4 Revert "kernel/sysctl.c: detect overflows when converting to int"
We have scripts which write to certain fields on 3.18 kernels but
this seems to be failing on 4.4 kernels.
An entry which we write to here is xfrm_aevent_rseqth which is u32.

echo 4294967295  > /proc/sys/net/core/xfrm_aevent_rseqth

Commit 230633d109 ("kernel/sysctl.c:
detect overflows when converting to int") prevented writing to
sysctl entries when integer overflow occurs. However, this does not
apply to unsigned integers.

u32 should be able to hold 4294967295 here, however it fails due to
this check.

static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
			if (*lvalp > (unsigned long) INT_MAX)
				return -EINVAL;

Fix this for now by reverting this commit till a solution is
finalized upstream.

CRs-Fixed: 1026507
Change-Id: I4fae5f442e4cc2c2414a69e960d42c05c3062415
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
2016-06-15 16:16:33 -07:00
Joonwoo Park
42ab5394f4 Revert "sched: warn/panic upon excessive scheduling latency"
This reverts commit 8f90803a45 ("sched: warn/panic upon excessive
scheduling latency") as this feature is no longer used.

Change-Id: I200d0e9e8dad5047522cd02a68de25d4a70a91a4
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:48:17 -07:00
Joonwoo Park
9103cfbaa1 Revert "sched: add scheduling latency tracking procfs node"
This reverts commit b40bf941f6 ("sched: add scheduling latency
tracking procfs node") as this feature is no longer used.

Change-Id: I5de789b6349e6ea78ae3725af2a3ffa72b7b7f11
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:48:05 -07:00
Joonwoo Park
11ad3c4f92 sched: eliminate sched_early_detection_duration knob
Kill unused scheduler knob sched_early_detection_duration.

Change-Id: I36b7a10982367f9c7ab8eefcb8ef1d0f9955601d
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:47:51 -07:00
Joonwoo Park
eedf0821f6 sched: Remove the sched heavy task frequency guidance feature
This has always been unused feature given its limitation of adding
phantom load to the system. Since there are no immediate plans of
using this and the fact that it adds unnecessary complications to
the new load fixup mechanism, remove this feature for now. It can
be revisited later in light of the new mechanism.

Change-Id: Ie9501a898d0f423338293a8dde6bc56f493f1e75
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:47:39 -07:00
Joonwoo Park
6b2c4343e7 sched: eliminate sched_migration_fixup knob
Kill unused scheduler knob sched_migration_fixup.  With this change
scheduler always adjusts CPU's busy time during migration.

Change-Id: I5d59e89d5cc0f2c705c40036cd7b47f5d3f89e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:47:25 -07:00
Joonwoo Park
dc284e65df sched: eliminate sched_upmigrate_min_nice knob
Kill unused scheduler knob sched_upmigrate_min_nice.

Change-Id: I53ddfde39c78e78306bd746c1c4da9a94ec67cd8
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-03 14:46:44 -07:00
Joonwoo Park
d009f9c149 sched: eliminate sched_enable_power_aware knob and parameter
Kill unused scheduler knob and parameter sched_enable_power_aware.  HMP
scheduler always take into account power cost for placing task.

Change-Id: Ib26a21df9b903baac26c026862b0a41b4a8834f3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-01 15:21:29 -07:00
Joonwoo Park
462213d1ac sched: eliminate sched_freq_account_wait_time knob
Kill unused scheduler knob sched_freq_account_wait_time.

Change-Id: Ib74123ebd69dfa3f86cf7335099f50c12a6e93c3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-01 15:21:18 -07:00
Joonwoo Park
5160d93b6d sched: eliminate sched_account_wait_time knob
Kill unused scheduler knob sched_account_wait_time.  With this change
scheduler always accounts task's wait time into demand.

Change-Id: Ifa4bcb5685798f48fd020f3d0c9853220b3f5fdc
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-06-01 15:21:04 -07:00
Srivatsa Vaddagiri
e6aae1c3e0 sched: Aggregate for frequency
Related threads in a group could execute on different CPUs and hence
present a split-demand picture to cpufreq governor. IOW the governor
fails to see the net cpu demand of all related threads in a given
window if the threads's execution were to be split across CPUs. That
could result in sub-optimal frequency chosen in comparison to the
ideal frequency at which the aggregate work (taken up by related
threads) needs to be run.

This patch aggregates cpu execution stats in a window for all related
threads in a group. This helps present cpu busy time to governor as if
all related threads were part of the same thread and thus help select
the right frequency required by related threads. This aggregation
is done per-cluster.

Change-Id: I71e6047620066323721c6d542034ddd4b2950e7f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: Fixed notify_migration() to hold rcu read
 lock as this version of Linux doesn't hold p->pi_lock when the
 function gets called while keeping use of rcu_access_pointer() since
 we never dereference return value.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-05-26 15:28:59 -07:00
Joonwoo Park
16ecb20600 sched: restrict sync wakee placement bias with waker's demand
Biasing sync wakee task towards waker CPU's cluster makes sense when the
waker's demand is high enough so the wakee also can take advantage
of high CPU frequency voted because of waker's load.  Placing sync wakee
on the low demand waker's CPU can lead placement imbalance which can
lead unnecessary migration.

Introduce a new tunable "sched_big_waker_task_load" that defines the big
waker so scheduler avoid wakee on waker's cluster bias when the waker's
load is below the tunable.

CRs-fixed: 971295
Change-Id: I1550ede0a71ac8c9be74a7daabe164c6a269a3fb
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in
 include/linux/sched/sysctl.h.]
2016-03-23 21:25:23 -07:00
Joonwoo Park
616e04a51c sched: add preference for waker cluster CPU in wakee task placement
If sync wakee task's demand is small it's worth to place the wakee task
on waker's cluster for better performance in the sense that waker and
wakee are corelated so the wakee should take advantage of waker cluster's
frequency which is voted by the waker along with cache locality benefit.
While biasing towards the waker's cluster we want to avoid the waker CPU
as much as possible as placing the wakee on the waker's CPU can make the
waker got preempted and migrated by load balancer.

Introduce a new tunable 'sched_small_wakee_task_load' that differentiates
eligible small wakee task and place the small wakee tasks on the waker's
cluster.

CRs-fixed: 971295
Change-Id: I96897d9a72a6f63dca4986d9219c2058cd5a7916
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in
 include/linux/sched/sysctl.h.]
2016-03-23 21:25:22 -07:00
Pavankumar Kondeti
6d742ce87b sched: Add separate load tracking histogram to predict loads
Current window based load tracking only saves history for five
windows. A historically heavy task's heavy load will be completely
forgotten after five windows of light load. Even before the five
window expires, a heavy task wakes up on same CPU it used to run won't
trigger any frequency change until end of the window. It would starve
for the entire window. It also adds one "small" load window to
history because it's accumulating load at a low frequency, further
reducing the tracked load for this heavy task.

Ideally, scheduler should be able to identify such tasks and notify
governor to increase frequency immediately after it wakes up.

Add a histogram for each task to track a much longer load history. A
prediction will be made based on runtime of previous or current
window, histogram data and load tracked in recent windows. Prediction
of all tasks that is currently running or runnable on a CPU is
aggregated and reported to CPUFreq governor in sched_get_cpus_busy().

sched_get_cpus_busy() now returns predicted busy time in addition
to previous window busy time and new task busy time, scaled to
the CPU maximum possible frequency.

Tunables:

- /proc/sys/kernel/sched_gov_alert_freq (KHz)

This tunable can be used to further filter the notifications.
Frequency alert notification is sent only when the predicted
load exceeds previous window load by sched_gov_alert_freq converted to
load.

Change-Id: If29098cd2c5499163ceaff18668639db76ee8504
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts around __migrate_task()
 and removed changes for CONFIG_SCHED_QHMP.]
2016-03-23 21:25:17 -07:00
Pavankumar Kondeti
6418f213ab sched: Revise the inter cluster load balance restrictions
The frequency based inter cluster load balance restrictions are not
reliable as frequency does not provide a good estimate of the CPU's
current load. Replace them with the spill_load and spill_nr_run
based checks.

The higher capacity cluster is restricted from pulling the tasks from
the lower capacity cluster unless all of the lower capacity CPUs are
above spill. This behavior can be controlled by a sysctl tunable and
it is disabled by default (i.e. no load balance restrictions).

Change-Id: I45c09c8adcb61a8a7d4e08beadf2f97f1805fb42
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes
 for CONFIG_SCHED_QHMP.]
2016-03-23 21:25:13 -07:00
Srivatsa Vaddagiri
3004236139 sched: colocate related threads
Provide userspace interface for tasks to be grouped together as
"related" threads. For example, all threads involved in updating
display buffer could be tagged as related.

Scheduler will attempt to provide special treatment for group of
related threads such as:

1) Colocation of related threads in same "preferred" cluster
2) Aggregation of demand towards determination of cluster frequency

This patch extends scheduler to provide best-effort colocation support
for a group of related threads.

Change-Id: Ic2cd769faf5da4d03a8f3cb0ada6224d0101a5f5
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor merge conflicts.  removed ifdefry
 for CONFIG_SCHED_QHMP.]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>

Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:12 -07:00
Srivatsa Vaddagiri
df6bfcaf70 sched: Update fair and rt placement logic to use scheduler clusters
Make use of clusters in the fair and rt scheduling classes. This is
needed as the freq domain mask can no longer be used to do correct
task placement. The freq domain mask was being used to demarcate
clusters.

Change-Id: I57f74147c7006f22d6760256926c10fd0bf50cbd
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed merge conflicts due to omitted changes
 for CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 21:25:11 -07:00
Vinayak Menon
ad6f486284 mm: swap: swap ratio support
Add support to receive a static ratio from userspace to
divide the swap pages between ZRAM and disk based swap
devices. The existing infrastructure allows to keep
same priority for multiple swap devices, which results
in round robin distribution of pages. With this patch,
the ratio can be defined.

CRs-fixed: 968416
Change-Id: I54f54489db84cabb206569dd62d61a8a7a898991
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>
2016-03-23 21:19:04 -07:00
Joonwoo Park
20af1732c6 sched: fix compile error where !CONFIG_SCHED_FREQ_INPUT
The sysctl node sched_new_task_windows is only for CONFIG_SCHED_HMP and
CONFIG_SCHED_FREQ_INPUT.

Change-Id: I4791e977fa8516fd2cd31198f71103b8d7e874c3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:44 -07:00
Joonwoo Park
07eb3f803b sched: select task's prev_cpu as the best CPU when it was chosen recently
Select given task's prev_cpu when the task slept for short period to
reduce latency of task placement and migrations.  A new tunable
/proc/sys/kernel/sched_select_prev_cpu_us introduced to determine whether
tasks are eligible to go through fast path.

CRs-fixed: 947467
Change-Id: Ia507665b91f4e9f0e6ee1448d8df8994ead9739a
[joonwoop@codeaurora.org: fixed conflict in include/linux/sched.h,
 include/linux/sched/sysctl.h, kernel/sched/core.c and kernel/sysctl.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:43 -07:00
Syed Rameez Mustafa
c00814c023 sched: Notify cpufreq governor early about potential big tasks
Tasks that are on the runqueue continuously for a certain amount of time
have the potential to be big tasks at the end of the window in which they
are runnable. In such scenarios ramping the CPU frequency early can
boost performance rather than waiting till the end of a window for the
governor to query load. Notify the governor early at every tick when a
task has been observed to execute beyond some percentage of the tick
period.

The threshold beyond which a task is eligible for early detection can be
changed via the tunable sched_early_detection_duration. The feature itself
is enabled only when scheduler boost is in effect.

Change-Id: I528b72bbc79a55b4593d1b8ab45450411c6d70f3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in scheduler_tick() in
 kernel/sched/core.c.  fixed minor conflicts in include/linux/sched.h,
 include/linux/sched/sysctl.h and kernel/sysctl.c due to
 CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:34 -07:00
Joonwoo Park
446beddcd4 sched: account new task load so that governor can apply different policy
Account amount of load contributed by new tasks within CPU load so that
governor can apply different policy when CPU is loaded by new tasks.

To be able to distinguish new task load a new tunable
sched_new_task_windows also introduced.  The tunable defines tasks as new
when the tasks are have been active less than configured windows.

Change-Id: I2e2e62e4103882f7362154b792ab978b181b9f59
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
[joonwoop@codeaurora.org: ommited changes for
 drivers/cpufreq/cpufreq_interactive.c.  cpufreq changes needs to be
 applied separately later.  fixed conflict in include/linux/sched.h and
 include/linux/sched/sysctl.h.  omitted changes for qhmp_core.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:29 -07:00
Syed Rameez Mustafa
ca42a1bec8 sched: add frequency zone awareness to the load balancer
Add zone awareness to the load balancer. Remove all earlier restrictions
that the load balancer had for inter cluster kicks and migration.

Change-Id: I12ad3d0c2d2e9bb498f49a231810f2ad418b061f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in nohz_kick_needed() due
 to its return type change.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:21 -07:00
Syed Rameez Mustafa
d590f25153 sched: remove the notion of small tasks and small task packing
Task packing will now be determined solely on the basis of the
power cost of task placement. All tasks are eligible for packing.
Remove the notion of "small" tasks from the scheduler.

Change-Id: I72d52d04b2677c6a8d0bc6aa7d50ff0f1a4f5ebb
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-03-23 20:02:19 -07:00
Syed Rameez Mustafa
f2ea07a155 sched: Rework energy aware scheduling
Energy aware core rotation is not compatible with the power
based task placement being introduced in subsequent patches.
Remove all existing EA based task placement/migration logic.
power_cost() is the only function remaining. This function has
been modified to return the total power cost associated with a
task on a given CPU taking existing load on that CPU into
account.

Change-Id: Ia00501e3cbfc6e11446a9a2e93e318c4c42bdab4
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed multiple conflicts in fair.c and minor
 conflict in features.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:02:18 -07:00
Joonwoo Park
81280a6963 sched: check HMP scheduler tunables validity
Check tunables validity to take valid values only.

CRs-fixed: 812443
Change-Id: Ibb9ec0d6946247068174ab7abe775a6389412d5b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:53 -07:00
Joonwoo Park
b40bf941f6 sched: add scheduling latency tracking procfs node
Add a new procfs node /proc/sys/kernel/sched_max_latency_us to track the
worst scheduling latency.  It provides easier way to identify maximum
scheduling latency seen across the CPUs.

Change-Id: I6e435bbf825c0a4dff2eded4a1256fb93f108d0e
[joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:50 -07:00
Joonwoo Park
8f90803a45 sched: warn/panic upon excessive scheduling latency
Add new tunables /proc/sys/kernel/sched_latency_warn_threshold_us and
/proc/sys/kernel/sched_latency_panic_threshold_us to warn or panic for the
cases that tasks are runnable but not scheduled more than configured time.

This helps to find out unacceptably high scheduling latency more easily.

Change-Id: If077aba6211062cf26ee289970c5abcd1c218c82
[joonwoop@codeaurora.org: fixed conflict in update_stats_wait_end().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:49 -07:00
Srivatsa Vaddagiri
5b45dc56e5 sched: Per-cpu prefer_idle flag
Remove the global sysctl_sched_prefer_idle flag and replace it with a
per-cpu prefer_idle flag. The per-cpu flag is expected to same for all
cpus in a cluster. It thus provides convenient means to disable
packing in one cluster while allowing packing in another cluster.

Change-Id: Ie4cc73bb1a55b4eac5697be38e558546161faca1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:01:26 -07:00
Olav Haugan
3f947e7ba7 sched: Add sysctl to enable power aware scheduling
Add sysctl to enable energy awareness at runtime. This is useful for
performance/power tuning/measurements and debugging. In addition this
will match up with the Documentation/scheduler/sched-hmp.txt documentation.

Change-Id: I0a9185498640d66917b38bf5d55f6c59fc60ad5c
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
2016-03-23 20:01:24 -07:00
Joonwoo Park
790d5d8a4a sched: Prevent race conditions where upmigrate_min_nice changes
When upmigrate_min_nice is changed dec_nr_big_small_task() can trigger
BUG_ON(rq->nr_big_tasks < 0).  This happens when there is a task which was
considered as non-big task due to its nice > upmigrate_min_nice and later
upmigrate_min_nice is changed to higher value so the task becomes big task.
In this case runqueue still has nr_big_tasks = 0 incorrectly with current
implementation.  Consequently next scheduler tick sees a big task to
schedule and try to decrease nr_big_tasks which is already 0.

Introduce sched_upmigrate_min_nice which is updated atomically and re-count
the number of big and small tasks to fix BUG_ON() triggering.

Change-Id: I6f5fc62ed22bbe5c52ec71613082a6e64f406e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:22 -07:00
Olav Haugan
90dc3fa9a7 sched: Avoid frequent task migration due to EA in lb
A new tunable exists that allow task migration to be throttled when the
scheduler tries to do task migrations due to Energy Awareness (EA). This
tunable is only taken into account when migrations occur in the tick
path. Extend the usage of the tunable to take into account the load
balancer (lb) path also.

In addition ensure that the start of task execution on a CPU is updated
correctly. If a task is preempted but still runnable on the same CPU the
start of execution should not be updated. Only update the start of
execution when a task wakes up after sleep or moves to a new CPU.

Change-Id: I6b2a8e06d8d2df8e0f9f62b7aba3b4ee4b2c1c4d
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
[joonwoop@codeaurora.org: fixed conflict in group_classify() and
 set_task_cpu().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:21 -07:00
Srivatsa Vaddagiri
29a412dffa sched: Avoid frequent migration of running task
Power values for cpus can drop quite considerably when it goes idle.
As a result, the best choice for running a single task in a cluster
can vary quite rapidly. As the task keeps hopping cpus, other cpus go
idle and start being seen as more favorable target for running a task,
leading to task migrating almost every scheduler tick!

Prevent this by keeping track of when a task started running on a cpu
and allowing task migration in tick path (migration_needed()) on
account of energy efficiency reasons only if the task has run
sufficiently long (as determined by sysctl_sched_min_runtime
variable).

Note that currently sysctl_sched_min_runtime setting is considered
only in scheduler_tick()->migration_needed() path and not in
idle_balance() path. In other words, a task could be migrated to
another cpu which did a idle_balance(). This limitation should not
affect high-frequency migrations seen typically (when a single
high-demand task runs on high-performance cpu).

CRs-Fixed: 756570
Change-Id: I96413b7a81b623193c3bbcec6f3fa9dfec367d99
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in set_task_cpu() and
 __schedule().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:01:17 -07:00
Srivatsa Vaddagiri
72b7c5d36c sched: Provide knob to prefer mostly_idle over idle cpus
sysctl_sched_prefer_idle lets the scheduler bias selection of
idle cpus over mostly idle cpus for tasks. This knob could be
useful to control balance between power and performance.

Change-Id: Ide6eef684ef94ac8b9927f53c220ccf94976fe67
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:01:12 -07:00
Steve Muckle
588055e8c7 sched: make sched_cpu_high_irqload a runtime tunable
It may be desirable to be able to alter the scehd_cpu_high_irqload
setting easily, so make it a runtime tunable value.

Change-Id: I832030eec2aafa101f0f435a4fd2d401d447880d
Signed-off-by: Steve Muckle <smuckle@codeaurora.org>
2016-03-23 20:01:11 -07:00
Srivatsa Vaddagiri
b2e57842c0 sched: per-cpu mostly_idle threshold
sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack
tasks on cpus to some extent. In some cases, it may be desirable to
have different packing limits for different cpus. For example, pack to
a higher limit on high-performance cpus compared to power-efficient
cpus.

This patch removes the global mostly_idle tunables and makes them
per-cpu, thus letting task packing behavior to be controlled in a
fine-grained manner.

Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:00:59 -07:00
Srivatsa Vaddagiri
98f89f00dc sched: update governor notification logic
Make criteria for notifying governor to be per-cpu. Governor is
notified of any large change in cpu's busy time statistics
(rq->prev_runnable_sum) since the last reported value.

Change-Id: I727354d994d909b166d093b94d3dade7c7dddc0d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:00:54 -07:00
Srivatsa Vaddagiri
c12a2b5ab9 sched: Use absolute scale for notifying governor
Make the tunables used for deciding the need for notification to be on
absolute scale. The earlier scale (in percent terms relative to
cur_freq) does not work well with available range of frequencies. For
example, 100% tunable value would work well for lower range of
frequencies and not for higher range. Having the tunable to be on
absolute scale makes tuning more realistic.

Change-Id: I35a8c4e2f2e9da57f4ca4462072276d06ad386f1
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:00:51 -07:00
Srivatsa Vaddagiri
3a67b4ce87 sched: window-stats: Enhance cpu busy time accounting
rq->curr/prev_runnable_sum counters represent cpu demand from various
tasks that have run on a cpu. Any task that runs on a cpu will have a
representation in rq->curr_runnable_sum. Their partial_demand value
will be included in rq->curr_runnable_sum. Since partial_demand is
derived from historical load samples for a task, rq->curr_runnable_sum
could represent "inflated/un-realistic" cpu usage. As an example, lets
say that task with partial_demand of 10ms runs for only 1ms on a cpu.
What is included in rq->curr_runnable_sum is 10ms (and not the actual
execution time of 1ms). This leads to cpu busy time being reported on
the upside causing frequency to stay higher than necessary.

This patch fixes cpu busy accounting scheme to strictly represent
actual usage. It also provides for conditional fixup of busy time upon
migration and upon heavy-task wakeup.

CRs-Fixed: 691443
Change-Id: Ic4092627668053934049af4dfef65d9b6b901e6b
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in init_task_load(),
 se.avg.decay_count has deprecated.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:00:50 -07:00
Srivatsa Vaddagiri
c9d0953c31 sched: improve logic for alerting governor
Currently we send notification to governor not taking note of cpus
that are synchronized with regard to their frequency. As a result,
scheduler could send pointless notifications (notification spam!).

Avoid this by considering synchronized cpus and alerting governor only
when the highest demand of any cpu within cluster far exceeds or falls
behind current frequency.

Change-Id: I74908b5a212404ca56b38eb94548f9b1fbcca33d
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:00:48 -07:00
Srivatsa Vaddagiri
d8932ae7df sched: window-stats: legacy mode
Support legacy mode, which results in busy time being seen by governor
that is close to what it would have seen via existing APIs i.e
get_cpu_idle_time_us(), get_cpu_iowait_time_us() and
get_cpu_idle_time_jiffy(). In particular, legacy mode means that only
task execution time is counted in rq->curr_runnable_sum and
rq->prev_runnable_sum. Also task migration does not result in
adjustment of those counters.

Change-Id: If374ccc084aa73f77374b6b3ab4cd0a4ca7b8c90
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:00:26 -07:00
Srivatsa Vaddagiri
e39131c3be sched: window-stats: Code cleanup
Remove code duplication associated with update of various window-stats
related sysctl tunables

Change-Id: I64e29ac065172464ba371a03758937999c42a71f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:00:24 -07:00
Olav Haugan
8eede4a8d5 sched: Make RAVG_HIST_SIZE tunable
Make RAVG_HIST_SIZE available from /proc/sys/kernel/sched_ravg_hist_size
to allow tuning of the size of the history that is used in computation
of task demand.

CRs-fixed: 706138
Change-Id: Id54c1e4b6e974a62d787070a0af1b4e8ce3b4be6
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop@codeaurora.org: fixed minor conflict in sysctl.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-03-23 20:00:19 -07:00
Srivatsa Vaddagiri
4641b37da8 sched: window-stats: Allow acct_wait_time to be tuned
Add sysctl interface to tune sched_acct_wait_time variable at runtime

Change-Id: I38339cdb388a507019e429709a7c28e80b5b3585
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
2016-03-23 20:00:15 -07:00