Commit graph

94 commits

Author SHA1 Message Date
Linux Build Service Account
fbfd0301be Merge "sched: Disable interrupts while holding related_thread_group_lock" 2016-11-29 07:44:06 -08:00
Pavankumar Kondeti
822561f075 sched: Fix out of bounds array access in sched_reset_all_window_stats()
A new reset reason code "FREQ_AGGREGATE_CHANGE" is added to the
reset_reason_code enum, but the corresponding string array is not
updated. Fix this.
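
As an illustration of the fix's intent only (the enum values shown here
are assumptions, not the actual code), the string array should track the
enum and be checked at build time:

    enum reset_reason_example {
        WINDOW_CHANGE,
        POLICY_CHANGE,
        HIST_SIZE_CHANGE,
        FREQ_AGGREGATE_CHANGE,          /* newly added reason */
        NR_RESET_REASONS,
    };

    static const char * const reset_reason_str_example[] = {
        "WINDOW_CHANGE",
        "POLICY_CHANGE",
        "HIST_SIZE_CHANGE",
        "FREQ_AGGREGATE_CHANGE",        /* must be kept in sync with the enum */
    };

    static const char *reset_reason_name_example(enum reset_reason_example r)
    {
        /* catch a mismatch at compile time instead of an out-of-bounds read */
        BUILD_BUG_ON(ARRAY_SIZE(reset_reason_str_example) != NR_RESET_REASONS);
        return reset_reason_str_example[r];
    }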

Change-Id: I2a17d95328bef91c4a5dd4dde418296efca44431
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-11-29 15:48:34 +05:30
Olav Haugan
e5c095a2c7 sched/core: Do not free task while holding rq lock
Clearing the hmp request can cause a task to be freed. When a task is
freed, the free call might wake up a kworker, which will cause a
spinlock lockup (rq lock). Fix this by avoiding calling put_task_struct()
while holding the rq lock.

In addition, move the call to clear_hmp_request() out of stopper thread
context since it is not necessary to do this on the cpu being isolated.
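
A minimal sketch of the pattern being applied here (field and function
names are illustrative, not the actual diff):

    static void example_clear_hmp_request(struct rq *rq)
    {
        struct task_struct *to_put = NULL;
        unsigned long flags;

        raw_spin_lock_irqsave(&rq->lock, flags);
        if (rq->push_task) {
            to_put = rq->push_task;         /* defer the final put */
            rq->push_task = NULL;
        }
        raw_spin_unlock_irqrestore(&rq->lock, flags);

        /* put_task_struct() may wake a kworker, so never call it under rq->lock */
        if (to_put)
            put_task_struct(to_put);
    }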

Change-Id: Ie577db4701a88849560df385869ff7cf73695a05
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-11-28 11:00:29 -08:00
Pavankumar Kondeti
d0ff1c04e8 sched: Disable interrupts while holding related_thread_group_lock
There is a potential deadlock condition if interrupts are enabled
while holding the related_thread_group_lock. Prevent this.

----------------                              --------------------
    CPU 0                                          CPU 1
---------------                               --------------------

check_for_migration()                         cgroup_file_write(p)

check_for_freq_change()                       cgroup_attach_task(p)

send_notification()                           schedtune_attach(p)

read_lock(&related_thread_group_lock)         sched_set_group_id(p)

                                              raw_spin_lock_irqsave(
                                                &p->pi_lock, flags)

                                              write_lock_irqsave(
                                                &related_thread_group_lock)

                                              waiting on CPU#0

raw_spin_lock_irqsave(&rq->lock, flags)

raw_spin_unlock_irqrestore(&rq->lock, flags)

--> interrupt()

----> ttwu(p)

-------> waiting for p's pi_lock on CPU#1

Change-Id: I6f0f8f742d6e1b3ff735dcbeabd54ef101329cdf
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-11-28 22:15:56 +05:30
Syed Rameez Mustafa
30fc774235 sched/hmp: Enhance co-location and scheduler boost features
The recent introduction of the schedtune cgroup controller has provided
the scheduler with added flexibility in terms of some of its placement
features. In particular each cgroup under the schedtune controller can
now specify:

1) Whether it needs co-location along with other cgroups
2) Whether it is eligible for scheduler boost (sched_boost_enabled)
3) Whether the kernel can override the boost eligibility when necessary
   (sched_boost_no_override)

The scheduler now creates a reserved co-location group at boot. This
group is used to co-locate all tasks that form part of any one of the
cgroups that have co-location enabled. This reserved group can neither
be destroyed nor reused for other purposes. Furthermore, cgroups are
only allowed to indicate their co-location preference once at boot.
Further updates are disallowed.

Since we are now creating co-location groups for an extended period of
time, there are a few other factors to consider when determining the
preferred cluster for the group. We first exclude any tasks in the
group that have not been observed to be running for a significant
amount of time. Secondly we introduce the notion of group up and down
migrate tunables to allow different migration policies than individual
tasks. Lastly we break co-location if a single task in a group exceeds
up-migrate but the total load of the group does not exceed group
up-migrate.

In terms of sched_boost, the scheduler now supports multiple types of
boost. These are:

1) FULL_THROTTLE : Force up-migrate tasks belonging to any cgroup that
                   has the sched_boost_enabled flag turned on. Little
                   CPUs will only be used when big CPUs can no longer
                   accommodate tasks. Also up-migrate all RT tasks.

2) CONSERVATIVE : Override the sched_boost_enabled flag for all cgroups
                  except those that have the sched_boost_no_override
                  flag set. Force up-migrate all tasks belonging to only
                  those cgroups that still remain eligible for boost.
                  RT tasks are not force up-migrated.

3) RESTRAINED : Start frequency aggregation for co-located tasks. This
                type of boost does not force up-migrate any task.

Finally the boost API removes ref-counting. This means that there can
only be a single entity using boost at any given time. If multiple
entities are managing boost, they are required to be well behaved so
that they don't interfere with one another. Even for a single client,
it is not possible to switch directly from one boost type to another.
Boost must be first turned off before switching over to a new type.
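
Purely as an illustration of the semantics described above (the
identifiers are assumptions, not the actual interface):

    enum sched_boost_example {
        NO_BOOST = 0,
        FULL_THROTTLE_BOOST,    /* force up-migrate eligible cgroups and RT tasks */
        CONSERVATIVE_BOOST,     /* honor sched_boost_no_override; leave RT tasks alone */
        RESTRAINED_BOOST,       /* only enable frequency aggregation for co-located tasks */
    };

    /* no ref-counting: a new boost type can only be set from NO_BOOST */
    static int example_set_boost(enum sched_boost_example cur,
                                 enum sched_boost_example next)
    {
        if (cur != NO_BOOST && next != NO_BOOST && cur != next)
            return -EINVAL;
        return 0;
    }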

Change-Id: I8d224a70cbef162f27078b62b73acaa22670861d
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-11-16 17:57:56 -08:00
Syed Rameez Mustafa
8b74c7eb5f sched: Remove thread group iteration from colocation
Iterating a leader task's thread group in order to add them to a
colocation group involves a complex locking chain that ends up
causing a deadlock. The deadlock is as follows when the same task
is being referenced on three different CPUs:

-----                     ------                      -----
CPU 0                     CPU 1                       CPU 2
-----                     ------                      -----
                          add_task_to_group(p)

__schedule(prev = p)      write_lock(                 ttwu(p)
                          related_thread_grp_lock)
                                                      lock(pi_lock)

idle_balance()                                        wait for
                                                      p->on_cpu
load_balance()            unable to acquire
                          p->pi_lock
send_notification()

wait for read_lock(
related_thread_grp_lock)

unable to set p->on_cpu

There are a couple of ways to resolve this deadlock in the kernel;
however, they are not trivial. For the sake of simplicity, move
the responsibility of thread group iteration back to userspace. This
would apply to both adding and removing the leader task from a
colocation group. The kernel would continue to automatically add
newly forked children of the colocated leader to the colocation
group.

This still leaves an issue with the locking order of the pi_lock and
the related_thread_group_lock. To solve all deadlocks, we need to avoid
taking the pi_lock in reset_all_task_stats() and instead rely on a more
heavy handed approach of taking all rq locks. The pi_lock was taken to
avoid a race between reset_all_task_stats() and sched_exit(). The race
can be avoided with rq locks as well.

Change-Id: I15323e3ef91401142d3841db59c18fd8fee753fd
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-11-16 17:57:35 -08:00
Syed Rameez Mustafa
576259be4a sched/hmp: Use GFP_KERNEL for top task memory allocations
Task load structure allocations can consume a lot of memory as the
number of tasks begins to increase. Also, they might exhaust the atomic
memory pool pretty quickly if a workload starts spawning lots of
threads in a short amount of time, thus increasing the possibility of
failed allocations. Move the call to init_new_task_load() outside
atomic context and start using GFP_KERNEL for allocations. There is
no need for this allocation to be in atomic context.
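
A hedged sketch of the allocation change (the ravg field names are
assumptions):

    /* called from sched_fork(), outside atomic context, so GFP_KERNEL is fine */
    static int example_init_new_task_load(struct task_struct *p)
    {
        p->ravg.curr_window_cpu = kcalloc(nr_cpu_ids, sizeof(u32), GFP_KERNEL);
        p->ravg.prev_window_cpu = kcalloc(nr_cpu_ids, sizeof(u32), GFP_KERNEL);

        if (!p->ravg.curr_window_cpu || !p->ravg.prev_window_cpu) {
            kfree(p->ravg.curr_window_cpu);
            kfree(p->ravg.prev_window_cpu);
            return -ENOMEM;
        }
        return 0;
    }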

Change-Id: I357772e10bf8958804d9cd0c78eda27139054b21
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-11-07 14:46:21 -08:00
Syed Rameez Mustafa
ecd8f7800f sched/hmp: Use improved information for frequency notifications
Recent changes to scheduler guided frequency have started reporting
the maximum of the cpu load and the load of the top task on a CPU
to the governor. Use the same information to determine whether a
notification is necessary or not.

Change-Id: I1928c6cd0509952443a912ef54e0d72d5f75955d
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-11-07 14:46:20 -08:00
Syed Rameez Mustafa
54052c3658 sched/hmp: Remove capping when reporting load to the cpufreq governor
Capping load when reporting to the governor was important prior to new
scheduler guided frequency changes as intra-cluster migrations would
sometimes lead to CPU loads well in excess of 100%. With the new top
task approach however, load greater than 100% is no longer possible
except for the same conditions that were previously exempted (i.e.
inter-cluster migrations and frequency aggregation).

Change-Id: I3e4f5e39ec9ae7eeaba9a567efd245a7aec1b7ad
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-11-07 14:46:19 -08:00
Joonwoo Park
dfb9634d03 sched: prevent race between disable window statistics and task grouping
Changing a task's colocation group requires finishing CPU busy time
accounting first by calling update_task_ravg().  However, when window
statistics accounting is disabled, update_task_ravg() acts as a nop,
resulting in incorrect CPU time accounting.

Disallow colocation group change while window statistics accounting is
disabled in order to prevent race between reset_all_window_stats() and
colocation grouping functions.
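
A hedged sketch of the guard (lock and flag names are illustrative):

    static int example_sched_set_group_id(struct task_struct *p, unsigned int group_id)
    {
        int ret = 0;

        write_lock(&related_thread_group_lock);

        /* window stats are disabled, so update_task_ravg() would be a nop */
        if (sched_disable_window_stats) {
            ret = -EAGAIN;
            goto done;
        }

        /* finish busy time accounting (update_task_ravg()) and then move p */
    done:
        write_unlock(&related_thread_group_lock);
        return ret;
    }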

Change-Id: I6dfa20b8d8b0ae7ccc94119bf9cf14c5e11a1cf7
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-11-04 17:25:30 -07:00
Linux Build Service Account
4b37769e9e Merge "sched/hmp: Automatically add children threads to colocation group" 2016-11-02 14:41:34 -07:00
Linux Build Service Account
c0961b67cc Merge "sched/hmp: Disable interrupts when resetting all task stats" 2016-10-31 06:59:15 -07:00
Syed Rameez Mustafa
6385a475e0 sched/hmp: Disable interrupts when resetting all task stats
Taking the pi_lock without disabling interrupts in reset_all_task_stats()
is problematic, in that an interrupt can end up waking a task which in
turn needs the pi_lock again, causing a deadlock. Disable interrupts along
with taking the lock to avoid this problem.

Change-Id: If27cb2bb3fcaafa5c8435f3c2e0e4be9b8f1e987
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-28 12:12:41 -07:00
Syed Rameez Mustafa
740c2801a9 sched/hmp: Automatically add children threads to colocation group
When sched_enable_thread_grouping is turned on, the scheduler needs
to ensure that any pre-existing children of a task get added to the
co-location group. Upon removal from the co-location group, however,
the scheduler does not check for the thread grouping flag because
userspace cannot ensure correct behavior. Therefore, as a
precautionary measure to avoid memory leaks, the scheduler has to
forcefully remove children from the group regardless of the flag
setting.

While at it, also make group management a lot simpler. Without these
simplifications, we can end up in extremely complicated locking scenarios
where ensuring the correct order to avoid deadlocks is near impossible.

Change-Id: I4c13601b0fded6de9d8f897c6d471c6a40c90e4d
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-27 19:21:00 -07:00
Olav Haugan
95ceec13f7 sched: Fix compilation issue with reset_hmp_stats
reset_hmp_stats() was moved to another file, and when CONFIG_CFS_BANDWIDTH
is enabled there is still code referencing it in the original file,
causing a compilation error.

Change-Id: Iab7fc8551b628c443ce751026b06c5ff4ebba39a
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-10-25 13:13:09 -07:00
Linux Build Service Account
cc0c20f3fa Merge "sched/core_ctl: Move header file to global location" 2016-10-20 18:37:44 -07:00
Linux Build Service Account
3e7ad9dfd4 Merge "sched/hmp: Fix range checking for target load" 2016-10-19 19:11:49 -07:00
Olav Haugan
04daea81fc sched/hmp: Fix range checking for target load
The range check for target load is incorrect. Fix this. This is only a
sanity check to catch badly specified target loads.

Change-Id: Ia90d020f5e0bdf37c600661a1c246dab5b637b3b
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-10-19 10:35:52 -07:00
Olav Haugan
76ac2a2803 sched/core_ctl: Move header file to global location
Move the header file of core control to the standard linux include
directory to allow other entities to include this file.

Change-Id: I2ddb8b3b96063be3c6a6cb6bc333998e007f9de7
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-10-18 18:10:52 -07:00
Syed Rameez Mustafa
066f712e43 sched: Add multiple load reporting policies for cpu frequency
The previous patches in this series introduce the mechanics of CPU
load tracking without fixups for intra-cluster migration, and top task
load tracking. Add a tunable that dictates which of the above needs to
be considered when reporting load to the governor. The default policy
is to take the maximum of the CPU load and the top task load.
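
A hedged sketch of such a policy knob (names are assumptions, not the
actual tunable):

    enum load_report_policy_example {
        REPORT_CPU_LOAD,                /* fixed-up CPU busy time only */
        REPORT_TOP_TASK,                /* top task load only */
        REPORT_MAX_CPU_LOAD_TOP_TASK,   /* default: max of the two */
    };

    static u64 example_report_load(u64 cpu_load, u64 top_task_load, int policy)
    {
        switch (policy) {
        case REPORT_CPU_LOAD:
            return cpu_load;
        case REPORT_TOP_TASK:
            return top_task_load;
        case REPORT_MAX_CPU_LOAD_TOP_TASK:
        default:
            return max(cpu_load, top_task_load);
        }
    }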

Change-Id: Ie585a11ed774b929910d04c41471db3a2a102ec5
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-17 12:45:54 -07:00
Syed Rameez Mustafa
dc09dd60a0 sched: Optimize the next top task search logic upon task migration
find_next_top_index() is responsible for finding the second top task
on a CPU when the top task migrates away from that CPU. This operation
is expensive as we need to iterate the entire array of top tasks to
find the second top task.

Optimize this by introducing bitmaps for tracking top task indices.
There are two bitmaps; one for the previous window and one for the
current window. Each bit in a bitmap tracks whether the corresponding
bucket in the top task hashmap has a non zero refcount. The bit is set
when the refcount becomes non zero and is cleared when it becomes zero.

Finding the second top task upon migration is then simply a matter of
finding the highest set bit in the bitmap.
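
A hedged sketch of the lookup (sizes and names are illustrative):

    #define NUM_LOAD_INDICES_EXAMPLE    128     /* number of top-task buckets, illustrative */

    /* one bit per bucket, set while that bucket's refcount is non-zero */
    static DECLARE_BITMAP(top_tasks_bitmap_example, NUM_LOAD_INDICES_EXAMPLE);

    static int example_find_next_top_index(unsigned long *bitmap)
    {
        /* highest set bit == highest-load bucket still occupied */
        int idx = find_last_bit(bitmap, NUM_LOAD_INDICES_EXAMPLE);

        return idx == NUM_LOAD_INDICES_EXAMPLE ? 0 : idx;  /* empty bitmap -> 0 */
    }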

Change-Id: Ibafaf66eed756b0328704dfaa89c17ab0d84e359
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-17 12:45:51 -07:00
Syed Rameez Mustafa
7bd09f2441 sched: Add the mechanics of top task tracking for frequency guidance
The previous patches in this rewrite of scheduler guided frequency
selection reintroduce the part-picture problem that we addressed in
our initial implementation: when tasks migrate across CPUs within a
cluster, we end up losing the complete picture of the sequential
nature of the workload.

This patch aims to solve that problem slightly differently. We track
the top task on every CPU within a window. Top task is defined as the
task that runs the most in a given window. This enhances our ability
to detect the sequential nature of workloads. A single migrating task
executing for an entire window will cause 100% load to be reported
for frequency guidance instead of the maximum footprint left on any
individual CPU in the task's trail. There are cases that this new
approach does not address, namely cases where the sum of two or more
tasks accurately reflects the true sequential nature of the workload.
Future optimizations might aim to tackle that problem.

To track top tasks, we first realize that there is no strict need to
maintain the task struct itself as long as we know the load exerted by
the top task. We also realize that to maintain top tasks on every CPU
we have to track the execution of every single task that runs during
the window. The load associated with a task needs to be migrated when
the task migrates from one CPU to another. When the top task migrates
away, we need to locate the second top task and so on.

Given the above realizations, we use hashmaps to track top task load
both for the current and the previous window. This hashmap is
implemented as an array of fixed size. The key of the hashmap is given
by task_execution_time_in_a_window / array_size. The size of the array
(number of buckets in the hashmap) dictates the load granularity of each
bucket. The value stored in each bucket is a refcount of all the tasks
that executed long enough to be in that bucket.

This approach has a few benefits. Firstly, any top task stats update
now takes O(1) time. While task migration is also O(1), it does still
involve going through up to the size of the array to find the second
top task. Further patches will aim to optimize this behavior. Secondly,
and more importantly, not having to store the task struct itself saves
a lot of memory usage in that 1) there is no need to retrieve task
structs later causing cache misses and 2) we don't have to unnecessarily
hold up task memory for up to 2 full windows by calling get_task_struct()
after a task exits.
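
A hedged sketch of the refcount maintenance on migration (names are
illustrative):

    /* each entry is a refcount of tasks whose in-window runtime maps to that bucket */
    static void example_top_task_migrate(u8 *src_table, u8 *dst_table, int idx)
    {
        if (src_table[idx])
            src_table[idx]--;       /* O(1) removal from the source CPU's table */
        dst_table[idx]++;           /* O(1) addition to the destination CPU's table */
    }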

Change-Id: I004dba474f41590db7d3f40d9deafe86e71359ac
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-17 12:43:55 -07:00
Syed Rameez Mustafa
7e1a4f15b2 sched: Enhance the scheduler migration load fixup feature
In the current frequency guidance implementation the scheduler migrates
task load from the source CPU to the destination CPU when a task migrates.
The underlying assumption is that a task will stay on the destination CPU
following the migration. Hence a CPU's load should reflect the sum of
all tasks that last ran on that CPU prior to window expiration even if
these tasks executed on some other CPU in that window prior to being
migrated.

However, given the ubiquitous nature of migrations the above assumption
is flawed causing the scheduler to often add up load on a single CPU
that in reality ran concurrently on multiple CPUs and will continue to
run concurrently in subsequent windows. This leads to load over-reporting
on a single CPU, which in turn causes CPU frequency to be higher
than necessary.

This is the first patch in a series of patches that attempts to change
how load fixups are done upon migration to prevent load over-reporting.
In this patch, we stop doing migration fixups for intra-cluster
migrations. Inter-cluster migration fixups are still retained.

In order to achieve the above, we make use of the per-CPU footprint of
each task introduced in the previous patch. Upon inter cluster migration, we
go through every CPU in the source cluster to subtract the migrating
task's contribution to the busy time on each one of those CPUs. The sum
of the contributions is then added to the destination CPU allowing it
to ramp up to the appropriate frequency for that task.

Subtracting load from each of the source CPUs is not trivial, however,
as it would require all runqueue locks to be held. To get around this
we introduce a deferred load subtraction mechanism whereby subtracting
load from each of the source CPUs is deferred until an opportune moment.
This opportune moment is when the governor comes asking the scheduler
for load. At that time, all necessary runqueue locks are already held.

There are a few cases to consider when doing deferred subtraction. Since
we are not holding all runqueue locks, other CPUs in the source cluster
can be in a different window than the source CPU that the task
is migrating from.

Case 1: Other CPU in the source cluster is in the same window
No special consideration

Case 2: Other CPU in the source cluster is ahead by 1 window
In this case, we will be doing redundant updates to the subtraction load
for the prev window. There is no way to avoid this redundant update,
though, without holding the rq lock.

Case 3: Other CPU in the source cluster is trailing by 1 window
In this case, we might end up overwriting old data for that CPU. But
this is not a problem as when the other CPU calls update_task_ravg()
it will move to the same window. This relies on maintaining
synchronized windows between CPUs, which is true today.

Finally, we must deal with frequency aggregation. When frequency
aggregation is in effect, there is little point in dealing with per
CPU footprints since the load of all related tasks has to be reported
on a single CPU. Therefore when a task enters a related group we clear
out all per CPU contributions and add them to the task CPU's cpu_time
struct. From that point onwards we stop managing per CPU contributions
upon inter cluster migrations since that work is redundant. Finally,
when a task exits a related group we must walk every CPU and reset
all CPU contributions. We then set the task CPU contribution to the
respective curr/prev sum values and add that sum to the task CPU's
rq runnable sum.

Change-Id: I1f8d596e6c930f3f6f00e24109ddbe8b121f8d6b
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-17 12:43:54 -07:00
Syed Rameez Mustafa
eb7300e9a8 sched: Add per CPU load tracking for each task
Keeping track of the load footprint of each task on every CPU
that it executed on gives the scheduler much more flexibility in
terms of the number of frequency guidance policies. These new fields
will be used in subsequent patches as we alter the load fixup
mechanism upon task migration. We still need to maintain the
curr/prev_window sums as they will also be required in subsequent
patches as we start to track top tasks based on cumulative load.

Also, we need to call init_new_task_load() for the idle task. This
is an existing harmless bug as load tracking for the idle task is
irrelevant. However, in this patch we are adding pointers to the
ravg structure. These pointers have to be initialized even for the
idle task.

Finally move init_new_task_load() to sched_fork(). This was always
the more appropriate place, however, following the introduction of
new pointers in the ravg struct, this is necessary to avoid races
with functions such as reset_all_task_stats().
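
A hedged sketch of the new per-CPU fields (this is not the actual struct
layout):

    struct ravg_example {
        u32 curr_window;        /* total demand in the current window */
        u32 prev_window;        /* total demand in the previous window */
        u32 *curr_window_cpu;   /* per-CPU share of curr_window, nr_cpu_ids entries */
        u32 *prev_window_cpu;   /* per-CPU share of prev_window, nr_cpu_ids entries */
    };

The pointer members are the reason even the idle task now needs
init_new_task_load().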

Change-Id: Ib584372eb539706da4319973314e54dae04e5934
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-10-17 12:43:53 -07:00
Joonwoo Park
825b7ef93a sched: bucketize CPU c-state levels
The C-state aware scheduler takes note of the wakeup latency of each
c-state level to determine whether to pack or wake up an LPM CPU.  But it
doesn't distinguish between small and large deltas, as it's inefficient
for the scheduler to do so on its critical path.

Disregard wakeup latencies less than 64 us between different c-state
levels.  This reduces unnecessary task packing.
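
A hedged sketch of the bucketing (the 64 us granularity comes from the
text above; the helper is illustrative):

    /* compare c-state wakeup latencies at 64 us granularity */
    static inline int example_cstate_bucket(int wakeup_latency_us)
    {
        return wakeup_latency_us >> 6;  /* 2^6 = 64 us per bucket */
    }

Two c-state levels that land in the same bucket are treated as equivalent
when deciding whether to pack.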

CRs-fixed: 1074879
Change-Id: Ib0cadbd390d1a0b6da3e39c98010cedb43e5bf60
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-10-12 14:14:16 -07:00
Linux Build Service Account
6312438224 Merge "sched: Fix a division by zero bug in scale_exec_time()" 2016-10-06 01:07:16 -07:00
Linux Build Service Account
fb89803f09 Merge "sched: Add a device tree property to specify the sched boost type" 2016-10-05 19:29:18 -07:00
Linux Build Service Account
a6e4924acb Merge "sched: add a knob to prefer the waker CPU for sync wakeups" 2016-10-03 10:34:58 -07:00
Pavankumar Kondeti
a86b380f35 sched: Add a device tree property to specify the sched boost type
The HMP scheduler has two types of task placement boost policies.

(1) The boost-on-big policy makes use of all big CPUs up to their full
capacity before using the little CPUs. This improves performance on true
b.L systems where the big CPUs have higher efficiency compared to the
little CPUs.

(2) The boost-on-all policy places tasks on the CPU having the highest
spare capacity. This policy is optimal for SMP-like systems.

The scheduler sets the boost policy to boost-on-big on systems which have
CPUs of different efficiencies. However, it is possible for CPUs of the
same microarchitecture to have slight differences in efficiency due to
other factors like cache size. Selecting the boost-on-big policy based
on the relative difference in efficiency is not optimal on such systems.
The boost-policy device tree property is introduced to specify the
required boost type and it overrides the default selection of boost
type in the scheduler. The possible values for this property are
"boost-on-big" and "boost-on-all".

Change-Id: Iac19183fa7d4bfd9e5746b02a02b2b19cf64b78d
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-10-02 10:54:45 +05:30
Pavankumar Kondeti
f1e9995fe4 sched: add a knob to prefer the waker CPU for sync wakeups
The current policy prefers to select an idle CPU in the waker
cluster over the waker CPU running only 1 task. By selecting
an idle CPU, it eliminates the chance of the waker migrating to a
different CPU after the wakee preempts it. This policy is also not
susceptible to incorrect "sync" usage, i.e. the waker does not
go to sleep after waking up the wakee.

However, the LPM exit latency associated with an idle CPU outweighs the
above benefits on some targets. So add a knob to prefer the waker
CPU having only 1 runnable task over idle CPUs in the waker cluster.
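
A hedged sketch of the check the knob enables (the sysctl name and fields
are assumptions):

    static unsigned int sysctl_example_prefer_sync_waker;      /* the new knob */

    static bool example_prefer_waker_cpu(struct rq *waker_rq, int sync)
    {
        /*
         * For sync wakeups, place the wakee on the waker's CPU when the waker
         * is its only runnable task, instead of waking an idle (LPM) CPU.
         */
        return sysctl_example_prefer_sync_waker && sync && waker_rq->nr_running == 1;
    }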

Change-Id: Id974748c07625c1b19112235f426a5d204dfdb33
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-10-02 10:52:06 +05:30
Pavankumar Kondeti
7c3461a6ac sched: Fix a division by zero bug in scale_exec_time()
When cycle_counter is used to estimate the frequency, calling
update_task_ravg() twice on the same task without refreshing
the wallclock results in a division by zero bug. Add a safety
check in update_task_ravg() to prevent this.

The above bug is hit from __schedule() when next == prev. There
is no need to call update_task_ravg() twice for PUT_PREV_TASK
and PICK_NEXT_TASK events for the same task. Calling
update_task_ravg() with TASK_UPDATE event is sufficient.
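
A hedged sketch of where the zero delta bites when the cycle counter is
used to estimate frequency (helper names are illustrative):

    static u64 example_scale_exec_time(u64 delta_ns, u64 cycles, u64 max_freq)
    {
        u64 est_freq;

        if (!delta_ns)
            return 0;   /* guard: cycles / delta_ns below would divide by zero */

        est_freq = div64_u64(cycles, delta_ns);         /* cycle-counter frequency estimate */
        return div64_u64(delta_ns * est_freq, max_freq);
    }

The patch described above places the guard in update_task_ravg() rather
than here.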

Change-Id: Ib3af9004f2462618c535b8195377bedb584d0261
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-10-01 10:05:08 +05:30
Linux Build Service Account
818156d2ac Merge "sched: don't assume higher capacity means higher power in lb" 2016-09-30 18:23:48 -07:00
Linux Build Service Account
f6d68e27bf Merge "sched: constrain HMP scheduler tunable range with in better way" 2016-09-29 11:20:30 -07:00
Pavankumar Kondeti
9d128dbca3 sched: don't assume higher capacity means higher power in lb
The load balancer restrictions are in place to control task
migration from the lower capacity cluster to the higher capacity
cluster to save power. The assumption here is that the higher capacity
cluster will have a higher power cost, which may not necessarily be
true for all platforms. Use power cost based checks instead of
capacity based checks while applying the inter cluster migration
restrictions.

Change-Id: Id9519eb8f7b183a2e9fca87a23cf95e951aa4005
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-09-29 10:07:19 +05:30
Olav Haugan
e42aae958b sched/core_ctl: Integrate core control with cpu isolation
Replace hotplug functionality in core control with cpu isolation
and integrate it into the scheduler.

Change-Id: I4f1514ba5bac2e259a1105fcafb31d6a92ddd249
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-09-24 11:00:06 -07:00
Olav Haugan
e33c24bfec sched: add cpu isolation support
This adds cpu isolation APIs to the scheduler to isolate and unisolate
CPUs. Isolating and unisolating a CPU can be used in place of hotplug.
Isolating and unisolating a CPU is faster than hotplug and can thus be
used to optimize the performance and power of multi-core CPUs.

Isolating works by migrating non-pinned IRQs and tasks to other CPUs and
marking the CPU as not available to the scheduler and load balancer.
Pinned tasks and IRQs are still allowed to run, but it is expected that
this would be minimal.

Unisolation works by just marking the CPU available to the scheduler and
load balancer.
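
A hedged sketch of how a client might use such an API (function names are
illustrative, not the actual interface):

    int example_sched_isolate_cpu(int cpu);     /* migrate non-pinned tasks/IRQs away, mark unavailable */
    int example_sched_unisolate_cpu(int cpu);   /* mark the CPU available to the scheduler/LB again */

    static void example_set_cpu_available(int cpu, bool available)
    {
        if (available)
            example_sched_unisolate_cpu(cpu);
        else
            example_sched_isolate_cpu(cpu);     /* faster than a full hotplug cycle */
    }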

Change-Id: I0bbddb56238c2958c5987877c5bfc3e79afa67cc
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
2016-09-24 10:55:17 -07:00
Joonwoo Park
cc60f0790f sched: constrain HMP scheduler tunable range with in better way
HMP scheduler tunables can be constrained via the extra1 and extra2
fields of ctl_table.  Having a valid range in the sysctl table gives a
clearer view of the tunable's range.

Also add range for sched_select_prev_cpu_us so we can avoid invalid
value configuration of that tunable.
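
A hedged sketch of such a constrained entry (bounds and data are
illustrative):

    static int example_min;                     /* 0 */
    static int example_max = 2000;              /* illustrative upper bound, in usec */
    static unsigned int sysctl_sched_select_prev_cpu_us_example = 2000;

    static struct ctl_table example_sched_table[] = {
        {
            .procname       = "sched_select_prev_cpu_us",
            .data           = &sysctl_sched_select_prev_cpu_us_example,
            .maxlen         = sizeof(unsigned int),
            .mode           = 0644,
            .proc_handler   = proc_dointvec_minmax,
            .extra1         = &example_min,     /* writes below this are rejected */
            .extra2         = &example_max,     /* writes above this are rejected */
        },
        { }
    };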

CRs-fixed: 1056910
Change-Id: I09fcc019133f4d37b7be3287da8e0733e40fc0ac
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
2016-09-21 16:50:42 -07:00
Syed Rameez Mustafa
1389927146 sched: Move data structures under CONFIG_SCHED_HMP
Frequency-demand conversion data structures are only used under
CONFIG_SCHED_HMP. Move them out of sched.h into hmp.c to where they
actually belong after the recent refactor.

Change-Id: I3c3eebca86062f11b80af93ba3716695eb787376
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-09-09 15:35:38 -07:00
Syed Rameez Mustafa
591ce8ed84 sched: Further re-factor HMP specific code
The structures being moved around are only used for trace events
defined under CONFIG_SCHED_HMP. Move code to hmp.c to reflect
the same.

Change-Id: Ib959355264405ab779b24948f111a2ca61d367de
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-09-08 15:44:52 -07:00
Pavankumar Kondeti
078568e425 sched: Introduce sched_freq_aggregate_threshold tunable
Do the aggregation for frequency only when the total group busy time
is above sched_freq_aggregate_threshold. This filtering is especially
needed for the cases where groups are created by including all threads
of an application process. This knob can be tuned to apply aggregation
only for heavy workload applications.

When this knob is enabled and load is aggregated, the load is not
clipped to 100% @ current frequency to ramp up the frequency faster.
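
A hedged sketch of the thresholding (the variable names are assumptions
and the flow is simplified):

    static u64 sysctl_example_freq_aggregate_threshold;

    static u64 example_freq_load(u64 cpu_busy_time, u64 group_busy_time)
    {
        /* light groups are not aggregated; only the per-CPU load is reported */
        if (group_busy_time < sysctl_example_freq_aggregate_threshold)
            return cpu_busy_time;

        /* heavy groups: aggregate, and report unclipped so the governor ramps up faster */
        return cpu_busy_time + group_busy_time;
    }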

Change-Id: Icfd91c85938def101a989af3597d3dcaa8026d16
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-08-22 14:06:37 -07:00
Pavankumar Kondeti
2552980f79 sched: handle frequency alert notifications better
The load reporting during frequency alert notifications is broken under
load aggregation. When aggregation is enabled, the total group busy
time is accounted towards the maximum busy CPU of a frequency domain.
If this CPU has a notification pending, its group busy time alone is
accounted and the other CPUs' group busy time is completely ignored.
Similarly, if any CPU other than the maximum busy CPU has a pending
notification, its group busy time is accounted twice.

Maintain the frequency alert notification flag per frequency domain.
When the notification is pending, don't clip the load to 100% @ current
frequency for any of the CPUs in the frequency domain.

Change-Id: Iebc7d74d6fafa20430fa1c7d80f34a6ab198832d
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-08-22 14:06:36 -07:00
Pavankumar Kondeti
5ddfbfec06 sched: inherit the group id from the group leader
When sysctl_sched_enable_thread_grouping is set to 1, any new tasks
created are put in the same group as their group leader.
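
A hedged sketch of the inheritance in the fork path (helper names are
assumptions; only the sysctl name comes from the text above):

    extern unsigned int sysctl_sched_enable_thread_grouping;

    unsigned int example_task_group_id(struct task_struct *p);                 /* illustrative */
    void example_add_task_to_group(struct task_struct *p, unsigned int id);    /* illustrative */

    /* called while setting up a newly forked task */
    static void example_inherit_group(struct task_struct *p)
    {
        struct task_struct *leader = p->group_leader;

        if (!sysctl_sched_enable_thread_grouping || leader == p)
            return;

        example_add_task_to_group(p, example_task_group_id(leader));
    }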

Change-Id: If1837dd7c8120c8b097cfffa1dc52eb4781f1641
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
2016-08-22 14:06:35 -07:00
Syed Rameez Mustafa
67e0df6e33 sched: Move notify_migration() under CONFIG_SCHED_HMP
notify_migration() is an HMP-specific function that relies on all
of its contents being stubbed out for !CONFIG_SCHED_HMP. However,
it still maintains calls to rcu_read_lock/unlock(). In the !HMP
case these calls are simply redundant. Move the function under
CONFIG_SCHED_HMP and add a stub when the config is not defined so
that there is no overhead.
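
The stub pattern being described, sketched with an assumed signature:

    #ifdef CONFIG_SCHED_HMP
    void notify_migration(int src_cpu, int dest_cpu, bool src_cpu_dead,
                          struct task_struct *p);          /* signature is illustrative */
    #else
    static inline void notify_migration(int src_cpu, int dest_cpu, bool src_cpu_dead,
                                        struct task_struct *p)
    {
        /* no rcu_read_lock()/unlock() overhead in the !HMP case */
    }
    #endif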

Change-Id: Iad914f31b629e81e403b0e89796b2b0f1d081695
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 14:06:33 -07:00
Syed Rameez Mustafa
9095a09ab1 sched: Move most HMP specific code to a separate file.
Most code pertaining to CONFIG_SCHED_HMP has been moved to a separate
file "hmp.c" in order to facilitate kernel upgrades. Fewer changes in
the original scheduler files means fewer conflicts. Some parts of code,
however, could not be moved to the separate file either because of
dependencies with other non-HMP code or because the changes are specific
only to the scheduling classes where the code resides.

Change-Id: Ib067ac75e5a494008dcb3c67586b622c1b3962ce
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
2016-08-22 14:06:33 -07:00