migrate_tasks() migrates all tasks of a CPU by using pick_next_task().
This works in the hotplug case as we force migrate every single task
allowing pick_next_task() to return a new task on every loop iteration.
In the case of isolation, however, task migration is not guaranteed,
which causes pick_next_task() to keep returning the same task over and
over again until we terminate the loop without having migrated all the
tasks that were supposed to be migrated.
Fix the above problem by temporarily dequeuing tasks that are pinned
and marking them with TASK_ON_RQ_MIGRATING. This not only allows
pick_next_task() to properly walk the runqueue but also prevents any
migrations or changes in affinity for the dequeued tasks. Once we are
done with migrating all possible tasks, we re-enqueue all the dequeued
tasks.
While at it, ensure consistent ordering between task de-activation and
setting the TASK_ON_RQ_MIGRATING flag across all scheduling classes.
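A minimal sketch of the approach (the pinned_node list member and the
helper names are illustrative, not the in-tree symbols):

  /*
   * Temporarily take a pinned task off the runqueue so that
   * pick_next_task() makes progress, and block migrations and affinity
   * changes while the task is detached.
   */
  static void detach_pinned_task(struct rq *rq, struct task_struct *p,
                                 struct list_head *pinned)
  {
          p->on_rq = TASK_ON_RQ_MIGRATING;     /* set before de-activation */
          deactivate_task(rq, p, 0);
          list_add(&p->pinned_node, pinned);   /* hypothetical list node */
  }

  /* Once every movable task has been migrated, put the pinned ones back. */
  static void reattach_pinned_tasks(struct rq *rq, struct list_head *pinned)
  {
          struct task_struct *p, *tmp;

          list_for_each_entry_safe(p, tmp, pinned, pinned_node) {
                  list_del_init(&p->pinned_node);
                  activate_task(rq, p, 0);
                  p->on_rq = TASK_ON_RQ_QUEUED;
          }
  }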
Change-Id: Id06151a8e34edab49ac76b4bffd50c132f0b792f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
There is a race condition between clearing an HMP request for active
migration and the actual active migration. Active migration can be
half-way through the migration when the HMP request is cleared by
another core. Move the clearing of the HMP request to the stopper thread to
avoid this.
Change-Id: I6d73b8f246ae3754ab60984af198333fd284ae16
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
We don't want user space tasks to run on isolated cpus. If the affinity
mask that the user space task is trying to set only includes online
cpus that are isolated, return an error.
Also ensure that tasks do not get stuck on isolated cores. We are not
properly updating the mask that we check against the current CPU, so we
might end up thinking we can run on the current CPU. Fix this.
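A sketch of the check being described, assuming the cpu_isolated_mask
maintained by the isolation code:

  /* Reject an affinity mask whose online CPUs are all isolated. */
  cpumask_t allowed;

  cpumask_and(&allowed, new_mask, cpu_online_mask);
  cpumask_andnot(&allowed, &allowed, cpu_isolated_mask);
  if (cpumask_empty(&allowed))
          return -EINVAL;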
Change-Id: I078d01e63860d1fc60fc96eb0c739c0f680ae983
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
At present, sched_set_group_id() dynamically allocates a structure for
the colocation group when assigning the given task to the group.
However, this can cause a deadlock as the memory allocator can wake up
a task which also tries to acquire related_thread_group_lock.
Avoid such deadlocks by pre-allocating the colocation structures. This
limits the maximum number of colocation groups to a static value, but
that is acceptable as the number is never expected to be large.
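A sketch of the pre-allocation, with an illustrative static limit and
lookup helper (the in-tree structure and helpers may differ):

  #define MAX_RELATED_THREAD_GROUPS    20    /* assumed static limit */

  /*
   * Pre-allocated at boot; sched_set_group_id() only hands these out,
   * so no allocation (and no allocator wakeup) can happen while
   * related_thread_group_lock is held.
   */
  static struct related_thread_group
          related_thread_groups[MAX_RELATED_THREAD_GROUPS];

  static struct related_thread_group *lookup_related_thread_group(unsigned int id)
  {
          return id < MAX_RELATED_THREAD_GROUPS ?
                  &related_thread_groups[id] : NULL;
  }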
Change-Id: Ifc32ab4ead63c382ae390358ed86f7cc5b6eb2dc
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Clearing the hmp request can cause a task to be freed. When a task is
freed, the free call might wake up a kworker, which will cause a
spinlock lockup (on the rq lock). Fix this by avoiding calling
put_task_struct() while holding the rq lock.
In addition, move the call to clear_hmp_request() out of stopper thread
context since it is not necessary to do this on the cpu being isolated.
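A sketch of deferring the reference drop until the rq lock has been
released (rq->push_task follows the HMP scheduler's naming; the
surrounding code is simplified):

  struct task_struct *to_put = NULL;
  unsigned long flags;

  raw_spin_lock_irqsave(&rq->lock, flags);
  if (rq->push_task) {
          to_put = rq->push_task;
          rq->push_task = NULL;
  }
  raw_spin_unlock_irqrestore(&rq->lock, flags);

  /* Dropping the reference may wake a kworker; safe without the rq lock. */
  if (to_put)
          put_task_struct(to_put);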
Change-Id: Ie577db4701a88849560df385869ff7cf73695a05
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
If the affinity of the interrupt changes before the irq affinity based
qos request has been added to the list, the change triggers the
notifier call. This notifier call tries to update the qos request.
Accessing a qos request that has not yet been added to the list leads
to a NULL pointer dereference.
Avoid this race by registering the notifier after adding the
qos request.
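The fix boils down to the ordering sketched below (simplified; in the
actual code both steps happen inside pm_qos_add_request(), and the
irq_notify field and callback names are assumptions):

  /* Put the request on the constraints list first ... */
  pm_qos_add_request(req, pm_qos_class, value);

  /*
   * ... and only then register for affinity changes, so a notification
   * can never see a request that is not yet on the list.
   */
  req->irq_notify.notify = pm_qos_irq_notify;
  req->irq_notify.release = pm_qos_irq_release;
  irq_set_affinity_notifier(irq, &req->irq_notify);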
Change-Id: I99869cc233573b5db10e4f3224d65c29511050ea
Signed-off-by: Anil Kumar Mamidala <amami@codeaurora.org>
If the qos value is increased only for a subset of cpus, the
aggregated qos for those cpus is still the previous value.
This is because the qos request list is maintained per
request and not per cpu. In this case, as there is no change
in the aggregated qos value, these cpus are not woken up to
take the new qos value into effect.
So wake up cpus even if the aggregated qos value does not change
but the cpumask changes.
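A sketch of the check, where cpus_affine_prev is a hypothetical
snapshot of the previously affected CPUs and wake_up_all_idle_cpus()
stands in for whatever targeted wakeup is actually used:

  /*
   * Kick CPUs when either the aggregate value or the set of CPUs
   * covered by the request has changed.
   */
  if (prev_value != curr_value ||
      !cpumask_equal(&req->cpus_affine_prev, &req->cpus_affine))
          wake_up_all_idle_cpus();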
Change-Id: If5a4a100108e85e04beb77e5249bd6c452672edf
Signed-off-by: Anil Kumar Mamidala <amami@codeaurora.org>
There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) arguments for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1]. Of course this leaves a window of
opportunity for an unsavory application to munge with the data.
This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
record(s). In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).
As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:
* https://github.com/linux-audit/audit-testsuite/issues/25
[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.
[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data. I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation). The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.
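A simplified sketch of the single-fetch pattern (argument chunking and
record splitting are omitted; p is the user pointer to the argument and
ab the audit buffer):

  char *buf;
  size_t len;

  buf = kmalloc(MAX_EXECVE_AUDIT_LEN + 1, GFP_KERNEL);
  if (!buf)
          return -ENOMEM;

  len = strnlen_user(p, MAX_ARG_STRLEN);          /* length hint only */
  len = min_t(size_t, len, MAX_EXECVE_AUDIT_LEN);
  if (copy_from_user(buf, p, len)) {
          kfree(buf);
          return -EFAULT;
  }

  /*
   * Scan and log the same kernel copy; userspace can no longer change
   * the data between the check and the use.
   */
  if (audit_string_contains_control(buf, len))
          audit_log_n_hex(ab, buf, len);
  else
          audit_log_n_string(ab, buf, len);
  kfree(buf);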
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Change-Id: Ie9848961d236739df5014474f2c2a781af9fb811
Git-repo: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Git-commit: 43761473c254b45883a64441dd0bc85a42f3645c
Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>
Isolation code needs to be synchronized with both hotplug and suspend.
Ensure this by taking the lock that is taken by both paths and by
ensuring hotplug notifiers are processed for suspend/resume.
Change-Id: I663588cfd2f9e3972b9adc1a10887ef36cd70c57
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
The recent introduction of the schedtune cgroup controller has provided
the scheduler with added flexibility in terms of some of it's placement
features. In particular each cgroup under the schedtune controller can
now specify:
1) Whether it needs co-location along with other cgroups
2) Whether it is eligible for scheduler boost (sched_boost_enabled)
3) Whether the kernel can override the boost eligibility when necessary
(sched_boost_no_override)
The scheduler now creates a reserved co-location group at boot. This
group is used to co-locate all tasks that form part of any one of the
cgroups that have co-location enabled. This reserved group can neither
be destroyed nor reused for other purposes. Furthermore, cgroups are
only allowed to indicate their co-location preference once at boot.
Further updates are disallowed.
Since we are now creating co-location groups for an extended period of
time, there are a few other factors to consider when determining the
preferred cluster for the group. We first exclude any tasks in the
group that have not been observed to be running for a significant
amount of time. Secondly, we introduce the notion of group up and down
migrate tunables to allow migration policies that differ from those for
individual tasks. Lastly, we break co-location if a single task in a group exceeds
up-migrate but the total load of the group does not exceed group
up-migrate.
In terms of sched_boost, the scheduler now supports multiple types of
boost (sketched as an enum after the list below). These are:
1) FULL_THROTTLE : Force up-migrate tasks belonging to any cgroup that
has the sched_boost_enabled flag turned on. Little
CPUs will only be used when big CPUs can no longer
accommodate tasks. Also up-migrate all RT tasks.
2) CONSERVATIVE : Override the sched_boost_enabled flag for all cgroups
except those that have the sched_boost_no_override
flag set. Force up-migrate all tasks belonging to only
those cgroups that still remain eligible for boost.
RT tasks do not get force up migrated.
3) RESTRAINED : Start frequency aggregation for co-located tasks. This
type of boost does not force up-migrate any task.
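As a rough sketch, the boost levels could be expressed as the following
enum (names and values here are illustrative rather than the in-tree
definitions):

  enum sched_boost_type {
          NO_BOOST = 0,
          FULL_THROTTLE_BOOST,    /* up-migrate boosted cgroups and all RT tasks */
          CONSERVATIVE_BOOST,     /* honor sched_boost_no_override exemptions */
          RESTRAINED_BOOST,       /* frequency aggregation only, no up-migration */
  };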
Finally the boost API removes ref-counting. This means that there can
only be a single entity using boost at any given time. If multiple
entities are managing boost, they are required to be well behaved so
that they don't interfere with one another. Even for a single client,
it is not possible to switch directly from one boost type to another.
Boost must be first turned off before switching over to a new type.
Change-Id: I8d224a70cbef162f27078b62b73acaa22670861d
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
At present, HMP scheduler boost tends to pack tasks by taking into
account power cost and C-state. This is suboptimal for performance as
it can lead to preemption and higher latency.
Revise the logic to prefer the least loaded CPU among the big cluster
CPUs when the boost type is SCHED_BOOST_ON_BIG. The new logic still
honors the behaviour that the scheduler can place tasks on the little
CPUs when the big CPUs are all overcommitted.
Also, it has been found that need_idle with boost can easily return the
previous CPU when no idle CPU is found. Fix this issue by making the
need_idle flag take precedence over sched_boost.
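A sketch of the placement preference, assuming illustrative helpers for
the big-cluster mask and the per-CPU load metric:

  static int select_least_loaded_big_cpu(struct task_struct *p)
  {
          int cpu, best_cpu = -1;
          u64 best_load = U64_MAX;

          for_each_cpu_and(cpu, &big_cluster_mask, tsk_cpus_allowed(p)) {
                  u64 load = cpu_load(cpu);       /* assumed load metric */

                  if (load < best_load) {
                          best_load = load;
                          best_cpu = cpu;
                  }
          }

          /*
           * Caller falls back to the little CPUs when all big CPUs are
           * overcommitted or no big CPU was usable (best_cpu == -1).
           */
          return best_cpu;
  }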
CRs-fixed: 1074879
Change-Id: I470bcd0588e038b4a540d337fe6a412f2fa74920
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Iterating a leader task's thread group in order to add them to a
colocation group involves a complex locking chain that ends up
causing a deadlock. The deadlock is as follows when the same task
is being referenced on three different CPUs:
CPU 0                         CPU 1                         CPU 2
-----                         -----                         -----
                              add_task_to_group(p)
__schedule(prev = p)          write_lock(                   ttwu(p)
                                related_thread_grp_lock)
                                                            lock(pi_lock)
idle_balance()                                              wait for
                                                              p->on_cpu
load_balance()                unable to acquire
                                p->pi_lock
send_notification()
  wait for read_lock(
    related_thread_grp_lock)
unable to set p->on_cpu
There are a couple of ways to resolve this deadlock in the kernel,
however, they are not trivial. For the sake of simplicity, move
the responsibility of thread group iteration back to userspace. This
would apply to both adding and removing the leader task from a
colocation group. The kernel would continue to automatically add
newly forked children of the colocated leader to the colocation
group.
This still leaves an issue with the locking order of the pi_lock and
the related_thread_group_lock. To solve all deadlocks, we need to avoid
taking the pi_lock in reset_all_task_stats() and instead rely on a more
heavy handed approach of taking all rq locks. The pi_lock was taken to
avoid a race between reset_all_task_stats() and sched_exit(). The race
can be avoided with rq locks as well.
Change-Id: I15323e3ef91401142d3841db59c18fd8fee753fd
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Export the core control boost function to make it accessible to kernel
modules.
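A minimal sketch, assuming core_ctl_set_boost() is the boost entry
point being exported:

  int core_ctl_set_boost(bool boost)
  {
          /* ... raise/restore core control's CPU limits for boost ... */
          return 0;
  }
  EXPORT_SYMBOL(core_ctl_set_boost);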
Change-Id: I94359afa433ad57dd5bfeae3cb78a1f196cd02fe
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
During migrate_tasks, we have to drop the dead_rq lock in
order to preserve locking order when acquiring task->pi_lock.
This may allow the task to migrate off of dead_rq. Therefore,
don't attempt to migrate such a task again from dead_rq.
Change-Id: Id31b58e231d3dcd7d32e0dc7f264595d60a7c408
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
The migrate_tasks() function is used by both hotplug and cpu isolation. During
hotplug all the cpus are stalled (in stop machine) while tasks are being
migrated. However, this is not the case during cpu isolation. A task
that was counted as a pinned thread might have been migrated off the
cpu. Take this into account when checking whether we have completed
moving all tasks off the runqueue.
Also ignore the warning about tasks moving off the run-queue for the
isolation use case.
Change-Id: I5c5f25eb9b1eaf0605b606a65e0ac86996fa5f27
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Cluster cpu list traversal is not properly protected against removal of
an element by a separate thread. Add proper locking to ensure an element
cannot be removed while accessing the list.
In addition, ensure we do not end up in a livelock, never exiting the
loop, due to hotplug continuously moving elements to the end of the list.
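A sketch of the protected traversal (the lock name is illustrative and
the livelock bound for hotplug re-ordering is not shown):

  unsigned long flags;
  struct sched_cluster *cluster;

  spin_lock_irqsave(&cluster_lock, flags);        /* assumed list lock */
  list_for_each_entry(cluster, &cluster_head, list) {
          /* ... operate on cluster; it cannot be removed under the lock ... */
  }
  spin_unlock_irqrestore(&cluster_lock, flags);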
Change-Id: Ie98fe48c2f4fdd0244573229b77ee9823df9e214
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
This reverts 'commit 7112993181 ("input: touchscreen: synaptics v1.1")'
This change is not needed in 4.4 kernel.
Change-Id: I89ab8f353bc04bc0a04d5f5a6993e8e8e5ebbd2e
Signed-off-by: Abinaya P <abinayap@codeaurora.org>
Signed-off-by: Shantanu Jain <shjain@codeaurora.org>
A CPU that is isolated needs to have its timers migrated off to
another CPU. If while migrating timers, there is a running
timer, acquiring the timer base lock after marking a CPU as
isolated will ensure that:
1) No more timers can be queued on to the isolated CPU, and
2) A running timer will finish execution on the to-be-isolated
CPU, and so will any just expired timers since they're all
taken off of the CPU's tvec1 in one go while the base lock
is held.
Therefore there is no apparent reason to wait for the expired
timers to finish execution, and isolation can proceed to migrate
non-expired timers even when the expired ones are running
concurrently.
While we're here, also add a delay to the wait-loop inside
migrate_hrtimer_list to allow for store-exclusive fairness
when run_hrtimer is attempting to grab the hrtimer base
lock.
Change-Id: Ib697476c93c60e3d213aaa8fff0a2bcc2985bfce
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
There is a race between the watchdog being enabled by hotplug and core
isolation disabling the watchdog. When a CPU is hotplugged in and
the hotplug lock has been released, the watchdog thread might not
have run yet to enable the watchdog. We have to wait for the
watchdog to be enabled before proceeding.
Change-Id: I88f73603b6d389a46f8e819d9b490091d5ba4fe9
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
To move tasks off a cpu when offlining it, the rq needs to be offlined
to un-throttle tasks. However, tasks might still run on the CPU even
after it has been isolated (per-CPU threads). Thus we should leave the
rq in the online state after tasks have been moved.
Change-Id: I61486e8648af0dbb82595fe699e1bc158e837362
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
There is a race condition between checking for whether an active load
balance request has been set and clearing the request. A cpu might have
an active load balance request set and queued but not executed yet.
Before the load balance request is executed the request flag might be
cleared by cpu isolation. Then subsequently the load balancer or tick
might try to do another active load balance. This can cause the same
active load balance work to be queued twice, causing a report of list
corruption.
Fix this by moving the clearing of the request to the stopper thread and
ensuring that load balance will not try to queue a request on an
already isolated cpu.
Change-Id: I5c900d2ee161fa692d66e3e66012398869715662
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
The scheduler allocates memory for the task load structures during
fork. It then relies on sched_exit() being called to free that memory.
However, if the fork itself fails at any point after the allocation,
the memory is left unclaimed forever. Fix this memory leak by freeing
the allocated memory under error conditions.
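A sketch of the error-path cleanup in copy_process(), assuming a
free_task_load_ptrs()-style helper that undoes the allocation:

  /* At the tail of copy_process()'s error unwinding: */
  bad_fork_free:
          free_task_load_ptrs(p); /* assumed helper undoing init_new_task_load() */
          free_task(p);
  fork_out:
          return ERR_PTR(retval);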
Change-Id: I14a8290c9fcc4174ec80560e9f9d7bcdb119761f
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Task load structure allocations can consume a lot of memory as the
number of tasks begins to increase. Also, they might exhaust the atomic
memory pool pretty quickly if a workload starts spawning lots of
threads in a short amount of time, thus increasing the possibility of
failed allocations. Move the call to init_new_task_load() outside
atomic context and start using GFP_KERNEL for allocations. There is
no need for this allocation to be in atomic context.
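A sketch of the allocation done from process context (the per-CPU
window arrays are only illustrative of the kind of structure involved):

  int init_new_task_load(struct task_struct *p)
  {
          /*
           * Called from copy_process() outside of any atomic section, so
           * GFP_KERNEL is safe and the atomic pool is left untouched.
           */
          p->ravg.curr_window_cpu = kcalloc(nr_cpu_ids, sizeof(u32), GFP_KERNEL);
          p->ravg.prev_window_cpu = kcalloc(nr_cpu_ids, sizeof(u32), GFP_KERNEL);
          if (!p->ravg.curr_window_cpu || !p->ravg.prev_window_cpu) {
                  kfree(p->ravg.curr_window_cpu);
                  kfree(p->ravg.prev_window_cpu);
                  return -ENOMEM;
          }

          return 0;
  }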
Change-Id: I357772e10bf8958804d9cd0c78eda27139054b21
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Recent changes to scheduler guided frequency have started reporting
the maximum of the cpu load and the load of the top task on a CPU
to the governor. Use the same information to determine whether a
notification is necessary or not.
Change-Id: I1928c6cd0509952443a912ef54e0d72d5f75955d
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Capping load when reporting to the governor was important prior to new
scheduler guided frequency changes as intra-cluster migrations would
sometimes lead to CPU loads well in excess of 100%. With the new top
task approach however, load greater than 100% is no longer possible
except for the same conditions that were previously exempted (i.e.
inter-cluster migrations and frequency aggregation).
Change-Id: I3e4f5e39ec9ae7eeaba9a567efd245a7aec1b7ad
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Changing a colocation group requires finishing CPU busy time accounting
prior to the operation by calling update_task_ravg(). However, when
window statistics accounting is disabled, update_task_ravg() acts as a
nop and results in incorrect CPU time accounting.
Disallow colocation group changes while window statistics accounting is
disabled in order to prevent a race between reset_all_window_stats() and
the colocation grouping functions.
Change-Id: I6dfa20b8d8b0ae7ccc94119bf9cf14c5e11a1cf7
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>