At present, sched_set_group_id() dynamically allocates a structure for the
colocation group when assigning the given task to a group. However, this can
cause a deadlock because the memory allocator may wake up a task that also
tries to acquire related_thread_group_lock.
Avoid this deadlock by pre-allocating the colocation structures. This limits
the maximum number of colocation groups to a static number, which is fine
since a large number of groups is never expected.
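A minimal sketch of the pre-allocation approach, assuming a fixed group limit
and a lookup helper (MAX_NUM_CGROUP_COLOC_ID and the helper name are
illustrative):

  #define MAX_NUM_CGROUP_COLOC_ID 20     /* illustrative static limit */

  /* All group structures exist from boot, so sched_set_group_id() only
   * looks one up and never calls into the allocator while
   * related_thread_group_lock is held. */
  static struct related_thread_group
                related_thread_groups[MAX_NUM_CGROUP_COLOC_ID];

  static struct related_thread_group *lookup_related_thread_group(unsigned int id)
  {
          if (id >= MAX_NUM_CGROUP_COLOC_ID)
                  return NULL;
          return &related_thread_groups[id];
  }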
Change-Id: Ifc32ab4ead63c382ae390358ed86f7cc5b6eb2dc
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Unlike the monotonic clock, the boot clock used as a trace clock will account
for time spent in suspend, which is useful for tracing suspend/resume. This
uses the previously introduced infrastructure for the fast boot clock.
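For illustration, the wiring is roughly an extra entry in the trace_clocks[]
table in kernel/trace/trace.c (a sketch of the pattern, not the exact hunk):

  static struct {
          u64 (*func)(void);
          const char *name;
          int in_ns;                      /* is this clock in nanoseconds? */
  } trace_clocks[] = {
          { trace_clock_local,      "local",  1 },
          { trace_clock_global,     "global", 1 },
          /* ... existing entries ... */
          { ktime_get_boot_fast_ns, "boot",   1 },   /* new */
  };

The new clock can then be selected from userspace, e.g. by writing "boot" to
the trace_clock file in tracefs.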
Bug: b/33184060
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
This boot clock can be used as a tracing clock and will account for
suspend time.
To keep it NMI safe, since we are accessing it from tracing, we do not use a
separate timekeeper whose updates to the monotonic clock and boot offset
would be protected by seqlocks. This has the following minor side effects:
(1) It's possible that a timestamp is taken after the boot offset is updated
but before the timekeeper is updated. If this happens, the new boot offset
is added to the old timekeeping, making the clock appear to update slightly
earlier:
CPU 0                                        CPU 1
timekeeping_inject_sleeptime64()
  __timekeeping_inject_sleeptime(tk, delta);
                                             timestamp();
  timekeeping_update(tk, TK_CLEAR_NTP...);
(2) On 32-bit systems, the 64-bit boot offset (tk->offs_boot) may be
partially updated. Since the tk->offs_boot update is a rare event, this
should be a rare occurrence which postprocessing should be able to handle.
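For illustration, the resulting accessor is essentially the NMI-safe
monotonic fast clock plus the (unprotected) boot offset, along these lines:

  u64 notrace ktime_get_boot_fast_ns(void)
  {
          struct timekeeper *tk = &tk_core.timekeeper;

          /* offs_boot is read without seqlock protection; see side
           * effects (1) and (2) above. */
          return (ktime_get_mono_fast_ns() + ktime_to_ns(tk->offs_boot));
  }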
Bug: b/33184060
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
If the cpufreq driver hasn't set the CPUFREQ_HAVE_GOVERNOR_PER_POLICY
flag, then the kernel will crash on accessing sysfs files for the sched
governor.
With CPUFreq governors, the governor-specific sysfs files can live in two
places:
A. /sys/devices/system/cpu/cpuX/cpufreq/<governor>
B. /sys/devices/system/cpu/cpufreq/<governor>
Case A is the per-policy governor case, where the governor tunables can be
controlled for each policy separately. Case B is for system-wide tunable
values.
The schedfreq governor only implements case A, not case B. The sysfs files
for case B will still be present in
/sys/devices/system/cpu/cpufreq/<governor>, but accessing them will crash
the kernel as the governor doesn't support that.
Moreover, the sched governor is quite new, will only be used on ARM
platforms, and there is no need to support case B at all.
Hence use policy->kobj instead of get_governor_parent_kobj(), so that we
always create the sysfs files in path A.
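A minimal sketch of the change, assuming the governor exposes its tunables
through an attribute group (the function and parameter names are
illustrative):

  static int sched_gov_create_sysfs(struct cpufreq_policy *policy,
                                    const struct attribute_group *attr_group)
  {
          /* was: sysfs_create_group(get_governor_parent_kobj(policy), attr_group);
           * get_governor_parent_kobj() returns the global kobject when
           * CPUFREQ_HAVE_GOVERNOR_PER_POLICY is not set, i.e. case B. */
          return sysfs_create_group(&policy->kobj, attr_group);
  }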
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
A new reset reason code, "FREQ_AGGREGATE_CHANGE", is added to the
reset_reason_code enum, but the corresponding string array is not updated.
Fix this.
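The usual pattern is to index the string table by the enum so a new reason
only needs matching entries in both places; a sketch in which only
FREQ_AGGREGATE_CHANGE comes from this change and the neighbouring entries
are illustrative:

  enum reset_reason_code {
          WINDOW_CHANGE,
          POLICY_CHANGE,
          HIST_SIZE_CHANGE,
          FREQ_AGGREGATE_CHANGE,                  /* new reason code */
  };

  static const char * const reset_reason_str[] = {
          [WINDOW_CHANGE]         = "WINDOW_CHANGE",
          [POLICY_CHANGE]         = "POLICY_CHANGE",
          [HIST_SIZE_CHANGE]      = "HIST_SIZE_CHANGE",
          [FREQ_AGGREGATE_CHANGE] = "FREQ_AGGREGATE_CHANGE", /* previously missing */
  };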
Change-Id: I2a17d95328bef91c4a5dd4dde418296efca44431
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
With major controllers - cpu, memory and io - shaping up for the
unified hierarchy, cgroup2 is about ready to be, gradually, released
into the wild. Replace __DEVEL__sane_behavior flag which was used to
select the unified hierarchy with a separate filesystem type "cgroup2"
so that unified hierarchy can be mounted as follows.
mount -t cgroup2 none $MOUNT_POINT
The cgroup2 fs has its own magic number - 0x63677270 ("cgrp").
v2: Assign a different magic number to cgroup2 fs.
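As a usage illustration (not part of the change), a userspace program can
confirm a cgroup2 mount by checking the statfs magic; the mount point below
is an assumption:

  #include <stdio.h>
  #include <sys/vfs.h>

  #define CGROUP2_SUPER_MAGIC 0x63677270      /* "cgrp" */

  int main(void)
  {
          struct statfs fs;

          if (statfs("/sys/fs/cgroup", &fs) == 0 &&
              fs.f_type == CGROUP2_SUPER_MAGIC)
                  printf("cgroup2 mounted here\n");
          return 0;
  }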
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
(cherry picked from commit 67e9c74b8a873408c27ac9a8e4c1d1c8d72c93ff)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Clearing the hmp request can cause a task to be freed. When a task is freed,
the free path might wake up a kworker, which can cause a spinlock lockup
(rq lock). Fix this by avoiding a call to put_task_struct() while holding
the rq lock.
In addition, move the call to clear_hmp_request() out of stopper thread
context, since it is not necessary to do this on the CPU being isolated.
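A sketch of the deferral pattern described above (the push_task field name
is illustrative):

  struct task_struct *to_put = NULL;

  raw_spin_lock_irqsave(&rq->lock, flags);
  if (rq->push_task) {
          to_put = rq->push_task;         /* remember, but don't free yet */
          rq->push_task = NULL;
  }
  raw_spin_unlock_irqrestore(&rq->lock, flags);

  if (to_put)
          put_task_struct(to_put);        /* may wake a kworker; rq lock no longer held */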
Change-Id: Ie577db4701a88849560df385869ff7cf73695a05
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
If the affinity of the interrupt changes before the IRQ-affinity-based qos
request is added to the list, a notifier call is triggered. This notifier
call tries to update the qos request. Accessing a qos request that has not
yet been added to the list leads to a NULL pointer exception.
Avoid this race by registering the notifier after adding the
qos request.
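The fix amounts to publishing the request before hooking up the affinity
notifier; a sketch using the PM QoS and IRQ affinity-notify interfaces (the
request/notify field names are illustrative):

  /* 1. Make the request visible on the qos list first. */
  pm_qos_add_request(&req->pm_qos, PM_QOS_CPU_DMA_LATENCY,
                     PM_QOS_DEFAULT_VALUE);

  /* 2. Only now can an affinity change safely find and update it. */
  irq_set_affinity_notifier(irq, &req->irq_notify);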
Change-Id: I99869cc233573b5db10e4f3224d65c29511050ea
Signed-off-by: Anil Kumar Mamidala <amami@codeaurora.org>
commit ceb75787bc75d0a7b88519ab8a68067ac690f55a upstream.
Make sure to drop the reference taken by class_find_device() after
opening the RTC device.
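A sketch of the fix, mirroring the selftest's open path (simplified):

  dev = class_find_device(rtc_class, NULL, NULL, has_wakealarm);
  if (dev) {
          rtc = rtc_class_open(dev_name(dev));
          put_device(dev);        /* drop the reference class_find_device() took */
  }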
Fixes: 77437fd4e6 (pm: boot time suspend selftest)
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
If the qos value is increased only for a subset of CPUs, the aggregated qos
for those CPUs remains the previous value. This is because the qos request
list is maintained per request and not per CPU. In this case, as there is no
change in the aggregated qos value, these CPUs are not woken up to take the
new qos value into effect.
So wake up the CPUs even if the aggregated qos value does not change but the
cpumask changes.
Change-Id: If5a4a100108e85e04beb77e5249bd6c452672edf
Signed-off-by: Anil Kumar Mamidala <amami@codeaurora.org>
There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) arguments for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1]. Of course this leaves a window of
opportunity for an unsavory application to munge with the data.
This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
record(s). In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).
As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:
* https://github.com/linux-audit/audit-testsuite/issues/25
[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.
[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data. I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation). The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.
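A rough sketch of the single-fetch shape described above, assuming an audit
buffer 'ab', a kernel buffer 'buf' and an illustrative helper
contains_control_chars(); the real code's buffer sizing and record splitting
are considerably more involved:

  len = strnlen_user(p, MAX_ARG_STRLEN);
  if (!len || len > MAX_ARG_STRLEN)
          return -EINVAL;
  if (copy_from_user(buf, p, len))        /* fetch the argument exactly once */
          return -EFAULT;
  buf[len - 1] = '\0';

  /* Scan and log from the same buffer; userspace can no longer change
   * what gets recorded after the fetch. */
  if (contains_control_chars(buf, len - 1))
          audit_log_n_hex(ab, buf, len - 1);
  else
          audit_log_format(ab, "\"%s\"", buf);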
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Change-Id: Ie9848961d236739df5014474f2c2a781af9fb811
Git-repo: http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Git-commit: 43761473c254b45883a64441dd0bc85a42f3645c
Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>
Isolation code needs to be synchronized with both hotplug and suspend.
Ensure this by taking the lock that is taken by both paths and by ensuring
that hotplug notifiers are processed for suspend/resume.
Change-Id: I663588cfd2f9e3972b9adc1a10887ef36cd70c57
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
The recent introduction of the schedtune cgroup controller has provided
the scheduler with added flexibility in terms of some of its placement
features. In particular, each cgroup under the schedtune controller can
now specify:
1) Whether it needs co-location along with other cgroups
2) Whether it is eligible for scheduler boost (sched_boost_enabled)
3) Whether the kernel can override the boost eligibility when necessary
(sched_boost_no_override)
The scheduler now creates a reserved co-location group at boot. This
group is used to co-locate all tasks that form part of any one of the
cgroups that have co-location enabled. This reserved group can neither
be destroyed nor reused for other purposes. Furthermore, cgroups are
only allowed to indicate their co-location preference once at boot.
Further updates are disallowed.
Since we are now creating co-location groups for an extended period of
time, there are a few other factors to consider when determining the
preferred cluster for the group. We first exclude any tasks in the
group that have not been observed to be running for a significant
amount of time. Secondly we introduce the notion of group up and down
migrate tunables to allow different migration policies than individual
tasks. Lastly we break co-location if a single task in a group exceeds
up-migrate but the total load of the group does not exceed group
up-migrate.
In terms of sched_boost, the scheduler now supports multiple types of
boost. These are:
1) FULL_THROTTLE : Force up-migrate tasks belonging to any cgroup that
                   has the sched_boost_enabled flag turned on. Little
                   CPUs will only be used when big CPUs can no longer
                   accommodate tasks. Also up-migrate all RT tasks.
2) CONSERVATIVE  : Override the sched_boost_enabled flag for all cgroups
                   except those that have the sched_boost_no_override
                   flag set. Force up-migrate all tasks belonging to only
                   those cgroups that still remain eligible for boost.
                   RT tasks do not get force up-migrated.
3) RESTRAINED    : Start frequency aggregation for co-located tasks. This
                   type of boost does not force up-migrate any task.
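For reference, the three modes roughly map onto an enum like the following
(a sketch; the actual names and values in the scheduler may differ):

  enum sched_boost_type {
          NO_BOOST = 0,
          FULL_THROTTLE_BOOST,    /* force up-migration, including RT tasks */
          CONSERVATIVE_BOOST,     /* honor sched_boost_no_override exemptions */
          RESTRAINED_BOOST,       /* frequency aggregation only, no forced up-migration */
  };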
Finally the boost API removes ref-counting. This means that there can
only be a single entity using boost at any given time. If multiple
entities are managing boost, they are required to be well behaved so
that they don't interfere with one another. Even for a single client,
it is not possible to switch directly from one boost type to another.
Boost must be first turned off before switching over to a new type.
Change-Id: I8d224a70cbef162f27078b62b73acaa22670861d
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
At present, HMP scheduler boost tends to pack tasks by taking into account
power cost and C-state. This is suboptimal for performance as it can lead
to preemption and higher latency.
Revise the logic to prefer the least loaded CPU among the big cluster CPUs
when the boost type is SCHED_BOOST_ON_BIG. The new logic still honors the
behaviour that the scheduler can place tasks on the little CPUs when the
big CPUs are all overcommitted.
Also, it was found that need_idle with boost can easily return the previous
CPU when there is no idle CPU found. Fix this issue by making the
need_idle flag take precedence over sched_boost.
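A sketch of the revised placement preference, with is_big_cpu() and
cpu_load_of() as illustrative helpers:

  static int least_loaded_big_cpu(struct task_struct *p)
  {
          unsigned long load, best_load = ULONG_MAX;
          int cpu, best_cpu = -1;

          for_each_cpu(cpu, tsk_cpus_allowed(p)) {
                  if (!is_big_cpu(cpu) || !cpu_online(cpu))
                          continue;
                  load = cpu_load_of(cpu);
                  if (load < best_load) {
                          best_load = load;
                          best_cpu = cpu;
                  }
          }
          /* Caller falls back to little CPUs when all big CPUs are
           * overcommitted, and need_idle still takes precedence. */
          return best_cpu;
  }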
CRs-fixed: 1074879
Change-Id: I470bcd0588e038b4a540d337fe6a412f2fa74920
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Iterating over a leader task's thread group in order to add its members to a
colocation group involves a complex locking chain that ends up
causing a deadlock. The deadlock is as follows when the same task
is being referenced on three different CPUs:
CPU 0                        CPU 1                         CPU 2
-----                        -----                         -----
                             add_task_to_group(p)
__schedule(prev = p)         write_lock(                   ttwu(p)
                               related_thread_grp_lock)
                                                           lock(pi_lock)
idle_balance()                                             wait for p->on_cpu
load_balance()               unable to acquire
                               p->pi_lock
send_notification()
  wait for read_lock(
    related_thread_grp_lock)
unable to set p->on_cpu
There are a couple of ways to resolve this deadlock in the kernel,
however, they are not trivial. For the sake of simplicity, move
the responsibility of thread group iteration back to userspace. This
would apply to both adding and removing the leader task from a
colocation group. The kernel would continue to automatically add
newly forked children of the colocated leader to the colocation
group.
This still leaves an issue with the locking order of the pi_lock and
the related_thread_group_lock. To solve all deadlocks, we need to avoid
taking the pi_lock in reset_all_task_stats() and instead rely on a more
heavy handed approach of taking all rq locks. The pi_lock was taken to
avoid a race between reset_all_task_stats() and sched_exit(). The race
can be avoided with rq locks as well.
Change-Id: I15323e3ef91401142d3841db59c18fd8fee753fd
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Export core control boost function to make it accessible to kernel
modules.
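The export itself is a one-liner next to the function definition; a sketch
with the boost entry point named as in the core control driver (treat the
exact name and body as an assumption):

  int core_ctl_set_boost(bool boost)
  {
          /* ... enable or disable core control boost ... */
          return 0;
  }
  EXPORT_SYMBOL(core_ctl_set_boost);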
Change-Id: I94359afa433ad57dd5bfeae3cb78a1f196cd02fe
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Conflicts:
Conflicts mostly come from mm-kaslr; the focus is on mm:
arch/arm64/include/asm/cpufeature.h
arch/arm64/include/asm/pgtable.h
arch/arm64/kernel/Makefile
arch/arm64/kernel/cpufeature.c
arch/arm64/kernel/head.S
arch/arm64/kernel/suspend.c
arch/arm64/kernel/vmlinux.lds.S
arch/arm64/kvm/hyp.S
arch/arm64/mm/init.c
arch/arm64/mm/mmu.c
arch/arm64/mm/proc-macros.S
During migrate_tasks, we have to drop the dead_rq lock in
order to preserve locking order when acquiring task->pi_lock.
This may allow the task to migrate off of dead_rq. Therefore,
don't attempt to migrate such a task again from dead_rq.
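The recheck looks roughly like the following inside the migrate_tasks()
loop (a sketch of the pattern, not the exact diff):

  raw_spin_unlock(&rq->lock);
  raw_spin_lock(&next->pi_lock);
  raw_spin_lock(&rq->lock);

  /*
   * Since rq->lock was dropped, 'next' may already have migrated off
   * dead_rq; if so, don't try to move it again.
   */
  if (task_rq(next) != rq || !task_on_rq_queued(next)) {
          raw_spin_unlock(&next->pi_lock);
          continue;
  }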
Change-Id: Id31b58e231d3dcd7d32e0dc7f264595d60a7c408
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
The migrate_tasks() function is used by both hotplug and CPU isolation.
During hotplug, all the CPUs are stalled (in stop machine) while tasks are
being migrated. However, this is not the case during CPU isolation. A task
that was counted as a pinned thread might have been migrated off the
CPU. Take this into account when checking whether we have completed
moving all tasks off the runqueue.
Also, ignore the warning about tasks moving off the run-queue for the
isolation use case.
Change-Id: I5c5f25eb9b1eaf0605b606a65e0ac86996fa5f27
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Cluster CPU list traversal is not properly protected against removal of an
element by a separate thread. Add proper locking to ensure an element
cannot be removed while the list is being accessed.
In addition, ensure we don't end up in a livelock, never exiting the loop,
because hotplug keeps moving elements to the end of the list.
Change-Id: Ie98fe48c2f4fdd0244573229b77ee9823df9e214
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
commit cfe02a8a973e7e5f66926b8ae38dfce404b19e29 upstream.
When all subsystems are disabled, gcc notices that cgroup_subsys_enabled_key
is a zero-length array and that any access to it must be out of bounds:
In file included from ../include/linux/cgroup.h:19:0,
                 from ../kernel/cgroup.c:31:
../kernel/cgroup.c: In function 'cgroup_add_cftypes':
../kernel/cgroup.c:261:53: error: array subscript is above array bounds [-Werror=array-bounds]
  return static_key_enabled(cgroup_subsys_enabled_key[ssid]);
                            ~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
../include/linux/jump_label.h:271:40: note: in definition of macro 'static_key_enabled'
  static_key_count((struct static_key *)x) > 0; \
                                        ^
We should never call the function in this particular case, so this is
not a bug. In order to silence the warning, this adds an explicit check
for the CGROUP_SUBSYS_COUNT==0 case.
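The fix is essentially an early return in cgroup_ssid_enabled(), likely
along these lines:

  static bool cgroup_ssid_enabled(int ssid)
  {
          if (CGROUP_SUBSYS_COUNT == 0)
                  return false;

          return static_key_enabled(cgroup_subsys_enabled_key[ssid]);
  }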
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Some sysfs attributes in /sys/power/ should really be read-only,
so add support for that, convert those attributes to read-only
and drop the stub .show() routines from them.
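A sketch of the read-only attribute helper this introduces alongside the
existing power_attr() macro (the macro name is from the change; treat the
exact form as illustrative):

  #define power_attr_ro(_name) \
  static struct kobj_attribute _name##_attr = __ATTR_RO(_name)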
Original-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit a1e9ca6967d68209c70e616a224efa89a6b86ca6)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
Some architectures require code written to memory as if it were data to be
'cleaned' from any data caches before the processor can fetch them as new
instructions.
During resume from hibernate, the snapshot code copies some pages directly,
meaning these architectures do not get a chance to perform their cache
maintenance. Modify the read and decompress code to call
flush_icache_range() on all pages that are restored, so that the restored
in-place pages are guaranteed to be executable on these architectures.
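Per restored page, the maintenance boils down to something like this (the
real code batches restored pages on a list and flushes them after the image
is loaded):

  void *addr = page_address(page);

  /* Make the just-written bytes visible to the instruction fetcher. */
  flush_icache_range((unsigned long)addr,
                     (unsigned long)addr + PAGE_SIZE);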
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Rafael J. Wysocki <rjw@rjwysocki.net>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
[will: make clean_pages_on_* static and remove initialisers]
Signed-off-by: Will Deacon <will.deacon@arm.com>
(cherry picked from commit f6cf0545ec697ddc278b7457b7d0c0d86a2ea88e)
Signed-off-by: Alex Shi <alex.shi@linaro.org>
This reverts 'commit 7112993181 ("input: touchscreen: synaptics v1.1")'
This change is not needed in 4.4 kernel.
Change-Id: I89ab8f353bc04bc0a04d5f5a6993e8e8e5ebbd2e
Signed-off-by: Abinaya P <abinayap@codeaurora.org>
Signed-off-by: Shantanu Jain <shjain@codeaurora.org>
A CPU that is isolated needs to have its timers migrated off to
another CPU. If there is a running timer while timers are being
migrated, acquiring the timer base lock after marking the CPU as
isolated will ensure that:
1) No more timers can be queued on to the isolated CPU, and
2) A running timer will finish execution on the to-be-isolated
CPU, and so will any just expired timers since they're all
taken off of the CPU's tvec1 in one go while the base lock
is held.
Therefore there is no apparent reason to wait for the expired
timers to finish execution, and isolation can proceed to migrate
non-expired timers even when the expired ones are running
concurrently.
While we're here, also add a delay to the wait-loop inside
migrate_hrtimer_list to allow for store-exclusive fairness
when run_hrtimer is attempting to grab the hrtimer base
lock.
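The back-off in the wait-loop is conceptually similar to the sketch below
(illustrative only; the actual migrate_hrtimer_list() locking differs in
detail):

  while (hrtimer_callback_running(timer)) {
          raw_spin_unlock(&old_base->cpu_base->lock);
          udelay(1);      /* let run_hrtimer() win the store-exclusive race */
          raw_spin_lock(&old_base->cpu_base->lock);
  }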
Change-Id: Ib697476c93c60e3d213aaa8fff0a2bcc2985bfce
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>