Interactive governor already has a per_cpu field, cpuinfo, to keep track
of per-CPU data. Move cached_tunables into cpuinfo.
Change-Id: I77fda0cda76b56ff949456a95f96d129d877aa7b
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
To avoid multiple frees of an allocated tunables struct during
module_exit(), the pointer to the allocated tunables should be stored in
only one of the per-CPU cached_tunables pointers.
So, in the case of a per-policy governor configuration, store the cached
values in the pointer of the first CPU in a policy. In the case of one
governor across all policies, store them in the CPU0 pointer.
Change-Id: Id4334246491519ac91ab725a8758b2748f743bb0
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
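A minimal sketch of the ownership rule described above, with illustrative
names rather than the actual driver code; it picks the single CPU whose
cached_tunables slot owns the allocated struct:

#include <linux/cpufreq.h>
#include <linux/cpumask.h>

/* Illustrative only: the tunables struct is owned by exactly one
 * per-CPU slot -- the first CPU of the policy when the governor is
 * per-policy, CPU0 otherwise -- so module_exit() frees it once. */
static int cached_tunables_owner_cpu(struct cpufreq_policy *policy,
                                     bool per_policy)
{
        return per_policy ? cpumask_first(policy->related_cpus) : 0;
}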
Userspace might change tunable values for a governor. Currently, if
all CPUs in a policy go offline, the governor frees its tunables. This
wipes out all userspace modifications. Kernel drivers can call
cpu_up/down() directly, so userspace won't have a chance to restore
the tunables.
Permanently save the tunables struct in a per_cpu field so that tunable
values are preserved across hotplug, suspend/resume and governor
switches.
Change-Id: I126b8278c8e75c8eadb3e2ddfe97fcc72cddfa23
[junjiew@codeaurora.org: Resolved merge conflicts]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Many subsystems depend on the cpufreq API for CPU frequency scaling.
The cpufreq API is expected to fail until the cpufreq device registers.
Change pr_debug() to pr_info() so that users can determine from kernel
messages when the cpufreq API becomes available during boot. This is
crucial for understanding whether a cpufreq API failure during early
boot is benign.
Change-Id: Id2dfa009ae33859ec3efcdb29a3296e891852c6a
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Governor error messages point to important failures in the governor or
the framework. Output the triggering CPU and policy->cpu to help
debugging.
Resolved conflicts for the 3.18 kernel.
Change-Id: I4c5c392ec973b764ec3240bb2eb455c624bcaf63
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
cpufreq_frequency_get_table() could return NULL. Do an error check on
the return value instead of continuing with a potentially NULL pointer.
Change-Id: I0cb8a3a8ae3499e738683e5f45271aeadee488f6
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
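A short sketch of the pattern the fix follows, assuming a 3.18-era caller
and an error path that are placeholders, not the actual driver code:

#include <linux/cpufreq.h>
#include <linux/errno.h>
#include <linux/kernel.h>

static int example_check_freq_table(struct cpufreq_policy *policy)
{
        struct cpufreq_frequency_table *table;

        table = cpufreq_frequency_get_table(policy->cpu);
        if (!table) {
                /* bail out instead of dereferencing a NULL table */
                pr_err("cpufreq: no frequency table for CPU%d\n",
                       policy->cpu);
                return -ENODEV;
        }
        return 0;
}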
__cpufreq_driver_target() checks whether policy->cur is the same as
target_freq without holding any lock. This function is used by governors
to set the CPU frequency directly. A governor calling this function
can't hold any CPUfreq framework locks due to the possibility of
deadlock.
However, this results in a race condition where one thread could see
a stale policy->cur while another thread is changing the CPU frequency.
Thread A: Governor calls __cpufreq_driver_target(), starts increasing
the frequency but hasn't sent out the CPUFREQ_POSTCHANGE notification
yet.
Thread B: Some other driver (could be thermal mitigation) starts
limiting the frequency using cpufreq_update_policy(). The limits are
applied to policy->min/max, and the final policy->max happens to be the
same as policy->cur. __cpufreq_driver_target() simply returns 0.
Thread A: The governor finishes scaling, and now policy->cur violates
policy->max; the violation could last until the next CPU frequency
scaling happens.
Shifting the responsibility of checking policy->cur against target_freq
to the CPUfreq device driver resolves the race, as long as the device
driver holds a common mutex.
Change-Id: I6f943228e793a4a4300c58b3ae0143e09ed01d7d
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
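A minimal sketch of the driver-side comparison suggested above;
example_driver_target() and example_set_rate() are hypothetical names,
and the point is only that the policy->cur check and the actual switch
sit under the same driver mutex:

#include <linux/cpufreq.h>
#include <linux/mutex.h>

static int example_set_rate(struct cpufreq_policy *policy,
                            unsigned int khz);  /* hypothetical driver hook */

static DEFINE_MUTEX(example_set_freq_lock);

static int example_driver_target(struct cpufreq_policy *policy,
                                 unsigned int target_freq,
                                 unsigned int relation)
{
        int ret = 0;

        mutex_lock(&example_set_freq_lock);
        /* Comparison and switch are serialized by the same mutex, so a
         * concurrent limits update cannot race with a stale policy->cur. */
        if (policy->cur != target_freq)
                ret = example_set_rate(policy, target_freq);
        mutex_unlock(&example_set_freq_lock);

        return ret;
}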
Some modules can benefit from getting additional information that
cpufreq governors use to make frequency switch decisions.
This change lays down a basic framework that the governors can use
to report additional information (e.g. a CPU's load) to the clients
that subscribe to the cpufreq govinfo notifier chain.
Change-Id: I511b4bdb7d12394a31ce5352ae47553861e49303
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
[imaund@codeaurora.org: resolved context conflicts]
Signed-off-by: Ian Maund <imaund@codeaurora.org>
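A sketch of how a client might subscribe; the cpufreq_govinfo payload
struct, its fields, and the CPUFREQ_GOVINFO_NOTIFIER list id are
assumptions about this out-of-tree interface, not a documented mainline
API:

#include <linux/cpufreq.h>
#include <linux/kernel.h>
#include <linux/notifier.h>

static int example_govinfo_cb(struct notifier_block *nb,
                              unsigned long event, void *data)
{
        /* Assumed payload layout: per-CPU id and load from the governor. */
        struct cpufreq_govinfo *info = data;

        pr_debug("cpu%u load %u\n", info->cpu, info->load);
        return NOTIFY_OK;
}

static struct notifier_block example_govinfo_nb = {
        .notifier_call = example_govinfo_cb,
};

/* Registration in the client's init (list id is an assumption):
 *      cpufreq_register_notifier(&example_govinfo_nb,
 *                                CPUFREQ_GOVINFO_NOTIFIER);
 */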
The frequency table is allocated with devm_kzalloc() and thus should be
freed using devm_kfree().
Change-Id: I9c08838eadb9fc04bda9cc66596e1e0b45b3e4db
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
The previous cpufreq_notify_transition() is deprecated in favor of the
cpufreq_freq_transition_begin/end() API, which provides a serialization
guarantee for notifications.
Use the new API for transition notifications.
Change-Id: I8d559e5c6ef4771986b24e017c900476da1f6cdf
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
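A sketch of the new call pattern in a driver's target hook;
example_clk_set_rate() is a hypothetical stand-in for whatever the
driver actually programs:

#include <linux/cpufreq.h>

static int example_clk_set_rate(struct cpufreq_policy *policy,
                                unsigned int khz);  /* hypothetical */

static int example_set_target(struct cpufreq_policy *policy,
                              unsigned int new_freq)
{
        struct cpufreq_freqs freqs;
        int ret;

        freqs.old = policy->cur;
        freqs.new = new_freq;

        /* Core serializes transitions and sends PRECHANGE here. */
        cpufreq_freq_transition_begin(policy, &freqs);
        ret = example_clk_set_rate(policy, new_freq);
        /* POSTCHANGE (or rollback on failure) is sent here. */
        cpufreq_freq_transition_end(policy, &freqs, ret);

        return ret;
}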
cpufreq_suspend is now a function in the core CPUfreq framework. Rename
qcom-cpufreq's local per-CPU variable to suspend_data.
Change-Id: I2f567f0c04271d728d4e6a17b61cea2152c4d8f7
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Different structures might need to be saved and restored based on the
scheduling policy of the current thread. Saving and restoring priority
using scheduler APIs is very fragile due to potential changes in
scheduler code. In addition, the priority change doesn't provide any
starvation guarantee, because threads can be preempted before the
priority change.
Therefore, remove the save and restore of priority to avoid potential
bugs when the scheduler API changes. Callers are now responsible for
setting the right priority for their CPU frequency scaling
workqueue/thread.
Change-Id: I2a5d8599e75c0c4aa902df3214c17ab2b13dc9a9
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
qcom-cpufreq blocks CPU frequency change requests during suspend,
because its dependencies might be suspended. Thus a frequency change
request fails silently, and the CPU clock won't change until the first
frequency update requested after the system comes out of suspend. This
creates a window during which the thermal driver cannot perform
frequency mitigation, even though policy->min/max have been correctly
updated.
Check each online CPU's policy during resume to correct any frequency
violation as soon as possible.
Change-Id: I3be79cf91e7d5e361314020c9806b770823c0b72
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
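A minimal sketch of the resume-time fixup, with the placement in the
driver's PM callbacks omitted: re-evaluate every online CPU's policy so
any violation of the updated limits is corrected right away.

#include <linux/cpu.h>
#include <linux/cpufreq.h>
#include <linux/cpumask.h>

static void example_resume_freq_fixup(void)
{
        unsigned int cpu;

        get_online_cpus();
        for_each_online_cpu(cpu)
                cpufreq_update_policy(cpu);  /* re-applies policy->min/max */
        put_online_cpus();
}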
It's no longer a requirement to pin frequency changes to the CPU that
is being scaled. Therefore, there is no longer a need for a per-CPU
workqueue in qcom-cpufreq. Remove the workqueue.
Change-Id: Ic6fd7f898fa8b1b1226a178b04530c24f0398daa
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
MSM_CPU_FREQ_SET_MIN_MAX and related Kconfigs are deprecated. Purge
them from Kconfig and qcom-cpufreq.
Change-Id: I8ac786c155c7e235154b60c79f97d76ea15dace2
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
It is sometimes useful to profile how long CPU frequency switches
take, since they often involve variable overhead (PLL lock times,
voltage increase time, etc.). Add additional traces to make this
possible.
Since the overhead involved may differ based on the frequencies
being switched between, record both the start and the end frequencies
as part of the trace.
Change-Id: I2de743fc357dad3590fd4980f65f38f6073d426e
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
[abhimany: resolve trivial merge conflicts]
Signed-off-by: Abhimanyu Kapur <abhimany@codeaurora.org>
This is a snapshot of qcom-cpufreq as of msm-3.10 commit
acdce027751d5a7488b283f0ce3111f873a5816d (Merge "defconfig: arm64:
Enable ONESHOT_SYNC for msm8994")
Change-Id: Idb99a856330566ffad6309c48edabb220cee7917
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
[junjiew@codeaurora.org: resolved conflicts in Kconfig.arm
and Makefile. Dropped dependency on ARCH_MSM.]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
The cpu-boost driver is not a CPUFreq device. Move it to the end of the
CPUFreq governor section.
Change-Id: Ib433f81e7596789a2e6ea03d0bd0a8d166ecf9e9
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
The scheduler provides an API to force tasks to the big cluster. To
improve performance, use this API to move most/all tasks to the
big cluster for a short duration on an input event. When the frequency
boost is removed (after input_boost_ms), this scheduler boost is also
deactivated.
Change-Id: I9d643914ebc75266478cc22260a45862faad6236
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
Enable CONFIG_SCHED_DEBUG in order to expose /proc/sched_debug.
Change-Id: Id784c80fe6203f007501637c3d17876528329e2b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Enable the HMP scheduler along with scheduler-guided frequency input.
Change-Id: Ia0e7cf6c5c5ff44492836ebb5189574f55cb742e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Clean up msm_defconfig and msm-perf_defconfig with 'make savedefconfig'.
Change-Id: I118d9d4ddc1fb89b4301cb7ceffdbccc60699329
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
The sysctl node sched_new_task_windows is only for CONFIG_SCHED_HMP and
CONFIG_SCHED_FREQ_INPUT.
Change-Id: I4791e977fa8516fd2cd31198f71103b8d7e874c3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Select a given task's prev_cpu when the task slept for a short period,
to reduce the latency of task placement and migrations. A new tunable,
/proc/sys/kernel/sched_select_prev_cpu_us, is introduced to determine
whether tasks are eligible to go through the fast path.
CRs-fixed: 947467
Change-Id: Ia507665b91f4e9f0e6ee1448d8df8994ead9739a
[joonwoop@codeaurora.org: fixed conflict in include/linux/sched.h,
include/linux/sched/sysctl.h, kernel/sched/core.c and kernel/sysctl.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
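A sketch of what the fast-path test amounts to, with illustrative names;
the tunable is in microseconds while the scheduler tracks sleep time in
nanoseconds:

#include <linux/time.h>
#include <linux/types.h>

/* Illustrative: reuse prev_cpu when the task's sleep was shorter than
 * the sched_select_prev_cpu_us threshold. */
static inline bool example_prefer_prev_cpu(u64 sleep_ns,
                                           unsigned int threshold_us)
{
        return sleep_ns < (u64)threshold_us * NSEC_PER_USEC;
}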
Add documentation for the revised task placement logic of the
scheduler. Since the old file sched-hmp.txt is still required, add a
new file rather than modifying it.
Change-Id: Ic7e3845c8d6b85b7918cd35c2a0a482a621fe525
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
At present, the HMP scheduler uses sched_clock to set up the window
boundary so it is aligned with the timer interrupt, to ensure the timer
interrupt fires after the window rollover. However, this alignment won't
last long, since the timer interrupt rearms the next timer based on time
measured by ktime, which isn't coupled with sched_clock.
Convert sched_clock to ktime to avoid wallclock discrepancy between the
scheduler and the timer, so that the scheduler's window boundary is
always aligned with the timer.
CRs-fixed: 933330
Change-Id: I4108819a4382f725b3ce6075eb46aab0cf670b7e
[joonwoop@codeaurora.org: fixed minor conflict in include/linux/tick.h
and kernel/sched/core.c. omitted fixes for kernel/sched/qhmp_core.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Following the change "57e2905 sched: Skip resetting HMP stats when
max frequencies remain unchanged", the scheduler fails to update
min/max capacities appropriately when CPUs are hot-added after being
hot-removed. Fix this problem by handling the CPUFREQ_CREATE_POLICY
notification and explicitly updating min/max capacities.
Change-Id: I5dadac3258e18897fa3d505cf128ebe24c091efa
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
cpu_hardirq_time and cpu_softirq_time are protected by a seqlock on
32-bit systems. There is a potential deadlock between this seqlock and
rq->lock.
CPU 1                                CPU 0
==========================          ========================
--> acquire CPU0 rq->lock           --> __irq_enter()
----> task enqueue/dequeue          ----> irqtime_account_irq()
------> update_rq_clock()           ------> irq_time_write_begin()
--------> irq_time_read()           --------> sched_account_irqtime()
(waiting for the seqlock held       (waiting for the CPU0 rq->lock)
 in irq_time_write_begin())
Fix this issue by dropping the seqlock before calling
sched_account_irqtime().
Change-Id: I29a33876e372f99435a57cc11eada9c8cfd59a3f
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
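A simplified sketch of the ordering fix, using generic primitives and
illustrative names rather than the real irqtime code: the write-side
seqcount must be released before taking a lock that a seqcount reader
may already hold.

#include <linux/seqlock.h>
#include <linux/spinlock.h>
#include <linux/types.h>

static seqcount_t example_irq_time_seq = SEQCNT_ZERO(example_irq_time_seq);
static DEFINE_SPINLOCK(example_rq_lock);
static u64 example_irq_time;

static void example_account_irq(u64 delta)
{
        write_seqcount_begin(&example_irq_time_seq);
        example_irq_time += delta;
        write_seqcount_end(&example_irq_time_seq);  /* drop the seqlock first... */

        spin_lock(&example_rq_lock);                /* ...then take rq->lock */
        /* sched_account_irqtime()-style work goes here */
        spin_unlock(&example_rq_lock);
}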
Scheduler ftrace events currently generate a lot of data when turned
on. The excessive log messages often end up overflowing trace buffers
for long use cases or crowding out other events. Optimize scheduler
events so that the log spew is smaller and more manageable. To that end,
change the variable type of some event fields, introduce variants
of sched_cpu_load that can be turned on/off for separate code paths,
and remove unused fields from various events.
Change-Id: I2b313542b39ad5e09a01ad1303b5dfe2c4883b8a
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in rt.c due to
CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
It's possible that select_best_cpu() gets called before the first
cpufreq notifier call. In such a scenario, select_best_cpu() can hang
forever because search_cpus is never cleared.
Initialize the frequency domain cpumask with the rq's CPU to avoid this
scenario.
CRs-fixed: 931349
Change-Id: If8d31c5477efe61ad7c6b336ba9e27ca6f556b63
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
At present, select_best_cpu() bails out when the best idle CPU is found
without printing the sched_task_load trace event. Print it.
Change-Id: Ie749239bdb32afa5b1b704c048342b905733647e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Add a bias towards the RT task's previous CPU and sibling CPUs in order
to avoid cache bouncing and migrations.
CRs-fixed: 927903
Change-Id: I45d79d774e65efcb38282130b6692b4c3b03c2f0
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
To migrate a running task using stop_one_cpu, one has to give up
the pi_lock and rq_lock. To safeguard against migration between giving
up those locks and actually invoking stop_one_cpu, one has to save away
task_cpu(p) before releasing pi_lock, and use the saved value when
passing it as the src_cpu argument to stop_one_cpu. If the current
task_cpu is passed in, the task may have already been migrated to that
CPU for some other reason.
sched_exec attempts to invoke stop_one_cpu with the source CPU set to
task_cpu(task) after dropping the pi_lock. While this doesn't result in
a functional error, it is rather useless to have the entire migration
code run when the task is already running on the destination CPU.
Change-Id: I02963ed02c7119a3d707580a191fbc86b94cdfaf
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
[joonwoop@codeaurora.org: omitted changes for qhmp_core.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Tasks that are on the runqueue continuously for a certain amount of time
have the potential to be big tasks at the end of the window in which they
are runnable. In such scenarios, ramping the CPU frequency early can
boost performance, rather than waiting until the end of a window for the
governor to query load. Notify the governor early at every tick when a
task has been observed to execute beyond some percentage of the tick
period.
The threshold beyond which a task is eligible for early detection can be
changed via the tunable sched_early_detection_duration. The feature itself
is enabled only when scheduler boost is in effect.
Change-Id: I528b72bbc79a55b4593d1b8ab45450411c6d70f3
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in scheduler_tick() in
kernel/sched/core.c. fixed minor conflicts in include/linux/sched.h,
include/linux/sched/sysctl.h and kernel/sysctl.c due to
CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
A change in cpufreq policy parameters currently triggers a partial reset
of HMP stats. This is necessary when there is a change in the max
frequency of any cluster, since updated load scaling factors necessitate
updating the number of big and small tasks on every CPU. However, this
computation is redundant when parameters other than the max frequency
change. Optimize the code by avoiding the redundant calculations.
Change-Id: Ib572f5dfdc4ada378e695f328ff81e2ce31132ba
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Add best_cpu and latency fields to the sched_task_load trace event. The
latency field represents the combined latency of update_task_ravg() and
select_best_cpu(), which is useful for analyzing the latency overhead of
the HMP scheduler.
Change-Id: Ie6d777c918d0414d361d758490e3cd7d509f5837
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Avoid unnecessary multiplication and division when load scaling factor
is 1024.
Change-Id: If3cb63a77feaf49cc69ddec7f41cc3c1cabbfc5a
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
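A sketch of the special case, with illustrative names; a load scaling
factor of 1024 means no scaling, so the multiply and divide can be
skipped entirely:

#include <linux/math64.h>
#include <linux/types.h>

static inline u64 example_scale_load(u64 task_load, unsigned int lsf)
{
        if (lsf == 1024)        /* unity scale factor: nothing to do */
                return task_load;

        return div64_u64(task_load * lsf, 1024);
}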
At present, in order to estimate the power cost of CPU load, the HMP
scheduler converts CPU load to the corresponding frequency on the fly,
which can be avoided.
Optimize and reduce the execution time of select_best_cpu() by
precomputing the CPU load to frequency conversion. This optimization
reduces the execution time of select_best_cpu() by about 20% on average.
Change-Id: I385c57f2ea9a50883b76ba6ca3deb673b827217f
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/sched.h.
stripped out code for CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
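A sketch of the precomputation idea, with illustrative structure and
field names (the real per-CPU bookkeeping differs): the divide moves out
of select_best_cpu() and into the infrequent update path.

#include <linux/types.h>

struct example_cpu_cost {
        unsigned int max_possible_freq;  /* kHz */
        unsigned int capacity;           /* relative to 1024, assumed non-zero */
        unsigned int load_to_freq;       /* precomputed conversion factor */
};

/* Called only when frequency limits or capacity change. */
static void example_update_load_to_freq(struct example_cpu_cost *c)
{
        c->load_to_freq = c->max_possible_freq / c->capacity;
}

/* Hot path in select_best_cpu(): a single multiply, no divide. */
static inline unsigned int example_load_to_freq(struct example_cpu_cost *c,
                                                unsigned int load)
{
        return load * c->load_to_freq;
}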
The commit 392edf4969d20 ("sched: avoid stale cumulative_runnable_avg
HMP statistics") introduced the callback function fixup_hmp_sched_stats()
so that update_history() can avoid a decrement-and-increment pair of HMP
stats. However, the commit also made the fixup function do an obscure
p->ravg.demand update, which isn't the cleanest way.
Revise the function fixup_hmp_sched_stats() so the caller can update
p->ravg.demand directly.
Change-Id: Id54667d306495d2109c26362813f80f08a1385ad
[joonwoop@codeaurora.org: stripped out CONFIG_SCHED_QHMP.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Account the amount of load contributed by new tasks within the CPU load
so that the governor can apply a different policy when the CPU is loaded
by new tasks.
To be able to distinguish new task load, a new tunable,
sched_new_task_windows, is also introduced. The tunable defines tasks as
new when they have been active for less than the configured number of
windows.
Change-Id: I2e2e62e4103882f7362154b792ab978b181b9f59
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
[joonwoop@codeaurora.org: omitted changes for
drivers/cpufreq/cpufreq_interactive.c. cpufreq changes need to be
applied separately later. fixed conflict in include/linux/sched.h and
include/linux/sched/sysctl.h. omitted changes for qhmp_core.c]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
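A sketch of how the classification might read; "active_windows" stands
for the per-task count of windows the task has been active in, and the
real field name may differ:

#include <linux/types.h>

/* Illustrative: a task is "new" while it has been active for fewer
 * windows than the sched_new_task_windows tunable allows. */
static inline bool example_is_new_task(unsigned int active_windows,
                                       unsigned int new_task_windows)
{
        return active_windows < new_task_windows;
}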
On kernel version 3.18 and beyond, when the affinity of an
enqueued task changes such that migration is required, the rq
variable gets updated to the destination rq. This means that
check_for_freq_change() skips the source CPU frequency check and
instead double-checks the destination CPU. Fix this by using the
src_cpu variable instead.
Change-Id: I14727a34e22c50c9a839007d474802f96a2f49f6
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in __migrate_task().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Add a per-CPU tunable to set the extra cost of using a CPU that is
idle. Add the same for a cluster.
Change-Id: I4aa53f3c42c963df7abc7480980f747f0413d389
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop@codeaurora.org: omitted changes for qhmp*.[c,h]. stripped out
CONFIG_SCHED_QHMP in drivers/base/cpu.c and include/linux/sched.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Add a new API to the scheduler to allow the low power mode driver to
inform the scheduler about the D-state of a cluster. This can be
leveraged by the scheduler to make an informed decision about the cost
of placing a task on a cluster.
Change-Id: If0fe0fdba7acad1c2eb73654ebccfdb421225e62
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop@codeaurora.org: omitted fixes for qhmp_core.c and qhmp_core.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
At present, the HMP scheduler packs tasks onto a busy CPU until the
CPU's load reaches 100%, to avoid waking up an idle CPU as much as
possible. Such aggressive packing leads to unintended CPU frequency
increases, as the governor raises the busy CPU's frequency when its load
exceeds the configured frequency max load, which can be less than 100%.
Fix this by taking the governor's frequency max load into account and
packing tasks only when the CPU's projected load is less than the max
load, to avoid unnecessary frequency increases.
Change-Id: I4447e5e0c2fa5214ae7a9128f04fd7585ed0dcac
[joonwoop@codeaurora.org: fixed minor conflict in kernel/sched/sched.h]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Set init_task_load to 100% to allow new tasks to wake up on the best
performance CPUs.
Change-Id: Ie762a3f629db554fb5cfa8c1d7b8b2391badf573
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
At present, the HMP task placement algorithm places a waking task on
any lowest-power-cost CPU in the system, even if the task's previous CPU
is also one of the lowest-power-cost CPUs. Placing the task on its
previous CPU can reduce cache bouncing.
Add a bias towards the task's previous CPU and CPUs in the same cache
domain as the previous CPU.
Change-Id: Ieab3840432e277048058da76764b3a3f16e20c56
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>