Version 2 of the BIMC BWMON HW doesn't reset the counter to 0 when it
hits the threshold. It also has support for an overflow status register.
Change-Id: I9f18d2153a2e5e762ec9950f26e0e7601468a80a
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
The BIMC bwmon device supports monitoring read/write traffic from each BIMC
master port. It also has the capability to raise an IRQ when the traffic
count exceeds a programmable threshold. This allows for it to be used with
the bw_hwmon governor to scale the BW requests from each BIMC master.
Change-Id: Ie8a1471226411e23954ed556292186a5a864ddc1
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Absence of traffic is guaranteed when the device sitting behind a devbw
device is suspended. In such cases, it is a waste of power to make non-zero
bandwidth votes or to scale the devbw device. So, provide APIs to
suspend/resume the devbw device as needed.
Change-Id: Id58072aec7a9710eb917f248d9b9bd08d3a1ec6a
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
When devfreq_devbw was added, the header was omitted since it was
unused. Add it now so that clients can call APIs exported by this
driver.
Change-Id: I39d52f6bf5ca65ab85ae573abbe8cff8796e5971
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
- Add more debug logs
- Change the format of the count logs to use hex instead of decimal to be
consistent with the rest of the logs
- Fix the type of the count variable from signed to unsigned to do the
above
Change-Id: I02a2968a3f10ce20ca00618e7aeeac9b9cd52bd3
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Clearing the BIMC BWMON IRQ requires clearing bits in two separate
registers. One is a global register and the other is a port specific
register.
The bit in the port specific register needs to be cleared first before
clearing the bit in the global register. Otherwise, the bit in the global
register gets set again before the port specific bit is cleared. Since
these registers are in different address regions, we also need memory
barriers around writes to the global register.
Also, clear the counter value before clearing the interrupt status just to
be safe.
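The ordering above can be sketched as a small user-space model. This is
not the driver's code: the register names, struct fake_bwmon and
reg_write() are hypothetical, and atomic_thread_fence() only stands in
for the kernel's memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the three registers involved. */
enum reg_id { REG_PORT_COUNT, REG_PORT_IRQ_CLR, REG_GLOBAL_IRQ_CLR };

struct fake_bwmon {
	volatile uint32_t reg[3];	/* modelled register values */
	enum reg_id order[3];		/* order in which registers were written */
	int nwrites;
};

static void reg_write(struct fake_bwmon *m, enum reg_id id, uint32_t val)
{
	m->reg[id] = val;
	if (m->nwrites < 3)
		m->order[m->nwrites++] = id;
}

/* Clear the counter, then the port IRQ bit, then the global IRQ bit. */
static void bwmon_irq_clear(struct fake_bwmon *m, uint32_t port_bit,
			    uint32_t global_bit)
{
	assert(m);
	reg_write(m, REG_PORT_COUNT, 0);
	reg_write(m, REG_PORT_IRQ_CLR, port_bit);

	/* Barrier stand-in: the port write must land before the global one. */
	atomic_thread_fence(memory_order_seq_cst);
	reg_write(m, REG_GLOBAL_IRQ_CLR, global_bit);
	atomic_thread_fence(memory_order_seq_cst);
}
```

Swapping the two IRQ writes in this model would correspond to the
failure described above, where the global bit is set again.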
Change-Id: Iee8d2caf9bf7d639c65ed19c979036bd5e203bfd
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
The HW workaround margin being too high reduces the effectiveness of the
interrupt. Try using a margin only when the measured bandwidth is too small
and risks the counter wrapping around multiple times before it's read.
Change-Id: Ic1e88ad360b2348dfb9ad314c42c1b0218010c1d
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Some devfreq devices using this governor might need suspend/resume support.
When suspended, those devices won't need any bandwidth votes and there is
no point in monitoring their bandwidth either.
Therefore, upon suspend, vote for zero bandwidth and stop the HW monitor.
Upon resume, vote for the previous bandwidth and start the HW monitor.
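A minimal sketch of the suspend/resume behaviour described above. The
struct and function names are hypothetical and the HW monitor is reduced
to a flag; the real governor state differs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state. */
struct hwmon_node {
	unsigned long cur_bw;		/* current bandwidth vote */
	unsigned long resume_bw;	/* vote to restore on resume */
	bool mon_running;		/* is the HW monitor counting? */
	bool suspended;
};

static void gov_suspend(struct hwmon_node *n)
{
	assert(n);
	if (n->suspended)
		return;
	n->resume_bw = n->cur_bw;	/* remember the previous vote */
	n->cur_bw = 0;			/* vote for zero bandwidth */
	n->mon_running = false;		/* stop the HW monitor */
	n->suspended = true;
}

static void gov_resume(struct hwmon_node *n)
{
	assert(n);
	if (!n->suspended)
		return;
	n->cur_bw = n->resume_bw;	/* restore the previous vote */
	n->mon_running = true;		/* restart the HW monitor */
	n->suspended = false;
}
```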
Change-Id: I318449995d714959f0ebfe91961bc23fa8edbd04
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
Several devfreq drivers were added without their corresponding
device tree bindings. Add them now.
Change-Id: I4ca5073a6f3b16c3f02d65bb30f60361c353239f
Signed-off-by: Matt Wagantall <mattw@codeaurora.org>
This is a snapshot of the MSM BIMC bwmon driver as of msm-3.10
commit:
acdce027751d5a7488b283f0ce3111f873a5816d (Merge "defconfig: arm64:
Enable ONESHOT_SYNC for msm8994")
Signed-off-by: Kumar Gala <galak@codeaurora.org>
This is a snapshot of the Bandwidth driver as of msm-3.10 commit:
acdce027751d5a7488b283f0ce3111f873a5816d (Merge "defconfig: arm64:
Enable ONESHOT_SYNC for msm8994")
Signed-off-by: Kumar Gala <galak@codeaurora.org>
[junjiew@codeaurora.org: resolved conflicts]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Change-Id: I30d48abdfe19a421b4d05003c56c47423c6d0456
This is a snapshot of the Generic bandwidth hw monitor driver as of
msm-3.10 commit:
acdce027751d5a7488b283f0ce3111f873a5816d (Merge "defconfig: arm64:
Enable ONESHOT_SYNC for msm8994")
Signed-off-by: Kumar Gala <galak@codeaurora.org>
Commit a0dd7b7 excludes cpufreq header from pm_opp.h. Since the
cpufreq governor uses the cpufreq APIs, include cpufreq header
directly for this governor.
Signed-off-by: Rohit Gupta <rohgup@codeaurora.org>
This is a snapshot of the Generic cpufreq governor driver as of msm-3.10
commit:
acdce027751d5a7488b283f0ce3111f873a5816d (Merge "defconfig: arm64:
Enable ONESHOT_SYNC for msm8994")
Signed-off-by: Kumar Gala <galak@codeaurora.org>
This is a snapshot of the Krait L2 cache HW monitor driver as of msm-3.10
commit:
acdce027751d5a7488b283f0ce3111f873a5816d (Merge "defconfig: arm64:
Enable ONESHOT_SYNC for msm8994")
Signed-off-by: Kumar Gala <galak@codeaurora.org>
This is a snapshot of the HW monitor governor driver as of msm-3.10
commit:
acdce027751d5a7488b283f0ce3111f873a5816d (Merge "defconfig: arm64:
Enable ONESHOT_SYNC for msm8994")
Signed-off-by: Kumar Gala <galak@codeaurora.org>
This is a snapshot of the simple devfreq device driver as of msm-3.10
commit:
acdce027751d5a7488b283f0ce3111f873a5816d (Merge "defconfig: arm64:
Enable ONESHOT_SYNC for msm8994")
Signed-off-by: Kumar Gala <galak@codeaurora.org>
[junjiew@codeaurora.org: resolved conflicts]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Change-Id: I37f1781d9192dd0ad2797ea52f9bd3a5ea5847b0
When unregistering a devfreq device (devfreq_remove_device()),
there is an additional call to put_device() after
device_unregister(). This causes data aborts when put_device()
accesses a kobj that was already freed by the preceding
device_unregister().
CRs-Fixed: 841819
Change-Id: I98bd9e4cc9ecfbc48a0bfe72fc47e362a6697741
Signed-off-by: Hanumant Singh <hanumant@codeaurora.org>
If max_state is 0, freq_table will be empty. Change do-while loop to
while loop to avoid dereferencing freq_table.
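A sketch of why the loop form matters. The function below is
hypothetical, and what the loop body does (tracking min and max) is an
assumption; only the empty-table guard mirrors the change described
above.

```c
#include <assert.h>
#include <stddef.h>

struct freq_limits {
	unsigned long min;
	unsigned long max;
};

static struct freq_limits scan_freq_table(const unsigned long *freq_table,
					  unsigned int max_state)
{
	struct freq_limits lim = { 0, 0 };
	unsigned int i = 0;

	assert(freq_table != NULL || max_state == 0);

	/* A do-while would read freq_table[0] even when max_state is 0. */
	while (i < max_state) {
		unsigned long f = freq_table[i++];

		if (lim.min == 0 || f < lim.min)
			lim.min = f;
		if (f > lim.max)
			lim.max = f;
	}
	return lim;
}
```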
Change-Id: I4a24e9b8cab8073db429c74e627b7fb50076ea93
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Some devices use freq_table instead of OPP. For those devices, the
available_frequencies file shows up empty. Fix that by using freq_table to
generate the available_frequencies data when OPP is not present.
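A user-space sketch of generating the list from freq_table. The
function name, the buffer handling and the exact output format
(space-separated values ending in a newline) are assumptions, not the
sysfs handler itself.

```c
#include <assert.h>
#include <stdio.h>

/* Format freq_table as "f0 f1 ... fn\n" into buf; returns the length. */
static int show_freq_table(char *buf, size_t size,
			   const unsigned long *freq_table,
			   unsigned int max_state)
{
	size_t count = 0;
	unsigned int i;

	assert(buf && size > 1);
	buf[0] = '\0';

	for (i = 0; i < max_state; i++) {
		int n = snprintf(buf + count, size - count, "%lu ",
				 freq_table[i]);

		if (n < 0 || (size_t)n >= size - count) {
			buf[count] = '\0';	/* drop the partial entry */
			break;
		}
		count += (size_t)n;
	}

	/* Replace the trailing space with a newline. */
	if (count)
		buf[count - 1] = '\n';
	return (int)count;
}
```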
Change-Id: Ibea8b388ee81c55d2eeddd8a1e2c18c91faed8c7
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
The new flag for the devfreq profile->target() function is used
by the performance governor to notify the driver that the device
should wake up at the max frequency.
Change-Id: I91c2d649177bdd1841a087a2125d1cdbc979f5c1
Signed-off-by: Vladimir Razgulin <vrazguli@codeaurora.org>
The devfreq framework calls a frequency targeting function with
a flag parameter. Allow the governors to influence that parameter.
Change-Id: I4058bd9dcd027dd246ccdb90d25c68f1dc055901
Signed-off-by: Lucille Sylvester <lsylvest@codeaurora.org>
The predefined performance and powersave governors set the device
frequency on their startup only. That's not enough because the
frequency might have changed after device suspend-resume. With this
fix the governors re-set the required device frequency every time
a device is resumed.
Change-Id: I47ac877fc9e2cfbfc4a46cc676d6f2f838cd41d6
Signed-off-by: Vladimir Razgulin <vrazguli@codeaurora.org>
Set devfreq device min and max frequency limits when device
is added to devfreq, provided frequency table is supplied.
This helps governors suggest a target frequency within the
limits.
Change-Id: Iab24aef59bfeffcfb3c3118c12ba58e25cd9d479
Signed-off-by: Rajagopal Venkat <rajagopal.venkat@linaro.org>
Patch-mainline: linux-pm @ 01/08/13, 05:50
Signed-off-by: Vladimir Razgulin <vrazguli@codeaurora.org>
Enable CPU_FREQ related configs, including governors, cpu-boost
driver and cpufreq device for MSM.
Change-Id: Icd0a0a7962e72706dbbae02ad7898f938391682c
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
CPU scaling for thread migration is now handled by scheduler and
governor. Remove migration related boost feature from cpu-boost.
Change-Id: I36f58e54eaceae30a3d0c11d73b1aadc4787db4e
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Add documentation to describe sysfs nodes that are currently not
documented.
Change-Id: Ib67d401afbb738e1ab79f73f34fab13922c3d98e
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Scheduler provides different load number based on whether a
notification is pending. Under normal conditions, it won't provide a
load that exceeds 100% busy time of current frequency. For migration,
the busy time can be huge if a heavy task just moved to the CPU.
This creates a race condition due to how governor handles
notification:
1) Scheduler sends notification for a big task
2) Governor timer runs, and gets a huge load, but fails to skip
hispeed_freq logic and all delays because it's not a notification
3) After receiving sched_get_cpus_busy(), scheduler thinks governor has
finished handling the notification and changes to provide normal load
that is capped to 100% of the CPU at current frequency.
4) Governor now starts handling notification, but gets a small load
that doesn't reflect real demand of the heavy task.
The migration notification is thus effectively lost. Fix this by
making notification pending a per-cpu flag. If the timer gets ahead of
notification handling, it will run as if it were a notification.
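The per-cpu flag idea can be sketched as below. All names are
hypothetical, and a C11 atomic stands in for whatever locking the
governor actually uses. Whichever path evaluates the CPU first consumes
the flag and handles the evaluation as a notification.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

/* One pending flag per CPU; static storage starts out false. */
static atomic_bool notif_pending[MODEL_NR_CPUS];

/* Scheduler side: mark a migration notification as pending for this CPU. */
static void sched_notify(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	atomic_store(&notif_pending[cpu], true);
}

/*
 * Governor side, called from both the timer and the notification path.
 * Returns true if this evaluation must be treated as a notification.
 */
static bool eval_is_notif(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return atomic_exchange(&notif_pending[cpu], false);
}
```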
Change-Id: Ie3d68edf85b822232a646c2694bec6928a2d7cd1
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
prev_load could be zero if no active time is registered for a CPU
within a sampling period. Fix potential divide-by-zero issue when
calculating new load percentage.
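A sketch of the guard. The helper name, and the choice to report 0% for
a period with no active time, are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Percentage of prev_load contributed by new_load; 0 if prev_load is 0. */
static unsigned int new_load_pct(uint64_t new_load, uint64_t prev_load)
{
	assert(new_load <= UINT64_MAX / 100);	/* keep the multiply in range */

	if (prev_load == 0)
		return 0;
	return (unsigned int)(new_load * 100 / prev_load);
}
```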
Change-Id: I8ad118f5b6b94a410ec59eb5ce939b9467e921c7
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Signed-off-by: Hanumath Prasad <hpprasad@codeaurora.org>
When the existing code computes the target frequency, it limits the target
frequency to be within policy min/max. It does this to make sure the
governor doesn't set the CPU frequency to something outside the policy
min/max limits.
The problem with this is that when the limits are removed, the CPU
frequency takes time to catch up with the real load because the governor
needs to wait for the next recalculation and even when the recalculated
frequency is correct, hysteresis might be applied.
In reality, the load might have already been consistent enough to exceed
the hysteresis criteria and cause a frequency change if it wasn't for the
policy limits. However, since the policy min/max limits the target
frequency from reflecting the increased need, the hysteresis criteria
doesn't get a chance to expire.
Since the CPUfreq framework already takes care of limiting the governor's
request to be within the policy min/max limits before it sets the CPU
frequency, there's no need to limit the computation of target frequency to
be within policy min/max.
That way, when limits are removed, we can use the current target frequency
as is and immediately jump to a CPU frequency that's appropriate for the
current load.
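A sketch of the idea, with hypothetical names: the governor keeps its
target frequency unclamped, and only the request that reaches the
hardware is clamped (here by core_clamp(), a stand-in for what the
CPUfreq framework does).

```c
#include <assert.h>

struct policy_limits {
	unsigned int min;	/* kHz */
	unsigned int max;	/* kHz */
};

/* Stand-in for the clamp the framework applies to a governor request. */
static unsigned int core_clamp(const struct policy_limits *p,
			       unsigned int target)
{
	assert(p && p->min <= p->max);

	if (target < p->min)
		return p->min;
	if (target > p->max)
		return p->max;
	return target;
}
```

With limits of 300000..1000000 and an unclamped target of 1500000, the
applied frequency is 1000000. Once max is raised to 2000000, the same
stored target yields 1500000 without waiting for a recalculation.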
Change-Id: Idc02359f6ff91530ff69de8edd8a25c275642099
Signed-off-by: Saravana Kannan <skannan@codeaurora.org>
New tasks don't have sufficient history to predict their behavior, even
with scheduler's help. Ramping up conservatively for a heavy task
could hurt performance when it's needed. Therefore, separate out new
tasks' load with scheduler's help and ramp up more aggressively if new
tasks make up a significant portion of total load.
Change-Id: Ia95c956369edb9b7a0768f3bdcb0b2fab367fdf7
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Account for the amount of load contributed by new tasks within CPU load
so that the governor can apply a different policy when a CPU is loaded by
new tasks.
To distinguish new task load, a new tunable, sched_new_task_windows, is
also introduced. The tunable defines tasks as new when they have been
active for fewer than the configured number of windows.
Change-Id: I2e2e62e4103882f7362154b792ab978b181b9f59
Suggested-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[junjiew@codeaurora.org: Dropped all changes on scheduler side because
those have been merged separately.]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
max_freq_hysteresis keeps the CPU at policy->max even after load goes
away. This is necessary for some workloads where heavy tasks start and
stop often. However, if a heavy task does stop, staying at policy->max
for an extended period is not power friendly.
Instead of keeping the CPU at policy->max, drop frequency optimistically.
If a heavy load starts back up and hits go_hispeed_load within the
max_freq_hysteresis period, ramp directly back up to policy->max.
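A simplified model of the optimistic drop. The struct, the field names
and pick_freq() are hypothetical, times are in arbitrary units, and the
governor's other delays are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct hyst_state {
	unsigned int policy_max;	/* kHz */
	unsigned int go_hispeed_load;	/* percent */
	uint64_t max_freq_hysteresis;	/* hysteresis window length */
	uint64_t hyst_start;		/* last time policy_max was chosen */
	bool hyst_valid;		/* hyst_start has been set */
};

static unsigned int pick_freq(struct hyst_state *s, unsigned int load,
			      unsigned int computed_freq, uint64_t now)
{
	unsigned int freq = computed_freq;

	assert(s && (!s->hyst_valid || now >= s->hyst_start));

	/* Heavy load came back inside the window: jump straight to max. */
	if (s->hyst_valid && load >= s->go_hispeed_load &&
	    now - s->hyst_start < s->max_freq_hysteresis)
		freq = s->policy_max;

	/* Whenever max is chosen, (re)start the hysteresis window. */
	if (freq >= s->policy_max) {
		freq = s->policy_max;
		s->hyst_start = now;
		s->hyst_valid = true;
	}
	return freq;
}
```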
Change-Id: I5edf6d765a3599a5b26e13e584bd237e932593f0
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
CPU load is now normalized to per-policy target_load, instead of
current frequency of CPU. Fix cpufreq_interactive_cpuload accordingly
so that its load number matches other cpufreq interactive events like
cpufreq_interactive_target/notyet/already.
Change-Id: I0685b5930ad1bac01819e96fcdfc181167d4dae0
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
When governor gets a notification from scheduler, scheduler provides
exact load that is required by the workload. Ignore hispeed_freq logic
and directly use choose_freq result for notifications.
Also use is_notif field to distinguish notifications instead of
MAX_LOCAL_LOAD.
Change-Id: I409ea66c00f4277adf32d18c339631e1a8b0f97b
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Scheduler needs to understand governor's target_load in order to make
correct decisions when scheduling tasks.
Change-Id: Ia440986de813632def0352e34425fa69da3b2923
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
With per-policy timer implemented, there is no need to use policy->cur
in load calculation and delay enforcement. Each CPU in the policy will
naturally get the cluster frequency in target_freq. Using policy->cur
has side effects if second evaluation comes before frequency switch
requested by first evaluation is finished. When that occurs, the second
evaluation could enforce delays incorrectly based on the stale
policy->cur while the timestamps have been updated when target_freq is
updated by earlier evaluation.
For example, assume current frequency is 1.5GHz, hispeed_freq is 1GHz.
First evaluation drops target_freq to 500MHz. It also resets
hispeed_validate_time. While frequency switch is still underway and
policy->cur is still 1.5GHz, a second evaluation happens, and the
evaluation result is 1GHz. Current evaluation would enforce
hispeed_delay for 1.5GHz using the updated hispeed_validate_time and
thus incorrectly delaying the ramp up to 1GHz.
Change from policy->cur to target_freq in load calculation and delay
enforcement.
Change-Id: I416e1d524e14b2c082944b88678eb3105bd70d88
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Commit 92352c0a65bc ("cpufreq: interactive: Ramp up directly if
cpu_load exceeds 100") and commit 594945e67031 ("cpufreq: interactive:
Skip delay in frequency changes due to migration") allow interactive
governor to skip above_hispeed_delay and min_sample_time if the
frequency evaluation request comes from scheduler. Power and performance
benefits of these two features are dependent on the behavior of each
workload. Adverse load pattern may experience regression instead of
improvement.
Make both features optional by introducing a sysfs file for each. Both
features are disabled by default.
Change-Id: I394c7fac00e6b20259dd198bd526a32ead54f14e
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
sched_get_cpus_busy() provides a snapshot of all CPUs' busy time
information for the set of CPUs being queried. This avoids race
condition due to migration when CPU load is queried one by one.
Change-Id: I6afdfa74ff9f3ef616872df4e2c3bb04f6233c3f
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Slack timer's expire field was not correctly initialized if slack_only
is true in cpufreq_interactive_timer_resched(). This causes both a
compilation warning and functional breakage.
Fix expire field by setting it properly.
Change-Id: I2f8c454d63626876522c163eb8d3c5d1c8adfd51
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
With per-cluster timer implementation, only max load across CPUs in
cluster is traced in timer function. Add cpufreq_interactive_cpuload
trace to provide per-cpu load information.
Change-Id: Icea9f2574332a4bc472b14193e77d76100a896ed
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Interactive governor currently uses per-cpu timer to evaluate each
CPU's frequency. For policies that manage multiple CPUs, each CPU
runs its own algorithm to decide its frequency and then final result
is aggregated in speedchange task. This implementation has a few
drawbacks.
Due to the use of deferrable timers, timers between CPUs can be easily
misaligned. If a load migrates from CPU A to CPU B, there exists a gap
where CPU A could have dropped its frequency vote yet CPU B hasn't
seen the demand to ramp up its vote. This would result in an incorrect
drop in policy frequency which is harmful for performance.
In addition, for CPU waking up in middle of a window, the timestamps
it takes will not be aligned with jiffy boundaries, and thus when next
time timer fires, it could incorrectly prevent frequency ramp up/down
for one more window.
Change-Id: Ia82c7b0cff5bb1ea165fb83fbb7a5546ea7d0396
[junjiew@codeaurora.org: Resolved merge conflicts. ]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
first_cpu field was introduced to handle tunable save and restore, but
later improvements removed the need for it. Remove it from
cpufreq_interactive_cpuinfo struct.
Change-Id: Ib6fd7546451ee537f55d874f93d0e52bec58f124
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Set use_sched_load tunable early in store so that we pass
the correct 64-bit jiffy to scheduler.
Change-Id: I46ed73441c9d242f15e5759360d0cea4a9dd23d0
Signed-off-by: Hanumath Prasad <hpprasad@codeaurora.org>
There is a race window, as explained below, when the governor tries to
change the cpu frequency and some other thread (say thermal mitigation)
tries to change the policy limits simultaneously.
speedchange task (Thread A)             Thread B (say Thermal)
cpufreq_interactive_speedchange_task()
        |
__cpufreq_driver_target()
        |
set_cpu_freq()
        |
                                        cpufreq_update_policy()
                                                |
                                        modified policy_max
                                                |
                                        check policy->cur against
                                        new policy limits, return
                                        without calling
                                        __cpufreq_driver_target as
                                        policy->cur (which is not
                                        updated by Thread A) is still
                                        within the new policy limits.
        |
sent CPUFREQ_POSTCHANGE notification
        |
updated policy->cur which happens to be higher than policy->max
This results in the current frequency being higher than policy->max and
violating the policy limits. This causes thermal impact and in turn high
power consumption. Fix this by calling __cpufreq_driver_target() always
with current frequency and leave it to __cpufreq_driver_target() to
guarantee there is no race condition when multiple threads are changing
frequencies.
Change-Id: I9136e9245677e8fc90a628d3099aca8d63d3677c
Signed-off-by: Hanumath Prasad <hpprasad@codeaurora.org>
When tunables are not available for events other than
CPUFREQ_GOV_POLICY_INIT in cpufreq_governor_interactive(), trigger a
panic instead of throwing a warning.
When the original warning happens, some race condition must have
occurred, and governor will be in a bad state even if it might still
run for a while. Panic directly so that it's easier to catch the
first race event.
Change-Id: I2dc1185cabfe72a63739452731fe242924d2cf45
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
The above hispeed delay and min sample time delays are used to
distinguish between sporadic load changes versus steady state load
changes. The governor tries to make sure the frequency changes only
when the load change is a steady state load change.
However, when the load change is for predictable reasons like
migration, the delays only negatively affect performance and power.
Once a significant load is migrated into a CPU, it's fairly reasonable
to assume it's going to continue contributing that additional load.
Similarly once a significant load is migrated away from a CPU, it's
fairly reasonable to assume the load will be gone forever. Future
migrations can bring back a load or take it away, but the
notifications that come along with it will allow us to quickly correct
for it. For this reason, when the load change is due to a
notification, do not delay frequency changes.
Change-Id: I19ad294b599e30654fbbeb0c56e8b50b0e19198f
[junjiew@codeaurora.org: Resolved merge conflicts.]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
When a CPU is running at policy->min, slack timer will not be scheduled.
If policy->min is reduced later, current implementation doesn't
reschedule slack timer and thus could leave CPU at a higher
frequency indefinitely as long as the CPU is idle. This behavior is
undesirable from power perspective.
Change-Id: I40bfd7c93ad3fd06e3837dc48befdc07f29c78c8
[junjiew@codeaurora.org: Resolved merge conflicts.]
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
When governor is using regular busy time tracking, cpu_load will
never exceed 100 because busy time will never exceed elapsed time in
any one sampling window. The only exception is when frequency is
reduced in middle of a window (e.g. due to thermal throttling). In
this case, cpu_load is likely irrelevant since the frequency the
governor has been voting for is already higher than what the target can
run at.
However, on a heterogeneous CPU system with scheduler input enabled
to track the load of migrated tasks, cpu_load could also exceed 100
when a task migrates from more capable CPU to slower CPU. When this
happens, governor already knows the exact frequency required to handle
this load. There is no need to progressively ramp up frequency in order
to assess the load's real demand. It's not desirable to starve such a
migrating task by forcing it through ramping up process on the slower
CPU.
Directly jump beyond hispeed_freq and ignore above_hispeed_delay if
cpu_load exceeds 100.
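A heavily simplified sketch of the decision. The struct, eval_freq()
and the delay_elapsed parameter are hypothetical; chosen_freq stands in
for the frequency the governor computes from the load, and the real
evaluation has more inputs.

```c
#include <assert.h>
#include <stdbool.h>

struct gov_tunables {
	unsigned int go_hispeed_load;	/* percent */
	unsigned int hispeed_freq;	/* kHz */
};

static unsigned int eval_freq(const struct gov_tunables *t,
			      unsigned int cpu_load, unsigned int target_freq,
			      unsigned int chosen_freq, bool delay_elapsed)
{
	/* A load above 100 already tells us the exact demand. */
	bool skip_hispeed = cpu_load > 100;
	unsigned int new_freq = chosen_freq;

	assert(t);

	/* Normal path: step to hispeed_freq before going any higher. */
	if (!skip_hispeed && cpu_load >= t->go_hispeed_load) {
		if (target_freq < t->hispeed_freq ||
		    new_freq < t->hispeed_freq)
			new_freq = t->hispeed_freq;
	}

	/* above_hispeed_delay: hold further ramp-up until it elapses. */
	if (!skip_hispeed && target_freq >= t->hispeed_freq &&
	    new_freq > target_freq && !delay_elapsed)
		return target_freq;

	return new_freq;
}
```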
Change-Id: Ib87057e4f00732fad943ab595a33e3059494ef15
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
Current implementation of cpufreq_interactive_enable_sched_input()
returns early if use_sched_input is already enabled. This breaks
refcounting for migration notification registration. It could also
result in failure of registering migration notification after
hotplugging the entire cluster and/or suspend/resume.
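A sketch of the refcounting concern, with hypothetical names: every
enable call takes a reference even when the feature is already on, so
that enable/disable pairs stay balanced and the notifier is registered
whenever at least one user remains.

```c
#include <assert.h>
#include <stdbool.h>

struct sched_input {
	bool enabled;			/* use_sched_input analogue */
	int notif_refcount;		/* users of the migration notifier */
	bool notifier_registered;
};

static void enable_sched_input(struct sched_input *s)
{
	assert(s);
	/* No early return when already enabled: always take a reference. */
	s->enabled = true;
	if (s->notif_refcount++ == 0)
		s->notifier_registered = true;
}

static void disable_sched_input(struct sched_input *s)
{
	assert(s && s->notif_refcount > 0);
	if (--s->notif_refcount == 0) {
		s->notifier_registered = false;
		s->enabled = false;
	}
}
```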
Change-Id: I079b2c70b182f696cd8a883f5c8e3a37b5c6d21d
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>