Add a cold_boot parameter which supplements the
boot_reason sysctl entry with information about
whether the system was booted from a cold or warm state.
The /proc/sys/kernel/cold_boot entry is set to 1 or 0 when the
system was booted from the cold or warm boot state, respectively.
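As an illustration, a read-only entry of this kind is typically
registered through a ctl_table; this is a minimal sketch under assumed
names, not the actual patch:

	static int cold_boot;	/* assumed flag, filled in at boot */

	static struct ctl_table kern_table[] = {
		{
			.procname	= "cold_boot",
			.data		= &cold_boot,
			.maxlen		= sizeof(int),
			.mode		= 0444,	/* read-only from userspace */
			.proc_handler	= proc_dointvec,
		},
		{ }
	};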
CRs-Fixed: 461256
Change-Id: I2bc5d80c8f26eb9e9dbb4b34960d991a51a224e4
Signed-off-by: David Keitel <dkeitel@codeaurora.org>
[abhimany: fixup minor merge conflict and drop changes to
kernel/sysctl.c and Documentation since it was brought in via
snapshot commit]
Signed-off-by: Abhimanyu Kapur <abhimany@codeaurora.org>
During board initialization, read the shared memory item
SMEM_POWER_ON_STATUS_INFO and place it in procfs at
/proc/sys/kernel/boot_reason.
The data item is an integer with a bit set to identify the reason
the device was powered on. The values of this data item are defined in
Documentation/arm/msm/boot.txt; the following is the data from the
documentation file.
power_on_status values set by the PMIC for power on event:
----------------------------------------------------------
0x01 -- keyboard power on
0x02 -- RTC alarm
0x04 -- cable power on
0x08 -- SMPL
0x10 -- Watch Dog timeout
0x20 -- USB charger
0x40 -- Wall charger
0xFF -- error reading power_on_status value
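The documented values map naturally onto bit masks; the macro names
below are illustrative assumptions, only the values come from the
documentation:

	#define PWR_ON_EVENT_KEYPAD	0x01
	#define PWR_ON_EVENT_RTC_ALARM	0x02
	#define PWR_ON_EVENT_CABLE	0x04
	#define PWR_ON_EVENT_SMPL	0x08
	#define PWR_ON_EVENT_WDOG	0x10
	#define PWR_ON_EVENT_USB_CHG	0x20
	#define PWR_ON_EVENT_WALL_CHG	0x40
	#define PWR_ON_EVENT_READ_ERR	0xFF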
This is cherry-picked from commit 372d39f87b0da75
("put reason for boot in procfs") of the 3.18 tree.
Change-Id: I59e665f92e6e29f7dfef4380314f676a2d92c94b
Signed-off-by: Rick Adams <rgadams@codeaurora.org>
[abhimany: fix up minor merge conflicts]
Signed-off-by: Abhimanyu Kapur <abhimany@codeaurora.org>
Signed-off-by: Srinivas Ramana <sramana@codeaurora.org>
The max_possible_efficiency and the CPU's efficiency are fixed values
which are determined at cluster allocation time. Avoid division on the
fast path by using a precomputed scale factor.
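A minimal sketch of the idea, with assumed field names (the actual
shift and data structure may differ):

	/* once, at cluster allocation time -- the only division: */
	cluster->efficiency_scale =
		(cpu_efficiency << SCHED_CAPACITY_SHIFT) /
			max_possible_efficiency;

	/* on the fast path -- multiply and shift only: */
	scaled_delta = (delta * cluster->efficiency_scale) >>
			SCHED_CAPACITY_SHIFT;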
Also, update_cpu_busy_time() doesn't need to know how many full windows
have elapsed, so replace that unneeded division with a simple
comparison.
Change-Id: I2be1aad3fb9b895e4f0917d05bd8eade985bbccf
Suggested-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Updating cycle counter should be serialized by holding rq lock.
Add missing rq lock hold when cycle counter is updated by irq entry
point.
Change-Id: I92cf75d047a45ebf15a6ddeeecf8fc3823f96e5d
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
The task execution time in nanoseconds and the CPU cycle counters are
large enough to overflow when the two are multiplied. Avoid the
overflow by calculating the frequency separately.
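Sketched with assumed names, the overflow-safe order of operations is
to reduce the cycle and time deltas to a frequency first and only then
scale the execution time:

	/* unsafe: delta_exec_ns * cycles_delta may exceed 64 bits */

	/* safe: derive the frequency first (kHz == cycles per ms);
	 * real code must guard against a zero divisor for short windows */
	freq_khz = cycles_delta / (time_delta_ns / NSEC_PER_MSEC);
	scaled_exec = delta_exec_ns * freq_khz / max_possible_freq;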
Change-Id: I076d9ecd27cb1c1f11578f009ebe1a19c1619454
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
The function parameter cpu is no longer used by cpu_cycles_to_freq(),
so remove it.
Change-Id: Ide19321206dacb88fedca97e1b689d740f872866
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
We have scripts which write to certain sysctl fields on 3.18 kernels,
but this fails on 4.4 kernels.
One entry we write to is xfrm_aevent_rseqth, which is a u32:

	echo 4294967295 > /proc/sys/net/core/xfrm_aevent_rseqth

Commit 230633d109 ("kernel/sysctl.c: detect overflows when converting
to int") prevented writing to sysctl entries when an integer overflow
occurs. However, that check should not apply to unsigned integers.
A u32 can hold 4294967295, yet the write fails due to this check:
	static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
					 int *valp, int write, void *data)
	{
		...
		if (*lvalp > (unsigned long) INT_MAX)
			return -EINVAL;
Fix this for now by reverting the commit until a solution is
finalized upstream.
CRs-Fixed: 1026507
Change-Id: I4fae5f442e4cc2c2414a69e960d42c05c3062415
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
It's possible for the thermal driver and the governor to notify that
fmax is being changed at the same time. In such a case we can
potentially skip updating the CPU's capacity. Fix this by always
updating the capacity when the limited fmax is changed by the same
entity.
Also, serialize sched_update_cpu_freq_min_max() with a spinlock since
the function can be called by multiple drivers at the same time.
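A sketch of the serialization; the lock name and exact signature are
assumptions:

	static DEFINE_SPINLOCK(cpu_freq_min_max_lock);

	void sched_update_cpu_freq_min_max(const cpumask_t *cpus,
					   u32 fmin, u32 fmax)
	{
		spin_lock(&cpu_freq_min_max_lock);
		/* update per-CPU frequency limits, recompute capacity */
		spin_unlock(&cpu_freq_min_max_lock);
	}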
Change-Id: I3608cb09c30797bf858f434579fd07555546fb60
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
The time between the idle task's mark_start and the IRQ handler entry
time is a CPU cycle counter stall period. It's therefore inappropriate
to include that duration in the sample period when we estimate the
frequency.
Fix this suboptimality by replenishing the idle task's CPU cycle
counter upon IRQ entry and using irqtime as the time delta.
Change-Id: I274d5047a50565cfaaa2fb821ece21c8cf4c991d
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Depending on the hardware, the CPU cycle counter may not increase
while the CPU or cluster is idle. Using the cycle counter over such a
period can therefore yield an incorrect CPU frequency estimate. Use
the previously calculated CPU frequency when the CPU was idle.
Change-Id: I732b50c974a73c08038995900e008b4e16e9437b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Preserve the cycle counter in the rq in preparation for the fix for
wait time accounting while the CPU is idle.
Change-Id: I469263c90e12f39bb36bde5ed26298b7c1c77597
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Add an interface that can be used from userspace to decide whether
app-specific settings need to be applied or cleared when particular
processes are running.
CRs-Fixed: 981519 997757
Change-Id: Id81f8b70de64f291a8586150f4d2c7c8f8b4420f
Signed-off-by: Sarangdhar Joshi <spjoshi@codeaurora.org>
[satyap@codeaurora.org: trivial merge conflict resolution and pull
fixes for CR: 997757]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
This reverts commit 8f90803a45 ("sched: warn/panic upon excessive
scheduling latency") as this feature is no longer used.
Change-Id: I200d0e9e8dad5047522cd02a68de25d4a70a91a4
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
This reverts commit b40bf941f6 ("sched: add scheduling latency
tracking procfs node") as this feature is no longer used.
Change-Id: I5de789b6349e6ea78ae3725af2a3ffa72b7b7f11
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
This has always been an unused feature, given its limitation of adding
phantom load to the system. Since there are no immediate plans to use
it, and since it adds unnecessary complications to the new load fixup
mechanism, remove the feature for now. It can be revisited later in
light of the new mechanism.
Change-Id: Ie9501a898d0f423338293a8dde6bc56f493f1e75
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Kill the unused scheduler knob sched_migration_fixup. With this change
the scheduler always adjusts the CPU's busy time during migration.
Change-Id: I5d59e89d5cc0f2c705c40036cd7b47f5d3f89e58
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Kill the unused scheduler knob and parameter sched_enable_power_aware.
The HMP scheduler now always takes the power cost into account when
placing tasks.
Change-Id: Ib26a21df9b903baac26c026862b0a41b4a8834f3
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Kill the unused scheduler knob sched_account_wait_time. With this
change the scheduler always accounts a task's wait time into its
demand.
Change-Id: Ifa4bcb5685798f48fd020f3d0c9853220b3f5fdc
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Related threads in a group could execute on different CPUs and hence
present a split-demand picture to the cpufreq governor. In other
words, the governor fails to see the net CPU demand of all related
threads in a given window if the threads' execution is split across
CPUs. That can result in a sub-optimal frequency being chosen compared
to the ideal frequency for the aggregate work taken up by the related
threads.
This patch aggregates the CPU execution stats in a window for all
related threads in a group. This presents the CPU busy time to the
governor as if all related threads were part of a single thread, and
thus helps select the right frequency required by the related threads.
The aggregation is done per-cluster.
Change-Id: I71e6047620066323721c6d542034ddd4b2950e7f
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: Fixed notify_migration() to hold the rcu
read lock, as this version of Linux doesn't hold p->pi_lock when the
function gets called, while keeping the use of rcu_access_pointer()
since we never dereference the return value.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
The function trace_printk() performs an optimization: it determines
whether there are format parameters in the argument string and calls
the appropriate APIs to write to the ftrace buffer. Add STM logging to
support this optimization, allowing CoreSight STM tracing on the
optimized trace_printk() path.
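For context, the upstream macro dispatches roughly as follows (lightly
simplified); STM logging is added to both resulting paths:

	#define trace_printk(fmt, ...)				\
	do {							\
		char _______STR[] = __stringify((__VA_ARGS__));	\
		if (sizeof(_______STR) > 3)			\
			/* has arguments: full printf path */	\
			do_trace_printk(fmt, ##__VA_ARGS__);	\
		else						\
			/* no arguments: cheaper puts path */	\
			trace_puts(fmt);			\
	} while (0)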
Change-Id: I1a77291e77410c6ed99474335a6d25742c409e47
Signed-off-by: Aparna Das <adas@codeaurora.org>
Signed-off-by: Pratik Patel <pratikp@codeaurora.org>
Signed-off-by: Shashank Mittal <mittals@codeaurora.org>
Duplicate ftrace event traffic and userspace writes to the
trace_marker file to STM. Also duplicate trace_printk() traffic to
STM. This allows Linux tracing and log data to be correlated with
other data transported over STM.
Change-Id: I4fcb42f2e97ab963fdc85853f4f3ea1f208bfc3c
Signed-off-by: Pratik Patel <pratikp@codeaurora.org>
[spjoshi@codeaurora.org: 3.18 code fixup]
Signed-off-by: Sarangdhar Joshi <spjoshi@codeaurora.org>
[mittals@codeaurora.org: 4.4 code fixup]
Signed-off-by: Shashank Mittal <mittals@codeaurora.org>
Most CPUs increase the cycle counter by one every cycle, which makes
frequency = cycles / time_delta correct. It's therefore reasonable to
get rid of the current cpu_cycle_max_scale_factor and instead ask the
cycle counter read callback to return a scaled counter value in the
case where the cycle counter doesn't increase every cycle.
Thus, multiply the CPU cycle counter delta by NSEC_PER_SEC /
HZ_PER_KHZ, since we calculate the frequency in kHz, and remove
cpu_cycle_max_scale_factor.
This allows us to simplify the frequency estimation and the cycle
counter API.
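The unit conversion works out as follows for a counter that ticks once
per cycle:

	freq_khz = cycles_delta / time_delta_ms
	         = cycles_delta * (NSEC_PER_SEC / HZ_PER_KHZ) / time_delta_ns

since NSEC_PER_SEC / HZ_PER_KHZ = 10^9 / 10^3 = 10^6 and
time_delta_ms = time_delta_ns / 10^6.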
Change-Id: Ie7a628d4bc77c9b6c769f6099ce8d75740262a14
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
A malicious app can open a perf event with the constraint_duplicate
bit set, disable the event, and close the fd. On closing the fd,
the perf_release() modification causes the kernel to clean up
the event as if it were still enabled, leading to the event
being removed from a list twice.
CRs-Fixed: 977563
Change-Id: I5fbec3722407d2f3d0ff0d9f7097c5889e31fd62
Signed-off-by: Srinivasarao P <spathi@codeaurora.org>
Fix the macro name so that CONFIG_SCHED_HMP_CSTATE_AWARE=y takes
effect.
Change-Id: I0218b36b2d74974f50a173a0ac3bc59156c57624
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
This reverts commit 28f67e5a50 ("sched: set HMP scheduler's
default initial task load to 100%") since a 100% initial task load
causes too much power inefficiency on some targets.
CRs-fixed: 1006303
Change-Id: I81b4ba8fdc2e2fe1b40f18904964098fa558989b
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Rather than using debugfs, switch to tracefs, which tracing moved
to in kernel 4.4.
Signed-off-by: David Keitel <dkeitel@codeaurora.org>
Change-Id: I52ef7d45cabb20cc61fbd2fb3ef5016b041bc56c
The permissions of /proc/iomem are currently -r--r--r--, so everyone
can see its content. As iomem contains information about the device's
physical memory map, restrict the information to root only.
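The change amounts to tightening the procfs mode; a sketch (the exact
mode chosen may differ):

	/* was: S_IRUGO (0444) -- world-readable */
	proc_create("iomem", S_IRUSR, NULL, &proc_iomem_operations);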
Change-Id: If0be35c3fac5274151bea87b738a48e6ec0ae891
CRs-Fixed: 786116
Signed-off-by: Biswajit Paul <biswajitpaul@codeaurora.org>
Signed-off-by: Avijit Kanti Das <avijitnsec@codeaurora.org>
Workqueue stalls can happen from a variety of usage bugs such as a
missing WQ_MEM_RECLAIM flag or a concurrency-managed work item
staying RUNNING indefinitely. These stalls can be extremely difficult
to hunt down because the usual warning mechanisms can't detect
workqueue stalls and the internal state is pretty opaque.
To alleviate the situation, this patch implements a workqueue lockup
detector. It periodically monitors all worker_pools and, if any pool
fails to make forward progress for longer than the threshold
duration, triggers a warning and dumps the workqueue state as follows.
BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
Showing busy workqueues and worker pools:
workqueue events: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
workqueue events_power_efficient: flags=0x80
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
pending: check_lifetime, neigh_periodic_work
workqueue cgroup_pidlist_destroy: flags=0x0
pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
pending: cgroup_pidlist_destroy_work_fn
...
The detection mechanism is controlled through the kernel parameter
workqueue.watchdog_thresh and can be updated at runtime through the
sysfs module parameter file.
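For example (boot-time parameter, then the runtime module parameter
path):

	workqueue.watchdog_thresh=30
	echo 60 > /sys/module/workqueue/parameters/watchdog_thresh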
v2: Decoupled from softlockup control knobs.
CRs-Fixed: 1007459
Change-Id: Id7dfbbd2701128a942b1bcac2299e07a66db8657
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Chris Mason <clm@fb.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Git-commit: 82607adcf9cdf40fb7b5331269780c8f70ec6e35
Git-repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
touch_softlockup_watchdog() is used to tell the watchdog that a
scheduler stall is expected. One group of usages comes from paths
where the task may not be able to yield for a long time, such as
performing slow PIO to a finicky device or coming out of suspend. The
other accounts for the scheduler and timer going idle.
For scheduler softlockup detection, there's no reason to distinguish
the two cases; however, a workqueue lockup detector is planned and it
can use the same signals from the former group, while the latter would
spuriously prevent detection. This patch introduces a new function,
touch_softlockup_watchdog_sched(), and converts the latter group to
call it instead. For now, it just calls touch_softlockup_watchdog()
and there's no functional difference.
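A sketch consistent with the description above (the new helper is a
pass-through for now, so behavior is unchanged):

	void touch_softlockup_watchdog_sched(void)
	{
		/* identical for now; gives the planned workqueue
		 * watchdog a distinct signal to key off later */
		touch_softlockup_watchdog();
	}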
CRs-Fixed: 1007459
Change-Id: I6fe77926acd4240458cab29d399f81d8739a16c0
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Git-commit: 03e0d4610bf4d4a93bfa16b2474ed4fd5243aa71
Git-repo: git://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
A CPU's actual minimum and maximum frequencies can be limited by
hardware components that the governor is not aware of. Provide an API
for those components to send notifications so the scheduler can see
the accurate currently-operating frequency boundaries, which helps it
make better task placement decisions.
CRs-fixed: 1006303
Change-Id: I608f5fa8b0baff8d9e998731dcddec59c9073d20
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
At present the scheduler calculates a task's demand from the task's
execution time weighted by the CPU frequency. The CPU frequency is
given by the governor's CPU frequency transition notification, but
such a notification may not be available.
Provide an API for the CPU clock driver to register callback functions
so the scheduler can access the CPU's cycle counter and estimate the
CPU's frequency without the notification. At this point the scheduler
assumes the cycle counter always increases, even when the cluster is
idle, which might not be true. This will be fixed by a subsequent
change for more accurate I/O wait time accounting.
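The registration interface might look like the following sketch; the
names are assumptions based on the description:

	struct cpu_cycle_counter_cb {
		u64 (*get_cpu_cycle_counter)(int cpu);
	};

	int register_cpu_cycle_counter_cb(struct cpu_cycle_counter_cb *cb);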
CRs-fixed: 1006303
Change-Id: I93b187efd7bc225db80da0184683694f5ab99738
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
At present sched_boost makes the scheduler place tasks on the least
loaded CPU under the assumption that the big and little cluster
capacities are the same at the same frequency level. This is
suboptimal for a big.LITTLE system that doesn't have such symmetrical
capacity between big and little CPUs.
Fix sched_boost to place tasks on the big CPUs on targets with
non-symmetrical capacity.
CRs-fixed: 1006303
Change-Id: I752f020acf1a76580edb5cd0e5ad283b62edfeed
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
At present, among CPUs with the same power cost and C-state, the
scheduler places a newly waking task on the most loaded CPU, which can
cause too much task packing on the same CPU. Place the task on the
most loaded CPU only when the best CPU is in an idle C-state;
otherwise spread out by placing it on the least loaded CPU.
CRs-fixed: 1006303
Change-Id: I8ae7332971b3293d912b1582f75e33fd81407d86
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
There are CPUs that don't have an obvious low power mode exit latency
penalty. Add a new Kconfig option, CONFIG_SCHED_HMP_CSTATE_AWARE,
which controls whether the CPU C-state is used to guide task
placement.
CRs-fixed: 1006303
Change-Id: Ie8dbab8e173c3a1842d922f4d1fbd8cc4221789c
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Update the wakeup placement logic when need_idle is not set. Break
power-cost ties with the C-state. If the C-state is the same, break
ties with prev_cpu. Finally, go for the most loaded CPU.
CRs-fixed: 1006303
Change-Id: Iafa98a909ed464af33f4fe3345bbfc8e77dee963
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
[joonwoop@codeaurora.org: fixed a bug where best_cpu_cstate was
assigned from an uninitialized cpu_cstate.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Try to find the minimum C-state CPU within the little cluster when a
task fits there. If there is no idle CPU, return the least busy CPU.
Also add a prev-CPU bias when the C-state or load is the same.
CRs-fixed: 1006303
Change-Id: I577cc70a59f2b0c5309c87b54e106211f96e04a0
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
A killed task can stay in the task list long after its
memory has been returned to the system; therefore,
ignore any task whose mm struct has been freed.
Change-Id: I76394b203b4ab2312437c839976f0ecb7b6dde4e
CRs-fixed: 450383
Signed-off-by: Liam Mark <lmark@codeaurora.org>
Whenever the IRQ affinity notifier is changed, call
cancel_work_sync() from irq_set_affinity_notifier() to cancel
any pending work and avoid work list corruption.
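A sketch of the change inside irq_set_affinity_notifier(), with the
surrounding code abbreviated:

	if (old_notify) {
		/* added: flush pending affinity-notify work so it cannot
		 * run, or corrupt the work list, after the put below */
		cancel_work_sync(&old_notify->work);
		kref_put(&old_notify->kref, old_notify->release);
	}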
Change-Id: I1f093bcc43be8c6696bad29250e4926cbc6c4029
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
There is a deadlock scenario due to a circular dependency between a
CPU's rq->lock and kswapd's waitqueue lock:
(1) When kswapd is woken up, try_to_wake_up() is called with its
waitqueue lock held. Its previous CPU is offline, so it is woken
up on a different CPU. We try to acquire the offline CPU's rq->lock
in either the cpufreq change callback or fixup_busy_time().
(2) At the same time, the offline CPU is coming online and init_idle()
is called from __cpu_up(). init_idle() calls __sched_fork() with
rq->lock held. A debug object allocation in hrtimer_init(), called
from __sched_fork(), tries to wake up kswapd and attempts to take
the waitqueue lock held in path (1).
Task-specific initialization is done in __sched_fork(), and rq->lock
is not held when it is called for other tasks. The same holds true for
the idle task: __sched_fork() for the idle task is called only when
the CPU is not active.
Acquire rq->lock after calling __sched_fork() in init_idle()
to fix this deadlock.
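A sketch of the reordering in init_idle(), simplified from the
description:

	void init_idle(struct task_struct *idle, int cpu)
	{
		struct rq *rq = cpu_rq(cpu);
		unsigned long flags;

		__sched_fork(0, idle);	/* moved before taking rq->lock */

		raw_spin_lock_irqsave(&idle->pi_lock, flags);
		raw_spin_lock(&rq->lock);
		/* ... remaining idle task setup under the locks ... */
	}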
CRs-Fixed: 965873
Change-Id: Ib8a265835c29861dba571c9b2a6b7e75b5cb43ee
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
[satyap: trivial merge conflicts resolution and omitted changes for QHMP]
Signed-off-by: Satya Durga Srinivasu Prabhala <satyap@codeaurora.org>
Commit 5e16bbc2fb ("sched: Streamline the task migration locking
a little") hardened the task migration locking, and __migrate_task()
is now called with the rq lock held. Move the notification out of the
spinlock.
Change-Id: I553adcfe80d5c670f4ddf83438226fd5e0924fe8
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Fix various compilation failures when CONFIG_SCHED_HMP or
CONFIG_SCHED_INPUT isn't enabled.
Change-Id: I385dd37cfd778919f54f606bc13bebedd2fb5b9e
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Biasing a sync wakee task towards the waker CPU's cluster makes sense
when the waker's demand is high enough that the wakee can also take
advantage of the high CPU frequency voted for because of the waker's
load. Placing a sync wakee on a low-demand waker's CPU can cause
placement imbalance, which in turn can cause unnecessary migrations.
Introduce a new tunable, sched_big_waker_task_load, that defines a big
waker; the scheduler avoids biasing the wakee towards the waker's
cluster when the waker's load is below the tunable.
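In effect the wakeup path gains a check of roughly this shape; the
variable names are assumptions, only the tunable comes from this
patch:

	if (sync && waker->ravg.demand < sysctl_sched_big_waker_task_load)
		sync = 0;	/* waker too small: skip the cluster bias */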
CRs-fixed: 971295
Change-Id: I1550ede0a71ac8c9be74a7daabe164c6a269a3fb
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in
include/linux/sched/sysctl.h.]
If a sync wakee task's demand is small, it's worth placing the wakee
on the waker's cluster for better performance, in the sense that the
waker and wakee are correlated, so the wakee should take advantage of
the waker cluster's frequency, which is voted for by the waker, along
with the cache locality benefit.
While biasing towards the waker's cluster, we want to avoid the waker
CPU as much as possible, since placing the wakee on the waker's CPU
can get the waker preempted and migrated by the load balancer.
Introduce a new tunable, sched_small_wakee_task_load, that identifies
eligible small wakee tasks and place those tasks on the waker's
cluster.
CRs-fixed: 971295
Change-Id: I96897d9a72a6f63dca4986d9219c2058cd5a7916
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
[joonwoop@codeaurora.org: fixed a minor conflict in
include/linux/sched/sysctl.h.]
p->grp is being accessed outside of the lock, which can cause a
null-pointer dereference. Fix this and also add an RCU critical
section around accesses to this data structure.
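The access pattern becomes the following sketch; the field name is
from the description, the rest is illustrative:

	rcu_read_lock();
	grp = rcu_dereference(p->grp);
	if (grp) {
		/* ... safely use grp inside the read-side section ... */
	}
	rcu_read_unlock();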
CRs-fixed: 985379
Change-Id: Ic82de6ae2821845d704f0ec18046cc6a24f98e39
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
[joonwoop@codeaurora.org: fixed conflict in init_new_task_load().]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
At present the sched_select_prev_cpu_us tunable is restricted to
values below 100us. Fix this unintended restriction.
CRs-Fixed: 972237
Change-Id: I5eaf9f40468805c396328ca1022baef32acf8de0
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>