An "int" is used to hold the time difference between successive
updates to nr_run in sched_update_nr_prod(). This overflows if the
function is called more than ~2.15 sec after the previous call. The
most probable scenarios are when a CPU is idle and when it is
hotplugged. But as we periodically update the last_time of all possible
CPUs in sched_get_nr_running_avg() from a deferrable timer context
(core_ctl module), the overflow is observed only when the system is
completely idle for a long time. When the overflow happens, we hit
a BUG_ON() in sched_get_nr_running_avg().
Use "u64" instead of "int" to hold the time difference and add a
BUG_ON() to catch instances where sched_clock() returns a backward
value.
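The ~2.15 sec figure follows from 2^31 ns being about 2.147 s. A minimal
user-space sketch of the two behaviours is shown below. The function names
are hypothetical and assert() stands in for BUG_ON(); this is not the
scheduler code itself.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old behaviour (sketch): the 64-bit ns delta is stored in a 32-bit int.
 * The narrowing is implementation-defined in ISO C; mainstream compilers
 * wrap modulo 2^32, so a delta of 3 s comes out negative.
 */
int32_t delta_as_int(uint64_t now_ns, uint64_t last_ns)
{
	return (int32_t)(uint32_t)(now_ns - last_ns);
}

/* New behaviour (sketch): keep all 64 bits and trap a backward clock. */
uint64_t delta_as_u64(uint64_t now_ns, uint64_t last_ns)
{
	assert(now_ns >= last_ns); /* stands in for the added BUG_ON() */
	return now_ns - last_ns;
}
```

With a 3 s gap, delta_as_int() no longer returns the real delta, while
delta_as_u64() does.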
Change-Id: I284abb5889ceb8cf9cc689c79ed69422a0e74986
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
The HMP scheduler has two types of task placement boost policies.
(1) The boost-on-big policy makes use of all big CPUs up to their full capacity
before using the little CPUs. This improves performance on true b.L systems
where the big CPUs have higher efficiency compared to the little CPUs.
(2) The boost-on-all policy places tasks on the CPU having the highest
spare capacity. This policy is optimal for SMP-like systems.
The scheduler sets the boost policy to boost-on-big on systems that have
CPUs of different efficiencies. However, CPUs of the same
microarchitecture can differ slightly in efficiency due to
other factors like cache size. Selecting the boost-on-big policy based
on relative difference in efficiency is not optimal on such systems.
The boost-policy device tree property is introduced to specify the
required boost type and it overrides the default selection of boost
type in the scheduler. The possible values for this property are
"boost-on-big" and "boost-on-all".
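The override can be pictured with the following sketch. The enum, the
function name and the -EINVAL handling of unknown strings are assumptions
made for illustration. Only the two property values come from the text
above, and the boost-on-all default for homogeneous systems is inferred
from it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum boost_policy { BOOST_ON_BIG, BOOST_ON_ALL };

/*
 * dt_value is the boost-policy property string, or NULL when the property
 * is absent. *out always receives a usable policy; an unrecognised string
 * keeps the default and reports -EINVAL (assumption of this sketch).
 */
int resolve_boost_policy(const char *dt_value, int efficiencies_differ,
			 enum boost_policy *out)
{
	assert(out != NULL);

	/* Default selection based on relative CPU efficiency. */
	*out = efficiencies_differ ? BOOST_ON_BIG : BOOST_ON_ALL;

	if (dt_value == NULL)
		return 0;
	if (strcmp(dt_value, "boost-on-big") == 0)
		*out = BOOST_ON_BIG;
	else if (strcmp(dt_value, "boost-on-all") == 0)
		*out = BOOST_ON_ALL;
	else
		return -EINVAL;
	return 0;
}
```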
Change-Id: Iac19183fa7d4bfd9e5746b02a02b2b19cf64b78d
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Add a stub function for init_cluster() and remove the SCHED_HMP
ifdeffery in sched_init().
Change-Id: I6745485152d735436d8398818f7fb5e70ce5ee65
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
The current policy prefers an idle CPU in the waker cluster over the
waker CPU running only 1 task. Selecting an idle CPU eliminates the
chance of the waker migrating to a different CPU after the wakee
preempts it. This policy is also not susceptible to incorrect "sync"
usage, i.e. the waker not going to sleep after waking up the wakee.
However, the LPM exit latency associated with an idle CPU outweighs the
above benefits on some targets. So add a knob to prefer the waker
CPU having only 1 runnable task over idle CPUs in the waker cluster.
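A rough sketch of the two orderings, assuming a flat array of per-CPU
runnable counts for the waker cluster. The function and parameter names are
hypothetical, and the real placement code weighs more factors than this.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns the chosen CPU index within the waker cluster, or -1 if neither
 * an idle CPU nor a waker CPU with exactly 1 runnable task is available.
 */
int select_cpu_in_waker_cluster(const unsigned int *nr_running,
				size_t nr_cpus, size_t waker_cpu,
				int prefer_waker_cpu)
{
	size_t cpu;
	int waker_ok;

	assert(nr_running != NULL && waker_cpu < nr_cpus);
	waker_ok = nr_running[waker_cpu] == 1;

	/* Knob set: skip the idle-CPU search and avoid the LPM exit latency. */
	if (prefer_waker_cpu && waker_ok)
		return (int)waker_cpu;

	/* Default: an idle CPU wins over the waker CPU. */
	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (nr_running[cpu] == 0)
			return (int)cpu;

	return waker_ok ? (int)waker_cpu : -1;
}
```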
Change-Id: Id974748c07625c1b19112235f426a5d204dfdb33
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
After the introduction of "33c24b sched: add cpu isolation support",
select_fallback_rq() might sometimes be unable to find any CPU to place
a task on. This happens when all online CPUs are isolated and
the allow isolated flag is set to false. In such cases, we have
little choice but to use an isolated CPU and wait for core control
to eventually un-isolate one or more online CPUs.
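The last-resort step can be sketched with plain 64-bit masks in place of
cpumasks. The function name and signature are hypothetical, and the real
select_fallback_rq() also considers the task's allowed CPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the lowest-numbered usable CPU, or -1 if no CPU is online. */
int pick_fallback_cpu(uint64_t online, uint64_t isolated, unsigned int nr_cpus)
{
	uint64_t candidates = online & ~isolated;
	unsigned int cpu;

	assert(nr_cpus <= 64);

	/* Every online CPU is isolated: use an isolated one anyway. */
	if (!candidates)
		candidates = online;

	for (cpu = 0; cpu < nr_cpus; cpu++)
		if (candidates & (UINT64_C(1) << cpu))
			return (int)cpu;
	return -1;
}
```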
Change-Id: Id8738bd8493c11731c5491efcc99eb90f051233e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
A timer might be running when we are trying to move the timer to another
CPU, so ensure that we wait for the timer to finish before migrating.
Change-Id: I4c9ee39c715baebfbdb8a50476a475e38b092f70
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
The load balancer restrictions are in place to control task
migration from the lower capacity cluster to the higher capacity
cluster to save power. The assumption here is that the higher capacity
cluster has a higher power cost, which is not necessarily
true for all platforms. Use power-cost-based checks instead of
capacity-based checks when applying the inter-cluster migration
restrictions.
Change-Id: Id9519eb8f7b183a2e9fca87a23cf95e951aa4005
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Replace hotplug functionality in core control with cpu isolation
and integrate it into the scheduler.
Change-Id: I4f1514ba5bac2e259a1105fcafb31d6a92ddd249
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Refactor cpu data into cpu data and cluster data to improve readability and
ease of understanding the code.
Change-Id: I96505aeb9d07a6fa3a2c28648ffa299e0cfa2e41
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Move the core control trace events to the scheduler trace event file.
Change-Id: I65943d8e4a9eac1f9f5a40ad5aaf166679215f48
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Move core control from out-of-tree module into the kernel proper.
Core control monitors load on CPUs and controls how many CPUs are
available for the system to use at any point in time. This can help save
power. Core control can be configured through a sysfs interface.
Change-Id: Ia78e701468ea3828195c2a15c9cf9fafd099804a
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Remove the core control helper code since it is no longer needed
with subsequent patches that move core control into the kernel.
Change-Id: I62acddeb707fc7d5626580166b3466e63f45fd89
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Ensure perf events do not wake up idle cores when a core is isolated.
Change-Id: Ifefb2f1cf6c24af7bc46fc62797955b8c8ad5815
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Set a long latency requirement for isolated cores to ensure the LPM logic
will select a deep sleep state.
Change-Id: I83e9fbb800df259616a145d311b50627dc42a5ff
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Prohibit setting the affinity of an IRQ to an isolated core.
Change-Id: I7b50778615541a64f9956573757c7f28748c4f69
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Add a tracepoint to capture the cpu isolation event, including a KPI for
the time it took to isolate.
Change-Id: If2d30000f068afc50db953940f4636ef6a089b24
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
This adds cpu isolation APIs to the scheduler to isolate and unisolate
CPUs. Isolating and unisolating a CPU can be used in place of hotplug.
Isolating and unisolating a CPU is faster than hotplug and can thus be
used to optimize the performance and power of multi-core CPUs.
Isolating works by migrating non-pinned IRQs and tasks to other CPUs and
marking the CPU as not available to the scheduler and load balancer.
Pinned tasks and IRQs are still allowed to run but it is expected that
this would be minimal.
Unisolation works by just marking the CPU available for scheduler and
load balancer.
Change-Id: I0bbddb56238c2958c5987877c5bfc3e79afa67cc
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
HMP scheduler tunables can be constrained via extra1 and extra2 of
ctl_table. Having a valid range in the sysctl table gives a clearer
view of each tunable's range.
Also add a range for sched_select_prev_cpu_us so that invalid
values cannot be configured for that tunable.
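A user-space sketch of a range-checked write follows, with extra1 as the
minimum and extra2 as the maximum. The struct and function are hypothetical
stand-ins for a ctl_table entry and its min/max handler, not kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct tunable {
	int *data;		/* current value */
	const int *extra1;	/* minimum, or NULL for no lower bound */
	const int *extra2;	/* maximum, or NULL for no upper bound */
};

/* Store val only if it lies inside [*extra1, *extra2]. */
int tunable_write(const struct tunable *t, int val)
{
	assert(t != NULL && t->data != NULL);

	if (t->extra1 && val < *t->extra1)
		return -EINVAL;
	if (t->extra2 && val > *t->extra2)
		return -EINVAL;

	*t->data = val;
	return 0;
}
```

An out-of-range write is rejected and leaves the old value in place.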
CRs-fixed: 1056910
Change-Id: I09fcc019133f4d37b7be3287da8e0733e40fc0ac
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Open up an interface to allow an external subsystem to enable and disable
the hard lockup detector.
Change-Id: I88a728ee1d54aaa887fab52e5e40d1d4e4fc69ca
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Add a bitmask and corresponding supporting functions for cpu isolation.
Change-Id: Ice1a9503666a2b720bdb324289ca55ceb33097cd
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Do not require CPUSETS to be enabled to allow migration of timers and
hrtimers.
Change-Id: Ib911a0d34c250c4df020bdb265b92d2b8df8db93
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Add a function to migrate timers; it will be used by a later patch set.
Change-Id: I370e404001344e635a663822b07557abbe0f6f52
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Updated commit text and fixed trivial merge conflict]
Git-commit: 3633b88d8fcb4273807574c27c328b6908a741e5
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
An hrtimer may be pinned to a CPU but inactive, so it is no longer valid
to test the hrtimer.state struct member as having no bits set when inactive.
Changed the test function to mask out the HRTIMER_STATE_PINNED bit when
checking for inactive state.
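The masked test can be sketched as follows. The bit value chosen for
HRTIMER_STATE_PINNED and the struct are illustrative assumptions, and the
helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define HRTIMER_STATE_INACTIVE	0x00UL
#define HRTIMER_STATE_ENQUEUED	0x01UL
#define HRTIMER_STATE_PINNED	0x02UL	/* illustrative bit position */

struct hrtimer_sketch {
	unsigned long state;
};

/* Inactive means no state bits are set once the pinned flag is ignored. */
int hrtimer_sketch_is_inactive(const struct hrtimer_sketch *timer)
{
	assert(timer != NULL);
	return (timer->state & ~HRTIMER_STATE_PINNED) == HRTIMER_STATE_INACTIVE;
}
```

A timer whose only set bit is the pinned flag is therefore still reported as
inactive.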
Change-Id: I632f37874ef79887ee1202a028ef734f392d6ed0
Signed-off-by: Gary S. Robertson <gary.robertson@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: 902e4d4eb0d2158d2792166221a72a829caecf07
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
To isolate CPUs (isolate from hrtimers) from sysfs using cpusets, we need some
support from the hrtimer core, i.e. a routine hrtimer_quiesce_cpu() which would
migrate away all the unpinned hrtimers but not touch the pinned ones.
This patch creates this routine.
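The intended effect can be sketched over a flat array of timers. The types
and the function name are hypothetical; the real routine works on the
per-CPU clock bases under their locks.

```c
#include <assert.h>
#include <stddef.h>

struct timer_sketch {
	int cpu;	/* CPU the timer is queued on */
	int pinned;	/* non-zero if the timer must stay on its CPU */
};

/* Move every unpinned timer from src to dst; return how many were moved. */
size_t quiesce_cpu_sketch(struct timer_sketch *timers, size_t n,
			  int src, int dst)
{
	size_t i, moved = 0;

	assert(timers != NULL && src != dst);

	for (i = 0; i < n; i++) {
		if (timers[i].cpu != src || timers[i].pinned)
			continue;	/* other CPU, or pinned: leave alone */
		timers[i].cpu = dst;
		moved++;
	}
	return moved;
}
```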
Change-Id: I51259ea41e3bd5cdba50b718201a6840174a7224
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: d4d50a0ddc35e58ee95137ba4d14e74fea8b682f
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
'Pinned' information is now required in migrate_hrtimers(), as we can
migrate non-pinned timers away without a hotplug (i.e. with cpuset.quiesce). So
we need to identify pinned timers, as we can't migrate them.
This patch reuses the timer->state variable for this flag, as it has
enough free bits available, and there is no point
increasing the size of the struct by adding another field.
Change-Id: If3b3770e547971809e789ea7c8033c48ec2aa92d
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: 62feaf1ed0b64c04868d143d8bdb92d60dc3189b
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
To isolate CPUs (isolate from timers) from sysfs using cpusets, we need some
support from the timer core, i.e. a routine timer_quiesce_cpu() which would
migrate away all the unpinned timers but not touch the pinned ones.
This patch creates this routine.
Change-Id: I8624e0659b86b7b8fa425a3fafdb0784fe005124
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4. Fixes for compilation error]
Git-commit: 313910b70ea0c73f8789d9189c11e1f339080646
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
This is needed to support migration of timers during cpu isolation. A
timer might be running on the CPU that we want to isolate so we are
unable to migrate the timers at this point. We are adding a spin-loop to
wait for the timer to finish before migrating the timers.
Change-Id: I24d6e91b6dff468c640c2fe3a37a7f31b6f0c79a
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Until now, hitting this BUG_ON caused a recursive oops (because oops
handling involves do_exit(), which calls into the scheduler, which in
turn raises an oops), which caused stuff below the stack to be
overwritten until a panic happened (e.g. via an oops in interrupt
context, caused by the overwritten CPU index in the thread_info).
Just panic directly.
Change-Id: I73409be3e4cfba82bae36a487227eb5260cd6e37
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-repo: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Git-commit: 29d6455178a09e1dc340380c582b13356227e8df
Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>
When kernel.perf_event_paranoid is set to 3 (or greater), disallow all
access to performance events by users without CAP_SYS_ADMIN.
Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that
makes this value the default.
This is based on a similar feature in grsecurity
(CONFIG_GRKERNSEC_PERF_HARDEN). This version doesn't include making
the variable read-only. It also allows enabling further restriction
at run-time regardless of whether the default is changed.
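The level-3 rule amounts to a check like the one below. The function name is
hypothetical, the capability test is reduced to a flag, and the -EACCES
return value is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>

/*
 * paranoid_level mirrors the kernel.perf_event_paranoid sysctl, where -1 is
 * the least restrictive setting. has_cap_sys_admin is non-zero when the
 * caller holds CAP_SYS_ADMIN.
 */
int perf_event_open_check(int paranoid_level, int has_cap_sys_admin)
{
	assert(paranoid_level >= -1);

	/* Level 3 or greater: no access at all without CAP_SYS_ADMIN. */
	if (paranoid_level > 2 && !has_cap_sys_admin)
		return -EACCES;
	return 0;
}
```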
https://lkml.org/lkml/2016/1/11/587
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Git-repo: https://android.googlesource.com/kernel/common.git
Git-commit: 012b0adcf7299f6509d4984cf46ee11e6eaed4e4
[d-cagle@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>
Bug: 29054680
Change-Id: Iff5bff4fc1042e85866df9faa01bce8d04335ab8
A discrepancy between cpu_online_mask and cpuset's effective_cpus
mask is inevitable during hotplug since cpuset defers updating of
effective_cpus mask using a workqueue, during which time nothing
prevents the system from more hotplug operations. For that reason
guarantee_online_cpus() walks up the cpuset hierarchy until it finds
an intersection under the assumption that top cpuset's effective_cpus
mask intersects with cpu_online_mask even with such a race occurring.
However a sequence of CPU hotplugs can open a time window, during which
none of the effective CPUs in the top cpuset intersect with
cpu_online_mask.
For example when there are 4 possible CPUs 0-3 and only CPU0 is online:
======================== ===========================
cpu_online_mask          top_cpuset.effective_cpus
======================== ===========================
echo 1 > cpu2/online.
CPU hotplug notifier woke up hotplug work but not yet scheduled.
[0,2]                    [0]

echo 0 > cpu0/online.
The workqueue is still runnable.
[2]                      [0]
======================== ===========================
Now there is no intersection between cpu_online_mask and
top_cpuset.effective_cpus. Thus invoking sys_sched_setaffinity() at
this moment can cause the following:
Unable to handle kernel NULL pointer dereference at virtual address 000000d0
------------[ cut here ]------------
Kernel BUG at ffffffc0001389b0 [verbose debug info unavailable]
Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 1420 Comm: taskset Tainted: G W 4.4.8+ #98
task: ffffffc06a5c4880 ti: ffffffc06e124000 task.ti: ffffffc06e124000
PC is at guarantee_online_cpus+0x2c/0x58
LR is at cpuset_cpus_allowed+0x4c/0x6c
<snip>
Process taskset (pid: 1420, stack limit = 0xffffffc06e124020)
Call trace:
[<ffffffc0001389b0>] guarantee_online_cpus+0x2c/0x58
[<ffffffc00013b208>] cpuset_cpus_allowed+0x4c/0x6c
[<ffffffc0000d61f0>] sched_setaffinity+0xc0/0x1ac
[<ffffffc0000d6374>] SyS_sched_setaffinity+0x98/0xac
[<ffffffc000085cb0>] el0_svc_naked+0x24/0x28
The top cpuset's effective_cpus are guaranteed to be identical to
cpu_online_mask eventually. Hence fall back to cpu_online_mask when
there is no intersection between top cpuset's effective_cpus and
cpu_online_mask.
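The walk and its fallback can be sketched with 64-bit masks in place of
cpumasks. The struct and the _sketch suffix mark this as an illustration
rather than the cpuset code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct cpuset_sketch {
	uint64_t effective_cpus;
	const struct cpuset_sketch *parent;	/* NULL for the top cpuset */
};

/* Return the online CPUs usable by cs, walking up until one intersects. */
uint64_t guarantee_online_cpus_sketch(const struct cpuset_sketch *cs,
				      uint64_t online)
{
	assert(cs != NULL);

	while (!(cs->effective_cpus & online)) {
		cs = cs->parent;
		/* Walked past the top cpuset: fall back to the online mask. */
		if (cs == NULL)
			return online;
	}
	return cs->effective_cpus & online;
}
```

In the race from the example, online is [2] and the top cpuset still holds
[0], so the walk runs off the top and the online mask is returned instead of
a NULL cpuset being dereferenced.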
CRs-fixed: 1058529
Change-Id: I83ee4619feff2ca7452119c9baecb6ffde755287
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tejun Heo <tj@kernel.org>
* tmp-bab1564:
ANDROID: mmc: Add CONFIG_MMC_SIMULATE_MAX_SPEED
android: base-cfg: Add CONFIG_INET_DIAG_DESTROY
cpufreq: interactive: only apply interactive boost when enabled
cpufreq: interactive: fix policy locking
ANDROID: dm verity fec: add sysfs attribute fec/corrected
ANDROID: android: base-cfg: enable CONFIG_DM_VERITY_FEC
UPSTREAM: dm verity: add ignore_zero_blocks feature
UPSTREAM: dm verity: add support for forward error correction
UPSTREAM: dm verity: factor out verity_for_bv_block()
UPSTREAM: dm verity: factor out structures and functions useful to separate object
UPSTREAM: dm verity: move dm-verity.c to dm-verity-target.c
UPSTREAM: dm verity: separate function for parsing opt args
UPSTREAM: dm verity: clean up duplicate hashing code
UPSTREAM: dm: don't save and restore bi_private
mm: Export do_munmap
sdcardfs: remove unneeded __init and __exit
sdcardfs: Remove unused code
fs: Export d_absolute_path
sdcardfs: remove effectless config option
inotify: Fix erroneous update of bit count
fs: sdcardfs: Declare LOOKUP_CASE_INSENSITIVE unconditionally
trace: cpufreq: fix typo in min/max cpufreq
sdcardfs: Add support for d_canonical_path
vfs: add d_canonical_path for stacked filesystem support
sdcardfs: Bring up to date with Android M permissions:
Changed type-casting in packagelist management
Port of sdcardfs to 4.4
Included sdcardfs source code for kernel 3.0
ANDROID: usb: gadget: Add support for MTP OS desc
CHROMIUM: usb: gadget: f_accessory: add .raw_request callback
CHROMIUM: usb: gadget: audio_source: add .free_func callback
CHROMIUM: usb: gadget: f_mtp: fix usb_ss_ep_comp_descriptor
CHROMIUM: usb: gadget: f_mtp: Add SuperSpeed support
FROMLIST: mmc: block: fix ABI regression of mmc_blk_ioctl
FROMLIST: mm: ASLR: use get_random_long()
FROMLIST: drivers: char: random: add get_random_long()
FROMLIST: pstore-ram: fix NULL reference when used with pdata
usb: u_ether: Add missing rx_work init
ANDROID: dm-crypt: run in a WQ_HIGHPRI workqueue
misc: uid_stat: Include linux/atomic.h instead of asm/atomic.h
hid-sensor-hub.c: fix wrong do_div() usage
power: Provide dummy log_suspend_abort_reason() if SUSPEND is disabled
PM / suspend: Add dependency on RTC_LIB
drivers: power: use 'current' instead of 'get_current()'
video: adf: Set ADF_MEMBLOCK to boolean
video: adf: Fix modular build
net: ppp: Fix modular build for PPPOLAC and PPPOPNS
net: pppolac/pppopns: Replace msg.msg_iov with iov_iter_kvec()
ANDROID: mmc: sdio: Disable retuning in sdio_reset_comm()
ANDROID: mmc: Move tracepoint creation and export symbols
ANDROID: kernel/watchdog: fix unused variable warning
ANDROID: usb: gadget: f_mtp: don't use le16 for u8 field
ANDROID: lowmemorykiller: fix declaration order warnings
ANDROID: net: fix 'const' warnings
net: diag: support v4mapped sockets in inet_diag_find_one_icsk()
net: tcp: deal with listen sockets properly in tcp_abort.
tcp: diag: add support for request sockets to tcp_abort()
net: diag: Support destroying TCP sockets.
net: diag: Support SOCK_DESTROY for inet sockets.
net: diag: Add the ability to destroy a socket.
net: diag: split inet_diag_dump_one_icsk into two
Revert "mmc: Extend wakelock if bus is dead"
Revert "mmc: core: Hold a wake lock accross delayed work + mmc rescan"
ANDROID: mmc: move to a SCHED_FIFO thread
Conflicts:
drivers/cpufreq/cpufreq_interactive.c
drivers/misc/uid_stat.c
drivers/mmc/card/block.c
drivers/mmc/card/queue.c
drivers/mmc/card/queue.h
drivers/mmc/core/core.c
drivers/mmc/core/sdio.c
drivers/staging/android/lowmemorykiller.c
drivers/usb/gadget/function/f_mtp.c
kernel/watchdog.c
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Change-Id: Ibb4db11c57395f67dee86211a110c462e6181552
Frequency-demand conversion data structures are only used under
CONFIG_SCHED_HMP. Move them out of sched.h into hmp.c, where they
actually belong after the recent refactor.
Change-Id: I3c3eebca86062f11b80af93ba3716695eb787376
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>