After the introduction of "33c24b sched: add cpu isolation support"
select_fallback_rq() might sometimes be unable find any CPU to place
a task on. This happens when the all online CPUs are isolated and
the allow isolated flag is set to false. In such cases, we have
little choice but to use an isolated CPU and wait for core control
to eventually un-isolate one or more online CPUs.
Change-Id: Id8738bd8493c11731c5491efcc99eb90f051233e
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
A timer might be running when we are trying to move the timer to another
CPU so ensure that we wait for the timer to finish before migrating.
Change-Id: I4c9ee39c715baebfbdb8a50476a475e38b092f70
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
The load balancer restrictions are in place to control the tasks
migration from the lower capacity cluster to higher capacity
cluster to save power. The assumption here is that higher capacity
cluster will have higher power cost which may not be necessarily
true for all platforms. Use power cost based checks instead of
capacity based checks while applying the inter cluster migration
restrictions.
Change-Id: Id9519eb8f7b183a2e9fca87a23cf95e951aa4005
Signed-off-by: Pavankumar Kondeti <pkondeti@codeaurora.org>
Replace hotplug functionality in core control with cpu isolation
and integrate into scheduler.
Change-Id: I4f1514ba5bac2e259a1105fcafb31d6a92ddd249
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Refactor cpu data into cpu data and cluster data to improve readability and
ease of understanding the code.
Change-Id: I96505aeb9d07a6fa3a2c28648ffa299e0cfa2e41
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Move the core control trace events to scheduler trace event file.
Change-Id: I65943d8e4a9eac1f9f5a40ad5aaf166679215f48
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Move core control from out-of-tree module into the kernel proper.
Core control monitors load on CPUs and controls how many CPUs are
available for the system to use at any point in time. This can help save
power. Core control can be configured through sysfs interface.
Change-Id: Ia78e701468ea3828195c2a15c9cf9fafd099804a
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Remove the core control helper code since this is not needed anymore
with subsequent patches that moves core control into the kernel.
Change-Id: I62acddeb707fc7d5626580166b3466e63f45fd89
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Ensure perf events does not wake up idle cores when core is isolated.
Change-Id: Ifefb2f1cf6c24af7bc46fc62797955b8c8ad5815
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Set long latency requirement for isolated cores to ensure LPM logic will
select a deep sleep state.
Change-Id: I83e9fbb800df259616a145d311b50627dc42a5ff
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Prohibit setting the affinity of an IRQ to an isolated core.
Change-Id: I7b50778615541a64f9956573757c7f28748c4f69
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Add tracepoint to capture the cpu isolation event including KPI for
time it took to isolate.
Change-Id: If2d30000f068afc50db953940f4636ef6a089b24
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
This adds cpu isolation APIs to the scheduler to isolate and unisolate
CPUs. Isolating and unisolating a CPU can be used in place of hotplug.
Isolating and unisolating a CPU is faster than hotplug and can thus be
used to optimize the performance and power of multi-core CPUs.
Isolating works by migrating non-pinned IRQs and tasks to other CPUS and
marking the CPU as not available to the scheduler and load balancer.
Pinned tasks and IRQs are still allowed to run but it is expected that
this would be minimal.
Unisolation works by just marking the CPU available for scheduler and
load balancer.
Change-Id: I0bbddb56238c2958c5987877c5bfc3e79afa67cc
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
HMP scheduler tunables can be constrained via extra1 and extra2 of
ctl_table. Having valid range in the sysctl table gives clearer
view of tunable's range.
Also add range for sched_select_prev_cpu_us so we can avoid invalid
value configuration of that tunable.
CRs-fixed: 1056910
Change-Id: I09fcc019133f4d37b7be3287da8e0733e40fc0ac
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Open up interface to allow external subsystem to enable and disable hard
lockup detector.
Change-Id: I88a728ee1d54aaa887fab52e5e40d1d4e4fc69ca
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Add bitmask and corresponding supporting functions for cpu isolation.
Change-Id: Ice1a9503666a2b720bdb324289ca55ceb33097cd
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Do not require CPUSETS to be enabled to allow migration of timers and
hrtimers.
Change-Id: Ib911a0d34c250c4df020bdb265b92d2b8df8db93
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Add function to migrate timer that will be used by later patch set.
Change-Id: I370e404001344e635a663822b07557abbe0f6f52
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Updated commit text and fixed trivial merge conflict]
Git-commit: 3633b88d8fcb4273807574c27c328b6908a741e5
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
An hrtimer may be pinned to a CPU but inactive, so it is no longer valid
to test the hrtimer.state struct member as having no bits set when inactive.
Changed the test function to mask out the HRTIMER_STATE_PINNED bit when
checking for inactive state.
Change-Id: I632f37874ef79887ee1202a028ef734f392d6ed0
Signed-off-by: Gary S. Robertson <gary.robertson@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: 902e4d4eb0d2158d2792166221a72a829caecf07
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
To isolate CPUs (isolate from hrtimers) from sysfs using cpusets, we need some
support from the hrtimer core. i.e. A routine hrtimer_quiesce_cpu() which would
migrate away all the unpinned hrtimers, but shouldn't touch the pinned ones.
This patch creates this routine.
Change-Id: I51259ea41e3bd5cdba50b718201a6840174a7224
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: d4d50a0ddc35e58ee95137ba4d14e74fea8b682f
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
'Pinned' information would be required in migrate_hrtimers() now, as we can
migrate non-pinned timers away without a hotplug (i.e. with cpuset.quiesce). And
so we may need to identify pinned timers now, as we can't migrate them.
This patch reuses the timer->state variable for setting this flag as there were
enough number of free bits available in this variable. And there is no point
increasing size of this struct by adding another field.
Change-Id: If3b3770e547971809e789ea7c8033c48ec2aa92d
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4]
Git-commit: 62feaf1ed0b64c04868d143d8bdb92d60dc3189b
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
To isolate CPUs (isolate from timers) from sysfs using cpusets, we need some
support from the timer core. i.e. A routine timer_quiesce_cpu() which would
migrates away all the unpinned timers, but shouldn't touch the pinned ones.
This patch creates this routine.
Change-Id: I8624e0659b86b7b8fa425a3fafdb0784fe005124
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[forward port to 3.18]
Signed-off-by: Santosh Shukla <santosh.shukla@linaro.org>
[ohaugan@codeaurora.org: Port to 4.4. Fixes for compilation error]
Git-commit: 313910b70ea0c73f8789d9189c11e1f339080646
Git-repo: git://git.linaro.org/people/mike.holmes/santosh.shukla/lng-isol.git
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
This is needed to support migration of timers during cpu isolation. A
timer might be running on the CPU that we want to isolate so we are
unable to migrate the timers at this point. We are adding a spin-loop to
wait for the timer to finish before migrating the timers.
Change-Id: I24d6e91b6dff468c640c2fe3a37a7f31b6f0c79a
Signed-off-by: Olav Haugan <ohaugan@codeaurora.org>
Until now, hitting this BUG_ON caused a recursive oops (because oops
handling involves do_exit(), which calls into the scheduler, which in
turn raises an oops), which caused stuff below the stack to be
overwritten until a panic happened (e.g. via an oops in interrupt
context, caused by the overwritten CPU index in the thread_info).
Just panic directly.
Change-Id: I73409be3e4cfba82bae36a487227eb5260cd6e37
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-repo: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git
Git-commit: 29d6455178a09e1dc340380c582b13356227e8df
Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>
When kernel.perf_event_open is set to 3 (or greater), disallow all
access to performance events by users without CAP_SYS_ADMIN.
Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that
makes this value the default.
This is based on a similar feature in grsecurity
(CONFIG_GRKERNSEC_PERF_HARDEN). This version doesn't include making
the variable read-only. It also allows enabling further restriction
at run-time regardless of whether the default is changed.
https://lkml.org/lkml/2016/1/11/587
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Git-repo: https://android.googlesource.com/kernel/common.git
Git-commit: 012b0adcf7299f6509d4984cf46ee11e6eaed4e4
[d-cagle@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Dennis Cagle <d-cagle@codeaurora.org>
Bug: 29054680
Change-Id: Iff5bff4fc1042e85866df9faa01bce8d04335ab8
A discrepancy between cpu_online_mask and cpuset's effective_cpus
mask is inevitable during hotplug since cpuset defers updating of
effective_cpus mask using a workqueue, during which time nothing
prevents the system from more hotplug operations. For that reason
guarantee_online_cpus() walks up the cpuset hierarchy until it finds
an intersection under the assumption that top cpuset's effective_cpus
mask intersects with cpu_online_mask even with such a race occurring.
However a sequence of CPU hotplugs can open a time window, during which
none of the effective CPUs in the top cpuset intersect with
cpu_online_mask.
For example when there are 4 possible CPUs 0-3 and only CPU0 is online:
======================== ===========================
cpu_online_mask top_cpuset.effective_cpus
======================== ===========================
echo 1 > cpu2/online.
CPU hotplug notifier woke up hotplug work but not yet scheduled.
[0,2] [0]
echo 0 > cpu0/online.
The workqueue is still runnable.
[2] [0]
======================== ===========================
Now there is no intersection between cpu_online_mask and
top_cpuset.effective_cpus. Thus invoking sys_sched_setaffinity() at
this moment can cause following:
Unable to handle kernel NULL pointer dereference at virtual address 000000d0
------------[ cut here ]------------
Kernel BUG at ffffffc0001389b0 [verbose debug info unavailable]
Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 1420 Comm: taskset Tainted: G W 4.4.8+ #98
task: ffffffc06a5c4880 ti: ffffffc06e124000 task.ti: ffffffc06e124000
PC is at guarantee_online_cpus+0x2c/0x58
LR is at cpuset_cpus_allowed+0x4c/0x6c
<snip>
Process taskset (pid: 1420, stack limit = 0xffffffc06e124020)
Call trace:
[<ffffffc0001389b0>] guarantee_online_cpus+0x2c/0x58
[<ffffffc00013b208>] cpuset_cpus_allowed+0x4c/0x6c
[<ffffffc0000d61f0>] sched_setaffinity+0xc0/0x1ac
[<ffffffc0000d6374>] SyS_sched_setaffinity+0x98/0xac
[<ffffffc000085cb0>] el0_svc_naked+0x24/0x28
The top cpuset's effective_cpus are guaranteed to be identical to
cpu_online_mask eventually. Hence fall back to cpu_online_mask when
there is no intersection between top cpuset's effective_cpus and
cpu_online_mask.
CRs-fixed: 1058529
Change-Id: I83ee4619feff2ca7452119c9baecb6ffde755287
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org> # 3.17+
Signed-off-by: Tejun Heo <tj@kernel.org>
* tmp-bab1564:
ANDROID: mmc: Add CONFIG_MMC_SIMULATE_MAX_SPEED
android: base-cfg: Add CONFIG_INET_DIAG_DESTROY
cpufreq: interactive: only apply interactive boost when enabled
cpufreq: interactive: fix policy locking
ANDROID: dm verity fec: add sysfs attribute fec/corrected
ANDROID: android: base-cfg: enable CONFIG_DM_VERITY_FEC
UPSTREAM: dm verity: add ignore_zero_blocks feature
UPSTREAM: dm verity: add support for forward error correction
UPSTREAM: dm verity: factor out verity_for_bv_block()
UPSTREAM: dm verity: factor out structures and functions useful to separate object
UPSTREAM: dm verity: move dm-verity.c to dm-verity-target.c
UPSTREAM: dm verity: separate function for parsing opt args
UPSTREAM: dm verity: clean up duplicate hashing code
UPSTREAM: dm: don't save and restore bi_private
mm: Export do_munmap
sdcardfs: remove unneeded __init and __exit
sdcardfs: Remove unused code
fs: Export d_absolute_path
sdcardfs: remove effectless config option
inotify: Fix erroneous update of bit count
fs: sdcardfs: Declare LOOKUP_CASE_INSENSITIVE unconditionally
trace: cpufreq: fix typo in min/max cpufreq
sdcardfs: Add support for d_canonical_path
vfs: add d_canonical_path for stacked filesystem support
sdcardfs: Bring up to date with Android M permissions:
Changed type-casting in packagelist management
Port of sdcardfs to 4.4
Included sdcardfs source code for kernel 3.0
ANDROID: usb: gadget: Add support for MTP OS desc
CHROMIUM: usb: gadget: f_accessory: add .raw_request callback
CHROMIUM: usb: gadget: audio_source: add .free_func callback
CHROMIUM: usb: gadget: f_mtp: fix usb_ss_ep_comp_descriptor
CHROMIUM: usb: gadget: f_mtp: Add SuperSpeed support
FROMLIST: mmc: block: fix ABI regression of mmc_blk_ioctl
FROMLIST: mm: ASLR: use get_random_long()
FROMLIST: drivers: char: random: add get_random_long()
FROMLIST: pstore-ram: fix NULL reference when used with pdata
usb: u_ether: Add missing rx_work init
ANDROID: dm-crypt: run in a WQ_HIGHPRI workqueue
misc: uid_stat: Include linux/atomic.h instead of asm/atomic.h
hid-sensor-hub.c: fix wrong do_div() usage
power: Provide dummy log_suspend_abort_reason() if SUSPEND is disabled
PM / suspend: Add dependency on RTC_LIB
drivers: power: use 'current' instead of 'get_current()'
video: adf: Set ADF_MEMBLOCK to boolean
video: adf: Fix modular build
net: ppp: Fix modular build for PPPOLAC and PPPOPNS
net: pppolac/pppopns: Replace msg.msg_iov with iov_iter_kvec()
ANDROID: mmc: sdio: Disable retuning in sdio_reset_comm()
ANDROID: mmc: Move tracepoint creation and export symbols
ANDROID: kernel/watchdog: fix unused variable warning
ANDROID: usb: gadget: f_mtp: don't use le16 for u8 field
ANDROID: lowmemorykiller: fix declaration order warnings
ANDROID: net: fix 'const' warnings
net: diag: support v4mapped sockets in inet_diag_find_one_icsk()
net: tcp: deal with listen sockets properly in tcp_abort.
tcp: diag: add support for request sockets to tcp_abort()
net: diag: Support destroying TCP sockets.
net: diag: Support SOCK_DESTROY for inet sockets.
net: diag: Add the ability to destroy a socket.
net: diag: split inet_diag_dump_one_icsk into two
Revert "mmc: Extend wakelock if bus is dead"
Revert "mmc: core: Hold a wake lock accross delayed work + mmc rescan"
ANDROID: mmc: move to a SCHED_FIFO thread
Conflicts:
drivers/cpufreq/cpufreq_interactive.c
drivers/misc/uid_stat.c
drivers/mmc/card/block.c
drivers/mmc/card/queue.c
drivers/mmc/card/queue.h
drivers/mmc/core/core.c
drivers/mmc/core/sdio.c
drivers/staging/android/lowmemorykiller.c
drivers/usb/gadget/function/f_mtp.c
kernel/watchdog.c
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
Change-Id: Ibb4db11c57395f67dee86211a110c462e6181552
Frequency-demand conversion data structures are only used under
CONFIG_SCHED_HMP. Move them out of sched.h into hmp.c to where they
actually belong after the recent refactor.
Change-Id: I3c3eebca86062f11b80af93ba3716695eb787376
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
commit 8bb5ef79bc0f4016ecf79e8dce6096a3c63603e4 upstream.
There are three subsystem callbacks in css shutdown path -
css_offline(), css_released() and css_free(). Except for
css_released(), cgroup core didn't guarantee the order of invocation.
css_offline() or css_free() could be called on a parent css before its
children. This behavior is unexpected and led to bugs in cpu and
memory controller.
The previous patch updated ordering for css_offline() which fixes the
cpu controller issue. While there currently isn't a known bug caused
by misordering of css_free() invocations, let's fix it too for
consistency.
css_free() ordering can be trivially fixed by moving putting of the
parent css below css_free() invocation.
Change-Id: I97febdd414ef5cd57490ce2746650dde7fdda28f
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>i
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Git-commit: 8bb5ef79bc0f4016ecf79e8dce6096a3c63603e4
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Runmin Wang <runminw@codeaurora.org>
The structures being moved around are only used for trace events
defined under CONFIG_SCHED_HMP. Move code to hmp.c to reflect
the same.
Change-Id: Ib959355264405ab779b24948f111a2ca61d367de
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
cgroup_threadgroup_rwsem is acquired in read mode during process exit
and fork. It is also grabbed in write mode during
__cgroups_proc_write(). I've recently run into a scenario with lots
of memory pressure and OOM and I am beginning to see
systemd
__switch_to+0x1f8/0x350
__schedule+0x30c/0x990
schedule+0x48/0xc0
percpu_down_write+0x114/0x170
__cgroup_procs_write.isra.12+0xb8/0x3c0
cgroup_file_write+0x74/0x1a0
kernfs_fop_write+0x188/0x200
__vfs_write+0x6c/0xe0
vfs_write+0xc0/0x230
SyS_write+0x6c/0x110
system_call+0x38/0xb4
This thread is waiting on the reader of cgroup_threadgroup_rwsem to
exit. The reader itself is under memory pressure and has gone into
reclaim after fork. There are times the reader also ends up waiting on
oom_lock as well.
__switch_to+0x1f8/0x350
__schedule+0x30c/0x990
schedule+0x48/0xc0
jbd2_log_wait_commit+0xd4/0x180
ext4_evict_inode+0x88/0x5c0
evict+0xf8/0x2a0
dispose_list+0x50/0x80
prune_icache_sb+0x6c/0x90
super_cache_scan+0x190/0x210
shrink_slab.part.15+0x22c/0x4c0
shrink_zone+0x288/0x3c0
do_try_to_free_pages+0x1dc/0x590
try_to_free_pages+0xdc/0x260
__alloc_pages_nodemask+0x72c/0xc90
alloc_pages_current+0xb4/0x1a0
page_table_alloc+0xc0/0x170
__pte_alloc+0x58/0x1f0
copy_page_range+0x4ec/0x950
copy_process.isra.5+0x15a0/0x1870
_do_fork+0xa8/0x4b0
ppc_clone+0x8/0xc
In the meanwhile, all processes exiting/forking are blocked almost
stalling the system.
This patch moves the threadgroup_change_begin from before
cgroup_fork() to just before cgroup_canfork(). There is no nee to
worry about threadgroup changes till the task is actually added to the
threadgroup. This avoids having to call reclaim with
cgroup_threadgroup_rwsem held.
tj: Subject and description edits.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
[jstultz: Cherry-picked from:
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 568ac888215c7f]
Change-Id: Ie8ece84fb613cf6a7b08cea1468473a8df2b9661
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-commit: e91f1799ff
Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.
The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 talks about the write path being a fairly cold path
but this is not the case for Android which moves task to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so it is significantly hotter than human initiated
operations.
Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem, hopefully it should not be that slow after another commit
80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global
impact").
We could just add rcu_sync_enter() into cgroup_init() but we do not want
another synchronize_sched() at boot time, so this patch adds the new helper
which doesn't block but currently can only be called before the first use.
Cc: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[jstultz: backported to 4.4]
Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2
Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-commit: 0c3240a1ef
Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>
Currently the percpu-rwsem switches to (global) atomic ops while a
writer is waiting; which could be quite a while and slows down
releasing the readers.
This patch cures this problem by ordering the reader-state vs
reader-count (see the comments in __percpu_down_read() and
percpu_down_write()). This changes a global atomic op into a full
memory barrier, which doesn't have the global cacheline contention.
This also enables using the percpu-rwsem with rcu_sync disabled in order
to bias the implementation differently, reducing the writer latency by
adding some cost to readers.
Mailing-list-URL: https://lkml.org/lkml/2016/8/9/181
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[jstultz: Backported to 4.4]
Change-Id: I8ea04b4dca2ec36f1c2469eccafde1423490572f
Signed-off-by: John Stultz <john.stultz@linaro.org>
Git-commit: 3228c5eb7a
Git-repo: https://android.googlesource.com/kernel/common/+/android-4.4
Signed-off-by: Omprakash Dhyade <odhyade@codeaurora.org>
This reverts commit 9d6fd2c3e9 ("Merge remote-tracking branch
'msm-4.4/tmp-510d0a3f' into msm-4.4"), because it breaks the
dump parsing tools due to kernel can be loaded anywhere in the memory
now and not fixed at linear mapping.
Change-Id: Id416f0a249d803442847d09ac47781147b0d0ee6
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>
On arm systems the spin on owner optimization can intermittently cause a
lockup that's usually as long as the waiting thread's cpu timeslice. The
repeated mutex aquisitions + atomics in a single spinning thread can
completely lock out the owner from releasing the kernel mutex. The
owner needs to acquire a spinlock on the relase path and this spinlock
can share a monitor with the other locks and atomics on the waiter path.
Rate limit the waiter so that the thread releasing the mutex never
is starved.
Bug 23036902
Change-Id: Ie1b64275a0c6141f94faaf3e63fcbf9b5438140c
Signed-off-by: Riley Andrews <riandrews@google.com>
Git-commit: 84d8ce7e0025cac60a8a379a7ee3e59d640fbc03
Git-repo: https://android.googlesource.com/kernel/msm.git
Signed-off-by: Trilok Soni <tsoni@codeaurora.org>