android_kernel_oneplus_msm8998/kernel
Mel Gorman 93dcb09e29 futex: Remove requirement for lock_page() in get_futex_key()
commit 65d8fc777f6dcfee12785c057a6b57f679641c90 upstream.

When dealing with key handling for shared futexes, we can drastically reduce
the usage/need of the page lock. 1) For anonymous pages, the associated futex
object is the mm_struct which does not require the page lock. 2) For inode
based, keys, we can check under RCU read lock if the page mapping is still
valid and take reference to the inode. This just leaves one rare race that
requires the page lock in the slow path when examining the swapcache.

Additionally realtime users currently have a problem with the page lock being
contended for unbounded periods of time during futex operations.

Task A
     get_futex_key()
     lock_page()
    ---> preempted

Now any other task trying to lock that page will have to wait until
task A gets scheduled back in, which is an unbound time.

With this patch, we pretty much have a lockless futex_get_key().

Experiments show that this patch can boost/speedup the hashing of shared
futexes with the perf futex benchmarks (which is good for measuring such
change) by up to 45% when there are high (> 100) thread counts on a 60 core
Westmere. Lower counts are pretty much in the noise range or less than 10%,
but mid range can be seen at over 30% overall throughput (hash ops/sec).
This makes anon-mem shared futexes much closer to its private counterpart.

Signed-off-by: Mel Gorman <mgorman@suse.de>
[ Ported on top of thp refcount rework, changelog, comments, fixes. ]
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Chris Mason <clm@fb.com>
Cc: Darren Hart <dvhart@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1455045314-8305-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Chenbo Feng <fengc@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:50:24 +02:00
..
bpf bpf: skip unnecessary capability check 2018-03-28 18:40:17 +02:00
configs kconfig: tinyconfig: provide whole choice blocks to avoid warnings 2016-09-24 10:07:42 +02:00
debug kdb: Fix handling of kallsyms_symbol_next() return value 2017-12-16 10:33:49 +01:00
events perf/core: Correct event creation with PERF_FORMAT_GROUP 2018-04-13 19:50:19 +02:00
gcov gcov: disable for COMPILE_TEST 2018-01-23 19:50:10 +01:00
irq genirq: Use cpumask_available() for check of cpumask variable 2018-04-08 11:51:57 +02:00
livepatch livepatch: x86: fix relocation computation with kASLR 2015-11-11 17:36:04 +01:00
locking locking/mutex: Allow next waiter lockless wakeup 2018-01-17 09:35:27 +01:00
power sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs 2017-10-12 11:27:35 +02:00
printk braille-console: Fix value returned by _braille_console_setup 2018-03-22 09:23:23 +01:00
rcu rcu: Allow for page faults in NMI handlers 2017-10-18 09:20:41 +02:00
sched sched/numa: Use down_read_trylock() for the mmap_sem 2018-04-13 19:50:09 +02:00
time time: Change posix clocks ops interfaces to use timespec64 2018-03-24 10:58:40 +01:00
trace tracing: probeevent: Fix to support minus offset from symbol 2018-03-28 18:40:15 +02:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:27:08 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-16 20:09:45 +01:00
audit.c audit: add tty field to LOGIN event 2018-04-08 11:51:57 +02:00
audit.h audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_fsnotify.c
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c audit: Fix use after free in audit_remove_watch_rule() 2017-08-24 17:02:35 -07:00
auditfilter.c audit: fix comment block whitespace 2015-11-04 08:23:51 -05:00
auditsc.c audit: add tty field to LOGIN event 2018-04-08 11:51:57 +02:00
backtracetest.c
bounds.c
capability.c exec: Ensure mm->user_ns contains the execed files 2017-01-06 11:16:14 +01:00
cgroup.c cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups 2017-04-21 09:30:04 +02:00
cgroup_freezer.c cgroup: fix handling of multi-destination migration from subtree_control enabling 2015-12-03 10:18:21 -05:00
cgroup_pids.c cgroup_pids: don't account for the root cgroup 2015-12-03 10:18:21 -05:00
compat.c
configs.c
context_tracking.c context_tracking: avoid irq_save/irq_restore on guest entry and exit 2015-11-10 12:06:23 +01:00
cpu.c stable-fixup: hotplug: fix unused function warning 2017-01-12 11:22:48 +01:00
cpu_pm.c
cpuset.c sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs 2017-10-12 11:27:35 +02:00
crash_dump.c
cred.c cred: Reject inodes with invalid ids in set_create_file_as() 2016-09-15 08:27:49 +02:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c wait/ptrace: assume __WALL if the child is traced 2016-06-07 18:14:35 -07:00
extable.c kernel/extable.c: mark core_kernel_text notrace 2017-07-21 07:44:56 +02:00
fork.c kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE 2018-01-05 15:44:23 +01:00
freezer.c
futex.c futex: Remove requirement for lock_page() in get_futex_key() 2018-04-13 19:50:24 +02:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:27:10 +01:00
hung_task.c
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c jump_label: Invoke jump_label_test() via early_initcall() 2017-12-16 10:33:55 +01:00
kallsyms.c
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kexec_core.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kexec_file.c kexec: fix double-free when failing to relocate the purgatory 2016-09-24 10:07:36 +02:00
kexec_internal.h kexec: split kexec_file syscall code to kexec_file.c 2015-09-10 13:29:01 -07:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c kprobes/x86: Fix to set RWX bits correctly before releasing trampoline 2018-04-08 11:51:56 +02:00
ksysfs.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
kthread.c cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups 2017-04-21 09:30:04 +02:00
latencytop.c
Makefile sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
membarrier.c Fix: Disable sys_membarrier when nohz_full is enabled 2017-03-12 06:37:26 +01:00
memremap.c mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} 2017-01-19 20:17:18 +01:00
module-internal.h
module.c module/retpoline: Warn about missing retpoline in module 2018-02-25 11:03:52 +01:00
module_signing.c KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
notifier.c
nsproxy.c
padata.c padata: free correct variable 2017-05-20 14:27:02 +02:00
panic.c kernel/panic.c: add missing \n 2017-07-05 14:37:19 +02:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid.c pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid() 2018-04-13 19:50:03 +02:00
pid_namespace.c pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes 2017-05-25 14:30:11 +02:00
profile.c profile: hide unused functions when !CONFIG_PROC_FS 2018-02-25 11:03:44 +01:00
ptrace.c ptrace: Properly initialize ptracer_cred on fork 2017-06-14 13:16:20 +02:00
range.c
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relay.c
resource.c /proc/iomem: only expose physical resource addresses to privileged users 2017-08-06 19:19:42 -07:00
seccomp.c seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter() 2017-10-05 09:41:46 +02:00
signal.c kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal() 2018-01-10 09:27:11 +01:00
smp.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
smpboot.c stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() 2015-10-20 10:23:55 +02:00
smpboot.h
softirq.c
stacktrace.c
stop_machine.c kernel: remove stop_machine() Kconfig dependency 2015-12-12 10:15:34 -08:00
sys.c prctl: take mmap sem for writing to protect against others 2016-02-25 12:01:25 -08:00
sys_ni.c mm: mlock: add new mlock system call 2015-11-05 19:34:48 -08:00
sysctl.c timer/sysclt: Restrict timer migration sysctl values to 0 and 1 2017-10-05 09:41:47 +02:00
sysctl_binary.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
task_work.c
taskstats.c
test_kprobes.c
torture.c torture: Consolidate cond_resched_rcu_qs() into stutter_wait() 2015-10-06 11:25:01 -07:00
tracepoint.c tracepoint: Give priority to probes of tracepoints 2015-10-25 21:33:54 -04:00
tsacct.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:27:10 +01:00
up.c
user-return-notifier.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
watchdog.c kernel/watchdog: use nmi registers snapshot in hardlockup handler 2017-01-06 11:16:16 +01:00
workqueue.c workqueue: Allow retrieval of current task's work struct 2018-03-18 11:17:48 +01:00
workqueue_internal.h workqueue: Fix NULL pointer dereference 2017-11-15 17:13:11 +01:00