android_kernel_oneplus_msm8998/kernel
Linus Torvalds 204b14581f access: avoid the RCU grace period for the temporary subjective credentials
commit d7852fbd0f0423937fa287a598bfde188bb68c22 upstream.

It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
work because it installs a temporary credential that gets allocated and
freed for each system call.

The allocation and freeing overhead is mostly benign, but because
credentials can be accessed under the RCU read lock, the freeing
involves a RCU grace period.

Which is not a huge deal normally, but if you have a lot of access()
calls, this causes a fair amount of seconday damage: instead of having a
nice alloc/free patterns that hits in hot per-CPU slab caches, you have
all those delayed free's, and on big machines with hundreds of cores,
the RCU overhead can end up being enormous.

But it turns out that all of this is entirely unnecessary.  Exactly
because access() only installs the credential as the thread-local
subjective credential, the temporary cred pointer doesn't actually need
to be RCU free'd at all.  Once we're done using it, we can just free it
synchronously and avoid all the RCU overhead.

So add a 'non_rcu' flag to 'struct cred', which can be set by users that
know they only use it in non-RCU context (there are other potential
users for this).  We can make it a union with the rcu freeing list head
that we need for the RCU case, so this doesn't need any extra storage.

Note that this also makes 'get_current_cred()' clear the new non_rcu
flag, in case we have filesystems that take a long-term reference to the
cred and then expect the RCU delayed freeing afterwards.  It's not
entirely clear that this is required, but it makes for clear semantics:
the subjective cred remains non-RCU as long as you only access it
synchronously using the thread-local accessors, but you _can_ use it as
a generic cred if you want to.

It is possible that we should just remove the whole RCU markings for
->cred entirely.  Only ->real_cred is really supposed to be accessed
through RCU, and the long-term cred copies that nfs uses might want to
explicitly re-enable RCU freeing if required, rather than have
get_current_cred() do it implicitly.

But this is a "minimal semantic changes" change for the immediate
problem.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Glauber <jglauber@marvell.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04 09:35:01 +02:00
..
bpf bpf: silence warning messages in core 2019-08-04 09:34:46 +02:00
configs kconfig: tinyconfig: provide whole choice blocks to avoid warnings 2016-09-24 10:07:42 +02:00
debug kdb: use memmove instead of overlapping memcpy 2018-12-13 09:21:29 +01:00
events perf/core: Fix perf_sample_regs_user() mm check 2019-07-21 09:07:12 +02:00
gcov gcov: disable for COMPILE_TEST 2018-01-23 19:50:10 +01:00
irq genirq: Prevent use-after-free and work list corruption 2019-05-16 19:45:04 +02:00
livepatch livepatch: x86: fix relocation computation with kASLR 2015-11-11 17:36:04 +01:00
locking locking/lockdep: Hide unused 'class' variable 2019-08-04 09:34:59 +02:00
power PM / Hibernate: Call flush_icache_range() on pages restored in-place 2019-04-03 06:23:24 +02:00
printk printk: Fix panic caused by passing log_buf_len to command line 2018-11-21 09:27:35 +01:00
rcu rcutorture: Fix cleanup path for invalid torture_type strings 2019-06-11 12:24:04 +02:00
sched sched/core: Handle overflow in cpu_shares_write_u64 2019-06-11 12:23:58 +02:00
time timer_list: Guard procfs specific code 2019-08-04 09:34:47 +02:00
trace tracing/snapshot: Resize spare buffer if size changed 2019-08-04 09:34:50 +02:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:27:08 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-16 20:09:45 +01:00
audit.c audit: return on memory error to avoid null pointer dereference 2018-05-30 07:49:16 +02:00
audit.h audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_fsnotify.c audit: clean simple fsnotify implementation 2015-08-06 16:14:53 -04:00
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c audit: fix use-after-free in audit_add_watch 2018-09-26 08:35:08 +02:00
auditfilter.c audit: fix a memory leak bug 2019-06-11 12:23:58 +02:00
auditsc.c audit: allow not equal op for audit by executable 2018-08-06 16:24:38 +02:00
backtracetest.c
bounds.c kbuild: fix kernel/bounds.c 'W=1' warning 2018-11-21 09:27:35 +01:00
capability.c exec: Ensure mm->user_ns contains the execed files 2017-01-06 11:16:14 +01:00
cgroup.c cgroup: Fix deadlock in cpu hotplug path 2018-10-13 09:11:33 +02:00
cgroup_freezer.c cgroup: fix handling of multi-destination migration from subtree_control enabling 2015-12-03 10:18:21 -05:00
cgroup_pids.c cgroup_pids: don't account for the root cgroup 2015-12-03 10:18:21 -05:00
compat.c compat: cleanup coding in compat_get_bitmap() and compat_put_bitmap() 2015-06-04 23:57:18 +02:00
configs.c
context_tracking.c context_tracking: avoid irq_save/irq_restore on guest entry and exit 2015-11-10 12:06:23 +01:00
cpu.c cpu/speculation: Warn on unsupported mitigations= parameter 2019-07-10 09:56:36 +02:00
cpu_pm.c kernel/cpu_pm: fix cpu_cluster_pm_exit comment 2015-09-03 02:42:20 +02:00
cpuset.c sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs 2017-10-12 11:27:35 +02:00
crash_dump.c
cred.c access: avoid the RCU grace period for the temporary subjective credentials 2019-08-04 09:35:01 +02:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c kernel/exit.c: release ptraced tasks before zap_pid_ns_processes 2019-02-06 19:43:07 +01:00
extable.c kernel/extable.c: mark core_kernel_text notrace 2017-07-21 07:44:56 +02:00
fork.c fork: record start_time late 2019-01-13 10:05:32 +01:00
freezer.c
futex.c futex: Fix futex lock the wrong page 2019-06-22 08:18:22 +02:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:27:10 +01:00
hung_task.c kernel/hung_task.c: break RCU locks based on jiffies 2019-02-20 10:13:14 +01:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c jump_label: Invoke jump_label_test() via early_initcall() 2017-12-16 10:33:55 +01:00
kallsyms.c
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kexec_core.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kexec_file.c kexec: fix double-free when failing to relocate the purgatory 2016-09-24 10:07:36 +02:00
kexec_internal.h kexec: split kexec_file syscall code to kexec_file.c 2015-09-10 13:29:01 -07:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c kprobes: Fix error check when reusing optimized probes 2019-04-27 09:34:02 +02:00
ksysfs.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
kthread.c kthread, tracing: Don't expose half-written comm when creating kthreads 2018-09-09 20:04:34 +02:00
latencytop.c
Makefile sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
membarrier.c Fix: Disable sys_membarrier when nohz_full is enabled 2017-03-12 06:37:26 +01:00
memremap.c mm, devm_memremap_pages: kill mapping "System RAM" support 2019-01-13 10:05:32 +01:00
module-internal.h
module.c module: exclude SHN_UNDEF symbols from kallsyms api 2018-10-10 08:52:07 +02:00
module_signing.c KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c
padata.c padata: use smp_mb in padata_reorder to avoid orphaned padata jobs 2019-08-04 09:34:51 +02:00
panic.c kernel/panic.c: add missing \n 2017-07-05 14:37:19 +02:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid.c pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid() 2018-04-13 19:50:03 +02:00
pid_namespace.c signal/pid_namespace: Fix reboot_pid_ns to use send_sig not force_sig 2019-08-04 09:34:42 +02:00
profile.c profile: hide unused functions when !CONFIG_PROC_FS 2018-02-25 11:03:44 +01:00
ptrace.c ptrace: Fix ->ptracer_cred handling for PTRACE_TRACEME 2019-07-10 09:56:42 +02:00
range.c
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2018-05-30 07:49:00 +02:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:32:05 +02:00
seccomp.c seccomp: Move speculation migitation control to arch code 2018-07-25 10:18:27 +02:00
signal.c kernel/signal.c: trace_signal_deliver when signal_group_exit 2019-06-11 12:24:10 +02:00
smp.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
smpboot.c stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() 2015-10-20 10:23:55 +02:00
smpboot.h
softirq.c
stacktrace.c
stop_machine.c kernel: remove stop_machine() Kconfig dependency 2015-12-12 10:15:34 -08:00
sys.c kernel/sys.c: prctl: fix false positive in validate_prctl_map() 2019-06-22 08:18:18 +02:00
sys_ni.c mm: mlock: add new mlock system call 2015-11-05 19:34:48 -08:00
sysctl.c sysctl: return -EINVAL if val violates minmax 2019-06-22 08:18:17 +02:00
sysctl_binary.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
task_work.c task_work: remove fifo ordering guarantee 2015-09-05 13:46:58 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Consolidate cond_resched_rcu_qs() into stutter_wait() 2015-10-06 11:25:01 -07:00
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-05-16 10:06:47 +02:00
tsacct.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:27:10 +01:00
up.c
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 20:04:35 +02:00
utsname.c
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 20:04:35 +02:00
watchdog.c kernel/watchdog: use nmi registers snapshot in hardlockup handler 2017-01-06 11:16:16 +01:00
workqueue.c workqueue: use put_device() instead of kfree() 2018-05-30 07:49:04 +02:00
workqueue_internal.h workqueue: Fix NULL pointer dereference 2017-11-15 17:13:11 +01:00