android_kernel_oneplus_msm8998/kernel
Daniel Borkmann e25dc63aa3 bpf: generally move prog destruction to RCU deferral
[ Upstream commit 1aacde3d22c42281236155c1ef6d7a5aa32a826b ]

Jann Horn reported following analysis that could potentially result
in a very hard to trigger (if not impossible) UAF race, to quote his
event timeline:

 - Set up a process with threads T1, T2 and T3
 - Let T1 set up a socket filter F1 that invokes another filter F2
   through a BPF map [tail call]
 - Let T1 trigger the socket filter via a unix domain socket write,
   don't wait for completion
 - Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
 - Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
 - Let T3 close the file descriptor for F2, dropping the reference
   count of F2 to 2
 - At this point, T1 should have looked up F2 from the map, but not
   finished executing it
 - Let T3 remove F2 from the BPF map, dropping the reference count of
   F2 to 1
 - Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
   the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
   via schedule_work()
 - At this point, the BPF program could be freed
 - BPF execution is still running in a freed BPF program

While at PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the perf
event fd we're doing the syscall on doesn't disappear from underneath us
for whole syscall time, it may not be the case for the bpf fd used as
an argument only after we did the put. It needs to be a valid fd pointing
to a BPF program at the time of the call to make the bpf_prog_get() and
while T2 gets preempted, F2 must have dropped reference to 1 on the other
CPU. The fput() from the close() in T3 should also add additionally delay
to the reference drop via exit_task_work() when bpf_prog_release() gets
called as well as scheduling bpf_prog_free_deferred().

That said, it makes nevertheless sense to move the BPF prog destruction
generally after RCU grace period to guarantee that such scenario above,
but also others as recently fixed in ceb56070359b ("bpf, perf: delay release
of BPF prog after grace period") with regards to tail calls won't happen.
Integrating bpf_prog_free_deferred() directly into the RCU callback is
not allowed since the invocation might happen from either softirq or
process context, so we're not permitted to block. Reviewing all bpf_prog_put()
invocations from eBPF side (note, cBPF -> eBPF progs don't use this for
their destruction) with call_rcu() look good to me.

Since we don't know whether at the time of attaching the program, we're
already part of a tail call map, we need to use RCU variant. However, due
to this, there won't be severely more stress on the RCU callback queue:
situations with above bpf_prog_get() and bpf_prog_put() combo in practice
normally won't lead to releases, but even if they would, enough effort/
cycles have to be put into loading a BPF program into the kernel already.

Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10 07:41:37 -08:00
..
bpf bpf: generally move prog destruction to RCU deferral 2018-11-10 07:41:37 -08:00
configs kconfig: tinyconfig: provide whole choice blocks to avoid warnings 2016-09-24 10:07:42 +02:00
debug kdb: make "mdr" command repeat 2018-05-30 07:49:17 +02:00
events bpf: generally move prog destruction to RCU deferral 2018-11-10 07:41:37 -08:00
gcov gcov: disable for COMPILE_TEST 2018-01-23 19:50:10 +01:00
irq genirq: Delay incrementing interrupt count if it's disabled/pending 2018-09-15 09:40:40 +02:00
livepatch livepatch: x86: fix relocation computation with kASLR 2015-11-11 17:36:04 +01:00
locking locking/osq_lock: Fix osq_lock queue corruption 2018-09-19 22:48:56 +02:00
power PM / sleep: wakeup: Fix build error caused by missing SRCU support 2018-09-09 20:04:34 +02:00
printk braille-console: Fix value returned by _braille_console_setup 2018-03-22 09:23:23 +01:00
rcu rcu: Allow for page faults in NMI handlers 2017-10-18 09:20:41 +02:00
sched sched/cgroup: Fix cgroup entity load tracking tear-down 2018-11-10 07:41:35 -08:00
time alarmtimer: Prevent overflow for relative nanosleep 2018-10-10 08:52:05 +02:00
trace tracing: Skip more functions when doing stack tracing of events 2018-11-10 07:41:35 -08:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-10 09:27:08 +01:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-16 20:09:45 +01:00
audit.c audit: return on memory error to avoid null pointer dereference 2018-05-30 07:49:16 +02:00
audit.h audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_fsnotify.c audit: clean simple fsnotify implementation 2015-08-06 16:14:53 -04:00
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c audit: fix use-after-free in audit_add_watch 2018-09-26 08:35:08 +02:00
auditfilter.c audit: allow not equal op for audit by executable 2018-08-06 16:24:38 +02:00
auditsc.c audit: allow not equal op for audit by executable 2018-08-06 16:24:38 +02:00
backtracetest.c
bounds.c
capability.c exec: Ensure mm->user_ns contains the execed files 2017-01-06 11:16:14 +01:00
cgroup.c cgroup: Fix deadlock in cpu hotplug path 2018-10-13 09:11:33 +02:00
cgroup_freezer.c cgroup: fix handling of multi-destination migration from subtree_control enabling 2015-12-03 10:18:21 -05:00
cgroup_pids.c cgroup_pids: don't account for the root cgroup 2015-12-03 10:18:21 -05:00
compat.c compat: cleanup coding in compat_get_bitmap() and compat_put_bitmap() 2015-06-04 23:57:18 +02:00
configs.c
context_tracking.c context_tracking: avoid irq_save/irq_restore on guest entry and exit 2015-11-10 12:06:23 +01:00
cpu.c stable-fixup: hotplug: fix unused function warning 2017-01-12 11:22:48 +01:00
cpu_pm.c kernel/cpu_pm: fix cpu_cluster_pm_exit comment 2015-09-03 02:42:20 +02:00
cpuset.c sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs 2017-10-12 11:27:35 +02:00
crash_dump.c
cred.c cred: Reject inodes with invalid ids in set_create_file_as() 2016-09-15 08:27:49 +02:00
delayacct.c
dma.c
elfcore.c
exec_domain.c Remove rest of exec domains. 2015-04-12 21:03:31 +02:00
exit.c kernel/exit.c: avoid undefined behaviour when calling wait4() 2018-05-26 08:48:51 +02:00
extable.c kernel/extable.c: mark core_kernel_text notrace 2017-07-21 07:44:56 +02:00
fork.c kthread: Fix use-after-free if kthread fork fails 2018-09-19 22:48:55 +02:00
freezer.c
futex.c futex: futex_wake_op, fix sign_extend32 sign bits 2018-05-26 08:48:51 +02:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:27:10 +01:00
hung_task.c kernel/hung_task.c: change hung_task.c to use for_each_process_thread() 2015-04-15 16:35:22 -07:00
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c jump_label: Invoke jump_label_test() via early_initcall() 2017-12-16 10:33:55 +01:00
kallsyms.c
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/qrwlock: Rename QUEUE_RWLOCK to QUEUED_RWLOCKS 2015-05-12 09:46:00 +02:00
Kconfig.preempt
kexec.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kexec_core.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kexec_file.c kexec: fix double-free when failing to relocate the purgatory 2016-09-24 10:07:36 +02:00
kexec_internal.h kexec: split kexec_file syscall code to kexec_file.c 2015-09-10 13:29:01 -07:00
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c kprobes: Make list and blacklist root user read only 2018-09-05 09:18:40 +02:00
ksysfs.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
kthread.c kthread, tracing: Don't expose half-written comm when creating kthreads 2018-09-09 20:04:34 +02:00
latencytop.c
Makefile sys_membarrier(): system-wide memory barrier (generic, x86) 2015-09-11 15:21:34 -07:00
membarrier.c Fix: Disable sys_membarrier when nohz_full is enabled 2017-03-12 06:37:26 +01:00
memremap.c mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done} 2017-01-19 20:17:18 +01:00
module-internal.h
module.c module: exclude SHN_UNDEF symbols from kallsyms api 2018-10-10 08:52:07 +02:00
module_signing.c KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
notifier.c Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 08:40:25 -07:00
nsproxy.c
padata.c padata: free correct variable 2017-05-20 14:27:02 +02:00
panic.c kernel/panic.c: add missing \n 2017-07-05 14:37:19 +02:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid.c pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid() 2018-04-13 19:50:03 +02:00
pid_namespace.c pid_ns: Sleep in TASK_INTERRUPTIBLE in zap_pid_ns_processes 2017-05-25 14:30:11 +02:00
profile.c profile: hide unused functions when !CONFIG_PROC_FS 2018-02-25 11:03:44 +01:00
ptrace.c ptrace: Properly initialize ptracer_cred on fork 2017-06-14 13:16:20 +02:00
range.c
reboot.c kexec: split kexec_load syscall from kexec core code 2015-09-10 13:29:01 -07:00
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2018-05-30 07:49:00 +02:00
resource.c resource: fix integer overflow at reallocation 2018-04-24 09:32:05 +02:00
seccomp.c seccomp: Move speculation migitation control to arch code 2018-07-25 10:18:27 +02:00
signal.c kernel/signal.c: avoid undefined behaviour in kill_something_info 2018-05-30 07:48:52 +02:00
smp.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
smpboot.c stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() 2015-10-20 10:23:55 +02:00
smpboot.h
softirq.c
stacktrace.c
stop_machine.c kernel: remove stop_machine() Kconfig dependency 2015-12-12 10:15:34 -08:00
sys.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 20:04:35 +02:00
sys_ni.c mm: mlock: add new mlock system call 2015-11-05 19:34:48 -08:00
sysctl.c sched/sysctl: Check user input value of sysctl_sched_time_avg 2018-09-05 09:18:33 +02:00
sysctl_binary.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
task_work.c task_work: remove fifo ordering guarantee 2015-09-05 13:46:58 -07:00
taskstats.c
test_kprobes.c
torture.c torture: Consolidate cond_resched_rcu_qs() into stutter_wait() 2015-10-06 11:25:01 -07:00
tracepoint.c tracepoint: Do not warn on ENOMEM 2018-05-16 10:06:47 +02:00
tsacct.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2018-01-10 09:27:10 +01:00
up.c
user-return-notifier.c
user.c
user_namespace.c userns: move user access out of the mutex 2018-09-09 20:04:35 +02:00
utsname.c
utsname_sysctl.c sys: don't hold uts_sem while accessing userspace memory 2018-09-09 20:04:35 +02:00
watchdog.c kernel/watchdog: use nmi registers snapshot in hardlockup handler 2017-01-06 11:16:16 +01:00
workqueue.c workqueue: use put_device() instead of kfree() 2018-05-30 07:49:04 +02:00
workqueue_internal.h workqueue: Fix NULL pointer dereference 2017-11-15 17:13:11 +01:00