android_kernel_oneplus_msm8998/kernel
Peter Zijlstra 43b1bfec0e sched/fair: Fix cfs_rq avg tracking underflow
commit 8974189222159154c55f24ddad33e3613960521a upstream.

As per commit:

  b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization")

> the code generated from update_cfs_rq_load_avg():
>
> 	if (atomic_long_read(&cfs_rq->removed_load_avg)) {
> 		s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);
> 		sa->load_avg = max_t(long, sa->load_avg - r, 0);
> 		sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
> 		removed_load = 1;
> 	}
>
> turns into:
>
> ffffffff81087064:       49 8b 85 98 00 00 00    mov    0x98(%r13),%rax
> ffffffff8108706b:       48 85 c0                test   %rax,%rax
> ffffffff8108706e:       74 40                   je     ffffffff810870b0 <update_blocked_averages+0xc0>
> ffffffff81087070:       4c 89 f8                mov    %r15,%rax
> ffffffff81087073:       49 87 85 98 00 00 00    xchg   %rax,0x98(%r13)
> ffffffff8108707a:       49 29 45 70             sub    %rax,0x70(%r13)
> ffffffff8108707e:       4c 89 f9                mov    %r15,%rcx
> ffffffff81087081:       bb 01 00 00 00          mov    $0x1,%ebx
> ffffffff81087086:       49 83 7d 70 00          cmpq   $0x0,0x70(%r13)
> ffffffff8108708b:       49 0f 49 4d 70          cmovns 0x70(%r13),%rcx
>
> Which you'll note ends up with sa->load_avg -= r in memory at
> ffffffff8108707a.

So I _should_ have looked at other unserialized users of ->load_avg,
but alas. Luckily nikbor reported a similar /0 from task_h_load() which
instantly triggered recollection of this here problem.

Aside from the intermediate value hitting memory and causing problems,
there's another problem: the underflow detection relies on the signed
bit. This reduces the effective width of the variables, IOW its
effectively the same as having these variables be of signed type.

This patch changes to a different means of unsigned underflow
detection to not rely on the signed bit. This allows the variables to
use the 'full' unsigned range. And it does so with explicit LOAD -
STORE to ensure any intermediate value will never be visible in
memory, allowing these unserialized loads.

Note: GCC generates crap code for this, might warrant a look later.

Note2: I say 'full' above, if we end up at U*_MAX we'll still explode;
       maybe we should do clamping on add too.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yuyang Du <yuyang.du@intel.com>
Cc: bsegall@google.com
Cc: kernel@kyup.com
Cc: morten.rasmussen@arm.com
Cc: pjt@google.com
Cc: steve.muckle@linaro.org
Fixes: 9d89c257df ("sched/fair: Rewrite runnable load and utilization average tracking")
Link: http://lkml.kernel.org/r/20160617091948.GJ30927@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-07-27 09:47:31 -07:00
..
bpf bpf, inode: disallow userns mounts 2016-06-24 10:18:17 -07:00
configs
debug
events bpf, perf: delay release of BPF prog after grace period 2016-07-11 09:31:11 -07:00
gcov
irq genirq: Validate action before dereferencing it in handle_irq_event_percpu() 2016-03-03 15:07:11 -08:00
livepatch livepatch: x86: fix relocation computation with kASLR 2015-11-11 17:36:04 +01:00
locking locking/qspinlock: Fix spin_unlock_wait() some more 2016-07-27 09:47:29 -07:00
power PM / sleep: Clear pm_suspend_global_flags upon hibernate 2016-04-12 09:09:05 -07:00
printk printk: do cond_resched() between lines while outputting to consoles 2016-02-17 12:30:57 -08:00
rcu Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and 'torture.2015.10.06a' into HEAD 2015-10-07 16:06:25 -07:00
sched sched/fair: Fix cfs_rq avg tracking underflow 2016-07-27 09:47:31 -07:00
time tick/nohz: Set the correct expiry when switching to nohz/lowres mode 2016-03-03 15:07:26 -08:00
trace ring-buffer: Prevent overflow of size in ring_buffer_resize() 2016-06-01 12:15:49 -07:00
.gitignore certs: add .gitignore to stop git nagging about x509_certificate_list 2015-10-21 15:18:35 +01:00
acct.c
async.c
audit.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
audit.h audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_fsnotify.c
audit_tree.c audit: audit_tree_match can be boolean 2015-11-04 08:23:51 -05:00
audit_watch.c
auditfilter.c audit: fix comment block whitespace 2015-11-04 08:23:51 -05:00
auditsc.c
backtracetest.c
bounds.c
capability.c
cgroup.c cgroup: make sure a parent css isn't freed before its children 2016-05-04 14:48:49 -07:00
cgroup_freezer.c cgroup: fix handling of multi-destination migration from subtree_control enabling 2015-12-03 10:18:21 -05:00
cgroup_pids.c cgroup_pids: don't account for the root cgroup 2015-12-03 10:18:21 -05:00
compat.c
configs.c
context_tracking.c context_tracking: avoid irq_save/irq_restore on guest entry and exit 2015-11-10 12:06:23 +01:00
cpu.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-11-03 18:03:50 -08:00
cpu_pm.c
cpuset.c cgroup, cpuset: replace cpuset_post_attach_flush() with cgroup_subsys->post_attach callback 2016-05-04 14:48:49 -07:00
crash_dump.c
cred.c
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c wait/ptrace: assume __WALL if the child is traced 2016-06-07 18:14:35 -07:00
extable.c
fork.c sched/core: Reset task's lockless wake-queues on fork() 2016-01-06 11:01:07 +01:00
freezer.c
futex.c futex: Acknowledge a new waiter in counter before plist 2016-05-04 14:48:43 -07:00
futex_compat.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
groups.c
hung_task.c
irq_work.c treewide: Remove old email address 2015-11-23 09:44:58 +01:00
jump_label.c locking/static_key: Fix concurrent static_key_slow_inc() 2016-07-27 09:47:29 -07:00
kallsyms.c
kcmp.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kexec_core.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kexec_file.c kexec: use file name as the output message prefix 2015-11-06 17:50:42 -08:00
kexec_internal.h
kmod.c kmod: don't run async usermode helper as a child of kworker thread 2015-10-23 17:55:10 +09:00
kprobes.c
ksysfs.c
kthread.c
latencytop.c
Makefile
membarrier.c
memremap.c devm_memremap: Fix error value when memremap failed 2016-03-03 15:07:08 -08:00
module-internal.h
module.c modules: fix longstanding /proc/kallsyms vs module insertion race. 2016-03-09 15:34:56 -08:00
module_signing.c KEYS: Merge the type-specific data with the payload data 2015-10-21 15:18:36 +01:00
notifier.c
nsproxy.c
padata.c
panic.c printk: do cond_resched() between lines while outputting to consoles 2016-02-17 12:30:57 -08:00
params.c Nothing exciting, minor tweaks and cleanups. 2015-11-09 15:53:39 -08:00
pid.c pidns: fix NULL dereference in __task_pid_nr_ns() 2015-11-24 12:03:55 -08:00
pid_namespace.c
profile.c
ptrace.c ptrace: use fsuid, fsgid, effective creds for fs access checks 2016-02-25 12:01:16 -08:00
range.c
reboot.c
relay.c
resource.c kernel/resource.c: fix muxed resource handling in __request_region() 2016-03-03 15:07:29 -08:00
seccomp.c seccomp: always propagate NO_NEW_PRIVS on tsync 2016-03-03 15:07:25 -08:00
signal.c kernel/signal.c: unexport sigsuspend() 2015-11-20 16:17:32 -08:00
smp.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
smpboot.c stop_machine: Kill smp_hotplug_thread->pre_unpark, introduce stop_machine_unpark() 2015-10-20 10:23:55 +02:00
smpboot.h
softirq.c
stacktrace.c
stop_machine.c kernel: remove stop_machine() Kconfig dependency 2015-12-12 10:15:34 -08:00
sys.c prctl: take mmap sem for writing to protect against others 2016-02-25 12:01:25 -08:00
sys_ni.c mm: mlock: add new mlock system call 2015-11-05 19:34:48 -08:00
sysctl.c pipe: limit the per-user amount of pages allocated in pipes 2016-06-07 18:14:35 -07:00
sysctl_binary.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
task_work.c
taskstats.c
test_kprobes.c
torture.c torture: Consolidate cond_resched_rcu_qs() into stutter_wait() 2015-10-06 11:25:01 -07:00
tracepoint.c tracepoint: Give priority to probes of tracepoints 2015-10-25 21:33:54 -04:00
tsacct.c
uid16.c
up.c
user-return-notifier.c
user.c
user_namespace.c
utsname.c
utsname_sysctl.c
watchdog.c watchdog: don't run proc_watchdog_update if new value is same as old 2016-04-12 09:08:54 -07:00
workqueue.c workqueue: fix rebind bound workers warning 2016-05-18 17:06:50 -07:00
workqueue_internal.h