android_kernel_oneplus_msm8998/kernel/time
Thomas Gleixner e01b04be3e timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion
commit 9c1645727b8fa90d07256fdfcc45bf831242a3ab upstream.

The clocksource delta to nanoseconds conversion is using signed math, but
the delta is unsigned. This makes the conversion space smaller than
necessary and in case of a multiplication overflow the conversion can
become negative. The conversion is done with scaled math:

    s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;

Shifting a signed integer right obvioulsy preserves the sign, which has
interesting consequences:

 - Time jumps backwards

 - __iter_div_u64_rem() which is used in one of the calling code pathes
   will take forever to piecewise calculate the seconds/nanoseconds part.

This has been reported by several people with different scenarios:

David observed that when stopping a VM with a debugger:

 "It was essentially the stopped by debugger case.  I forget exactly why,
  but the guest was being explicitly stopped from outside, it wasn't just
  scheduling lag.  I think it was something in the vicinity of 10 minutes
  stopped."

 When lifting the stop the machine went dead.

The stopped by debugger case is not really interesting, but nevertheless it
would be a good thing not to die completely.

But this was also observed on a live system by Liav:

 "When the OS is too overloaded, delta will get a high enough value for the
  msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
  after the shift the nsec variable will gain a value similar to
  0xffffffffff000000."

Unfortunately this has been reintroduced recently with commit 6bd58f09e1d8
("time: Add cycles to nanoseconds translation"). It had been fixed a year
ago already in commit 35a4933a8959 ("time: Avoid signed overflow in
timekeeping_get_ns()").

Though it's not surprising that the issue has been reintroduced because the
function itself and the whole call chain uses s64 for the result and the
propagation of it. The change in this recent commit is subtle:

   s64 nsec;

-  nsec = (d * m + n) >> s:
+  nsec = d * m + n;
+  nsec >>= s;

d being type of cycle_t adds another level of obfuscation.

This wouldn't have happened if the previous change to unsigned computation
would have made the 'nsec' variable u64 right away and a follow up patch
had cleaned up the whole call chain.

There have been patches submitted which basically did a revert of the above
patch leaving everything else unchanged as signed. Back to square one. This
spawned a admittedly pointless discussion about potential users which rely
on the unsigned behaviour until someone pointed out that it had been fixed
before. The changelogs of said patches added further confusion as they made
finally false claims about the consequences for eventual users which expect
signed results.

Despite delta being cycle_t, aka. u64, it's very well possible to hand in
a signed negative value and the signed computation will happily return the
correct result. But nobody actually sat down and analyzed the code which
was added as user after the propably unintended signed conversion.

Though in sensitive code like this it's better to analyze it proper and
make sure that nothing relies on this than hunting the subtle wreckage half
a year later. After analyzing all call chains it stands that no caller can
hand in a negative value (which actually would work due to the s64 cast)
and rely on the signed math to do the right thing.

Change the conversion function to unsigned math. The conversion of all call
chains is done in a follow up patch.

This solves the starvation issue, which was caused by the negative result,
but it does not solve the underlying problem. It merily procrastinates
it. When the timekeeper update is deferred long enough that the unsigned
multiplication overflows, then time going backwards is observable again.

It does neither solve the issue of clocksources with a small counter width
which will wrap around possibly several times and cause random time stamps
to be generated. But those are usually not found on systems used for
virtualization, so this is likely a non issue.

I took the liberty to claim authorship for this simply because
analyzing all callsites and writing the changelog took substantially
more time than just making the simple s/s64/u64/ change and ignore the
rest.

Fixes: 6bd58f09e1d8 ("time: Add cycles to nanoseconds translation")
Reported-by: David Gibson <david@gibson.dropbear.id.au>
Reported-by: Liav Rehana <liavr@mellanox.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Parit Bhargava <prarit@redhat.com>
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-01-09 08:07:43 +01:00
..
alarmtimer.c alarmtimer: Get rid of unused return value 2015-04-22 17:06:52 +02:00
clockevents.c clockevents: Remove unused set_mode() callback 2015-09-14 11:00:55 +02:00
clocksource.c clocksource: Allow unregistering the watchdog 2016-09-15 08:27:47 +02:00
hrtimer.c hrtimer: Catch illegal clockids 2016-09-15 08:27:44 +02:00
itimer.c itimers: Handle relative timers with CONFIG_TIME_LOW_RES proper 2016-02-25 12:01:25 -08:00
jiffies.c tick: Move clocksource related stuff to timekeeping.h 2015-04-01 14:22:58 +02:00
Kconfig rcu: Drop RCU_USER_QS in favor of NO_HZ_FULL 2015-07-06 13:52:18 -07:00
Makefile time: Remove development rules from Kbuild/Makefile 2015-07-01 09:57:35 +02:00
ntp.c ntp: Fix ADJ_SETOFFSET being used w/ ADJ_NANO 2016-09-15 08:27:47 +02:00
ntp_internal.h ntp/pps: use timespec64 for hardpps() 2015-10-01 09:57:59 -07:00
posix-clock.c posix-clock: Fix return code on the poll method's error path 2016-03-03 15:07:15 -08:00
posix-cpu-timers.c posix_cpu_timer: Exit early when process has been reaped 2016-08-10 11:49:29 +02:00
posix-timers.c posix-timers: Handle relative timers with CONFIG_TIME_LOW_RES proper 2016-02-25 12:01:25 -08:00
sched_clock.c timers, sched/clock: Clean up the code a bit 2015-03-27 08:34:01 +01:00
test_udelay.c time: Rename udelay_test.c to test_udelay.c 2014-11-21 11:59:55 -08:00
tick-broadcast-hrtimer.c kernel: broadcast-hrtimer: Migrate to new 'set-state' interface 2015-08-10 11:41:08 +02:00
tick-broadcast.c tick: Move the export of tick_broadcast_oneshot_control to the proper place 2015-07-14 12:01:04 +02:00
tick-common.c clockevents: Remove unused set_mode() callback 2015-09-14 11:00:55 +02:00
tick-internal.h timer: Minimize nohz off overhead 2015-06-19 15:18:28 +02:00
tick-oneshot.c clockevents: Provide functions to set and get the state 2015-06-02 14:40:47 +02:00
tick-sched.c tick/nohz: Set the correct expiry when switching to nohz/lowres mode 2016-03-03 15:07:26 -08:00
tick-sched.h tick/broadcast: Make idle check independent from mode and config 2015-07-07 18:46:47 +02:00
time.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-09-01 14:04:50 -07:00
timeconst.bc timeconst: Update path in comment 2015-10-26 10:06:06 +09:00
timeconv.c
timecounter.c timecounter: keep track of accumulated fractional nanoseconds 2014-12-30 18:29:27 -05:00
timekeeping.c timekeeping_Force_unsigned_clocksource_to_nanoseconds_conversion 2017-01-09 08:07:43 +01:00
timekeeping.h hrtimer: Make offset update smarter 2015-04-22 17:06:49 +02:00
timekeeping_debug.c timekeeping: Cap array access in timekeeping_debug 2016-09-15 08:27:52 +02:00
timekeeping_internal.h clocksource: Move cycle_last validation to core code 2014-07-23 15:01:51 -07:00
timer.c timers: Use proper base migration in add_timer_on() 2015-11-04 20:23:19 +01:00
timer_list.c hrtimer: Handle remaining time proper for TIME_LOW_RES 2016-02-17 12:30:57 -08:00
timer_stats.c timer: Stats: Simplify the flags handling 2015-06-19 15:18:27 +02:00