ARM: smp: Fix hrtimer_interrupt race with sys_reboot

There is a high chance of hrtimer_interrupt() firing on one core
while another core is executing sys_reboot. In that situation
ipi_cpu_stop() marks a CPU as 'offline', but hrtimer_wakeup() can
still schedule a task on that CPU, which trips the BUG_ON in
smp_send_reschedule(). The CPU is not really offline;
ipi_cpu_stop() merely marks it as such.

CPU0                      CPU1				CPU2
sys_reboot()
 kernel_restart()
  machine_restart()
   machine_shutdown()
    smp_send_stop()					...
    ...                   ipi_cpu_stop()		hrtimer_interrupt()
                           set_cpu_online(1, false)	 __run_hrtimer()
                            local_irq_disable()		  hrtimer_wakeup()
                             while(1)			   try_to_wake_up()
							    ttwu_do_wakeup()
							     check_preempt_curr()
							      smp_send_reschedule()
							       BUG_ON(cpu_is_offline(1));

This is easily reproduced during continuous device reboot testing.
Since the CPU is not really offline and hasn't gone through the
proper steps to be marked as such, mark the CPU as inactive instead
and have smp_send_stop() wait on num_active_cpus() accordingly.
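
The difference between the old and new marking can be illustrated with a small userspace model. This is only a sketch: the two bitmasks stand in for the kernel's online and active CPU masks, and every `model_` name is hypothetical rather than a kernel API. It assumes, as the trace above shows, that smp_send_reschedule() only checks whether the target CPU is offline.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; all model_ names are hypothetical. */
enum { MODEL_NR_CPUS = 3 };

struct model_masks {
	unsigned int online;	/* stands in for the online CPU mask */
	unsigned int active;	/* stands in for the active CPU mask */
};

/* Bring every modelled CPU up: online and active. */
static void model_boot_all(struct model_masks *m)
{
	m->online = (1u << MODEL_NR_CPUS) - 1;
	m->active = m->online;
}

/* Behaviour before the patch: the stopping CPU clears its online bit. */
static void model_stop_old(struct model_masks *m, unsigned int cpu)
{
	m->online &= ~(1u << cpu);
}

/* Behaviour after the patch: online stays set, only active is cleared. */
static void model_stop_new(struct model_masks *m, unsigned int cpu)
{
	m->active &= ~(1u << cpu);
}

/*
 * Mirrors the condition behind BUG_ON(cpu_is_offline(cpu)):
 * returns true when a reschedule request to this CPU would hit the BUG.
 */
static bool model_resched_would_bug(const struct model_masks *m,
				    unsigned int cpu)
{
	assert(cpu < MODEL_NR_CPUS);
	return !(m->online & (1u << cpu));
}
```

In this model, a late wakeup aimed at CPU1 hits the check after model_stop_old() but not after model_stop_new(), because the online bit is left untouched.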

Change-Id: Ia1daea407220578d4212ef6c65c4be837ca370fd
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Stephen Boyd 2016-03-02 11:52:00 +05:30 committed by David Keitel
parent 1b7778354b
commit 78c34e6ea3

@@ -575,7 +575,7 @@ static void ipi_cpu_stop(unsigned int cpu)
 		raw_spin_unlock(&stop_lock);
 	}
-	set_cpu_online(cpu, false);
+	set_cpu_active(cpu, false);
 	local_fiq_disable();
 	local_irq_disable();
@@ -701,10 +701,10 @@ void smp_send_stop(void)
 	/* Wait up to one second for other CPUs to stop */
 	timeout = USEC_PER_SEC;
-	while (num_online_cpus() > 1 && timeout--)
+	while (num_active_cpus() > 1 && timeout--)
 		udelay(1);
-	if (num_online_cpus() > 1)
+	if (num_active_cpus() > 1)
 		pr_warn("SMP: failed to stop secondary CPUs\n");
 }
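
The bounded wait in smp_send_stop() can be sketched in userspace as follows. This is an illustrative model, not kernel code: `model_tick()` is a hypothetical stand-in for udelay(1) that also simulates the secondary CPUs clearing their active bits after a given number of polls, and all `model_` names are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; all model_ names are hypothetical. */
struct model_state {
	unsigned int active;		/* stands in for the active CPU mask */
	unsigned int polls_until_stop;	/* 0 means secondaries never stop */
};

/* Count set bits, playing the role of num_active_cpus(). */
static unsigned int model_num_active(unsigned int mask)
{
	unsigned int n = 0;

	for (; mask; mask &= mask - 1)
		n++;
	return n;
}

/* Stand-in for udelay(1): after enough polls only CPU0 stays active. */
static void model_tick(struct model_state *s)
{
	if (s->polls_until_stop && --s->polls_until_stop == 0)
		s->active &= 1u;
}

/*
 * Same loop shape as the patched smp_send_stop(): poll while more than
 * one CPU is active, giving up once the timeout budget is used up.
 * Returns true if the secondaries stopped in time.
 */
static bool model_wait_for_stop(struct model_state *s, unsigned long timeout)
{
	assert(s);
	while (model_num_active(s->active) > 1 && timeout--)
		model_tick(s);
	return model_num_active(s->active) <= 1;
}
```

A false return corresponds to the path where the kernel prints "SMP: failed to stop secondary CPUs".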