Commit graph

18973 commits

Author SHA1 Message Date
Borislav Petkov
b82ad3d394 x86, pageattr: Correct WBINVD spelling in comment
It is WBINVD, for INValiDate and not "wbindv". Use caps for instruction
names, while at it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1394633584-5509-4-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13 15:32:45 -07:00
Borislav Petkov
5314feebab x86, crash: Unify ifdef
Merge two back-to-back CONFIG_X86_32 ifdefs into one.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1394633584-5509-3-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13 15:32:44 -07:00
Borislav Petkov
3e920b532a x86, boot: Correct max ramdisk size name
The name in struct bootparam is ->initrd_addr_max and not ramdisk_max.
Fix that.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1394633584-5509-2-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-13 15:32:42 -07:00
Frederic Weisbecker
073d8224d2 arch: Remove stub cputime.h headers
Many architectures have a stub cputime.h that only include the default
cputime.h

Lets remove the useless headers, we only need to mention that we want
the default headers on the Kbuild files.

Cc: Archs <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-03-13 16:09:30 +01:00
Gabriel L. Somlo
100943c54e kvm: x86: ignore ioapic polarity
Both QEMU and KVM have already accumulated a significant number of
optimizations based on the hard-coded assumption that ioapic polarity
will always use the ActiveHigh convention, where the logical and
physical states of level-triggered irq lines always match (i.e.,
active(asserted) == high == 1, inactive == low == 0). QEMU guests
are expected to follow directions given via ACPI and configure the
ioapic with polarity 0 (ActiveHigh). However, even when misbehaving
guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow),
QEMU will still use the ActiveHigh signaling convention when
interfacing with KVM.

This patch modifies KVM to completely ignore ioapic polarity as set by
the guest OS, enabling misbehaving guests to work alongside those which
comply with the ActiveHigh polarity specified by QEMU's ACPI tables.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
[Move documentation to KVM_IRQ_LINE, add ia64. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-13 11:58:21 +01:00
Linus Torvalds
18f2af2d68 The ARM patch fixes a build breakage with randconfig. The x86 one
fixes Windows guests on AMD processors.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTIJkTAAoJEBvWZb6bTYbyqMcP/1lQTqmo5rux8CBzW6QbVmDg
 VTof8XGqGbZdEZUymeSQZgemRMyXfxDXP3Mx+ayWvPCfpKRp4VUyOdQtxiF50fUY
 PgwamCcmlgCbn+S5nm7xjo6EyCZ5CH/5WT7/9evz6jRaVsNODbreqiRS6ERmIdPo
 qNOptOtOxPbojAhN3kpuKobvq98Go+YMZMswrC/shMKOFPSLZWYazNAyedbRXiGA
 UvcRKRtQI6LzSIzN/3mcDJW4gtX2StAlOgZFzOL2Ft+V8DdR+VTIT/TFRQQEdYWd
 o6E9Xw1Wu11H7uXuXzvkGvwunECEqLChI6UomLw8YdnnMwy4VivuOIiA48yNU5Nc
 QBhKS/HsZUMurv8EGRfuvQyL9NTwlrPnTBfwApn0MoEKWIjmMivb8HnIeUKsm/pq
 LE+/Kyahru5kWzwMG5xbUU9ET+QRGIYz/fIf9I1qpPeY1sXDrcEBzrvFmOAgJ9Yh
 oZpWEv9J6e7aJfVSRXwvYDmzNMpLTH2M/q/X/HlEah5CNOML9IUGS13eExqaI17U
 h+D32tzq/9XuIKMTawwCRrUHonlE9sMwCyFA371n0/tR48+carUtZ7dVG36b5FHK
 b0FTX8YkJcGITe4EuVBXuTbm3mA9wYhVKXo2yDElxfedeEEYR21/hiQoLBEQvjk8
 Df3qbm+bahCzh7k9ydKy
 =gJNC
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "The ARM patch fixes a build breakage with randconfig.  The x86 one
  fixes Windows guests on AMD processors"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: fix cr8 intercept window
  ARM: KVM: fix non-VGIC compilation
2014-03-12 17:27:23 -07:00
Radim Krčmář
596f3142d2 KVM: SVM: fix cr8 intercept window
We always disable cr8 intercept in its handler, but only re-enable it
if handling KVM_REQ_EVENT, so there can be a window where we do not
intercept cr8 writes, which allows an interrupt to disrupt a higher
priority task.

Fix this by disabling intercepts in the same function that re-enables
them when needed. This fixes BSOD in Windows 2008.

Cc: <stable@vger.kernel.org>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-12 18:21:10 +01:00
Thomas Gleixner
ffb12cf002 Merge branch 'irq/for-gpio' into irq/core
Merge the request/release callbacks which are in a separate branch for
consumption by the gpio folks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-12 16:01:07 +01:00
Stephane Eranian
4191c29f05 perf/x86/uncore: Fix compilation warning in snb_uncore_imc_init_box()
This patch fixes a compilation problem (unused variable) with the
new SNB/IVB/HSW uncore IMC code.

[ In -v2 we simplify the fix as suggested by Peter Zjilstra. ]

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140311235329.GA28624@quad
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-12 10:49:13 +01:00
Alexei Starovoitov
fdfaf64e75 x86: bpf_jit: support negative offsets
Commit a998d43423 claimed to introduce negative offset support to x86 jit,
but it couldn't be working, since at the time of the execution
of LD+ABS or LD+IND instructions via call into
bpf_internal_load_pointer_neg_helper() the %edx (3rd argument of this func)
had junk value instead of access size in bytes (1 or 2 or 4).

Store size into %edx instead of %ecx (what original commit intended to do)

Fixes: a998d43423 ("bpf jit: Let the x86 jit handle negative offsets")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Jan Seiffert <kaffeemonster@googlemail.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-03-11 23:25:22 -04:00
Suresh Siddha
731bd6a93a x86, fpu: Check tsk_used_math() in kernel_fpu_end() for eager FPU
For non-eager fpu mode, thread's fpu state is allocated during the first
fpu usage (in the context of device not available exception). This
(math_state_restore()) can be a blocking call and hence we enable
interrupts (which were originally disabled when the exception happened),
allocate memory and disable interrupts etc.

But the eager-fpu mode, call's the same math_state_restore() from
kernel_fpu_end(). The assumption being that tsk_used_math() is always
set for the eager-fpu mode and thus avoid the code path of enabling
interrupts, allocating fpu state using blocking call and disable
interrupts etc.

But the below issue was noticed by Maarten Baert, Nate Eldredge and
few others:

If a user process dumps core on an ecrypt fs while aesni-intel is loaded,
we get a BUG() in __find_get_block() complaining that it was called with
interrupts disabled; then all further accesses to our ecrypt fs hang
and we have to reboot.

The aesni-intel code (encrypting the core file that we are writing) needs
the FPU and quite properly wraps its code in kernel_fpu_{begin,end}(),
the latter of which calls math_state_restore(). So after kernel_fpu_end(),
interrupts may be disabled, which nobody seems to expect, and they stay
that way until we eventually get to __find_get_block() which barfs.

For eager fpu, most the time, tsk_used_math() is true. At few instances
during thread exit, signal return handling etc, tsk_used_math() might
be false.

In kernel_fpu_end(), for eager-fpu, call math_state_restore()
only if tsk_used_math() is set. Otherwise, don't bother. Kernel code
path which cleared tsk_used_math() knows what needs to be done
with the fpu state.

Reported-by: Maarten Baert <maarten-baert@hotmail.com>
Reported-by: Nate Eldredge <nate@thatsmathematics.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <sbsiddha@gmail.com>
Link: http://lkml.kernel.org/r/1391410583.3801.6.camel@europa
Cc: George Spelvin <linux@horizon.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-11 12:32:52 -07:00
Dave Jones
09df7c4c80 x86: Remove CONFIG_X86_OOSTORE
This was an optimization that made memcpy type benchmarks a little
faster on ancient (Circa 1998) IDT Winchip CPUs.  In real-life
workloads, it wasn't even noticable, and I doubt anyone is running
benchmarks on 16 year old silicon any more.

Given this code has likely seen very little use over the last decade,
let's just remove it.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-11 10:16:18 -07:00
Bjorn Helgaas
36fc5500bb sched: Remove unused mc_capable() and smt_capable()
Remove mc_capable() and smt_capable().  Neither is used.

Both were added by 5c45bf279d ("sched: mc/smt power savings sched
policy").  Uses of both were removed by 8e7fbcbc22 ("sched: Remove stale
power aware scheduling remnants and dysfunctional knobs").

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Link: http://lkml.kernel.org/r/20140304210737.16893.54289.stgit@bhelgaas-glaptop.roam.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:05:45 +01:00
Jan Kiszka
ea7bdc65bc x86/apic: Plug racy xAPIC access of CPU hotplug code
apic_icr_write() and its users in smpboot.c were apparently
written under the assumption that this code would only run
during early boot. But nowadays we also execute it when onlining
a CPU later on while the system is fully running. That will make
wakeup_cpu_via_init_nmi and, thus, also native_apic_icr_write
run in plain process context. If we migrate the caller to a
different CPU at the wrong time or interrupt it and write to
ICR/ICR2 to send unrelated IPIs, we can end up sending INIT,
SIPI or NMIs to wrong CPUs.

Fix this by disabling interrupts during the write to the ICR
halves and disable preemption around waiting for ICR
availability and using it.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Tested-By: Igor Mammedov <imammedo@redhat.com>
Link: http://lkml.kernel.org/r/52E6AFFE.3030004@siemens.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:03:31 +01:00
Steven Rostedt
7743a536be i386: Remove unneeded test of 'task' in dump_trace() (again)
Commit 028a690a1e "i386: Remove unneeded test of 'task' in
dump_trace()" correctly removed the unneeded 'task != NULL'
check because it would be set to current if it was NULL.

Commit 2bc5f927d4 "i386: split out dumpstack code from
traps_32.c" moved the code from traps_32.c to its own file
dump_stack.c for preparation of the i386 / x86_64 merge.

Commit 8a541665b9 "dumpstack: x86: various small unification
steps" worked to make i386 and x86_64 dump_stack logic similar.
But this actually reverted the correct change from
028a690a1e.

Commit d0caf29250 "x86/dumpstack: Remove unneeded check in
dump_trace()" removed the unneeded "task != NULL" check for
x86_64 but left that same unneeded check for i386, that was
added because x86_64 had it!

This chain of events ironically had i386 add back the unneeded
task != NULL check because x86_64 did it, and then the fix for
x86_64 was fixed by Dan. And even more ironically, it was Dan's
smatch bot that told me that a change to dump_stack_32 I made
may be wrong if current can be NULL (it can't), as there was a
check for it by assigning task to current, and then checking if
task is NULL.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
Cc: Jesper Juhl <jesper.juhl@gmail.com>
Link: http://lkml.kernel.org/r/20140307105242.79a0befd@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 12:02:31 +01:00
Dave Jones
b7b4839d93 perf/x86: Fix leak in uncore_type_init failure paths
The error path of uncore_type_init() frees up any allocations
that were made along the way, but it relies upon type->pmus
being set, which only happens if the function succeeds. As
type->pmus remains null in this case, the call to
uncore_type_exit will do nothing.

Moving the assignment earlier will allow us to actually free
those allocations should something go awry.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140306172028.GA552@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:59:34 +01:00
Dongsheng Yang
ef11dadb83 perf/x86/uncore: Add __init for uncore_cpumask_init()
Commit:

  411cf180fa perf/x86/uncore: fix initialization of cpumask

introduced the function uncore_cpumask_init(), which is only
called in __init intel_uncore_init(). But it is not marked
with __init, which produces the following warning:

	WARNING: vmlinux.o(.text+0x2464a): Section mismatch in reference from the function uncore_cpumask_init() to the function .init.text:uncore_cpu_setup()
	The function uncore_cpumask_init() references
	the function __init uncore_cpu_setup().
	This is often because uncore_cpumask_init lacks a __init
	annotation or the annotation of uncore_cpu_setup is wrong.

This patch marks uncore_cpumask_init() with __init.

Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Acked-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1394013516-4964-1-git-send-email-yangds.fnst@cn.fujitsu.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:57:56 +01:00
Ingo Molnar
0066f3b93e Merge branch 'perf/urgent' into perf/core
Merge the latest fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:53:50 +01:00
Ingo Molnar
a02ed5e3e0 Merge branch 'sched/urgent' into sched/core
Pick up fixes before queueing up new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:34:27 +01:00
Paolo Bonzini
facb013969 KVM: svm: Allow the guest to run with dirty debug registers
When not running in guest-debug mode (i.e. the guest controls the debug
registers, having to take an exit for each DR access is a waste of time.
If the guest gets into a state where each context switch causes DR to be
saved and restored, this can take away as much as 40% of the execution
time from the guest.

If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
can let it write freely to the debug registers and reload them on the
next exit.  We still need to exit on the first access, so that the
KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
accesses to the debug registers will not cause a vmexit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:04 +01:00
Paolo Bonzini
5315c716b6 KVM: svm: set/clear all DR intercepts in one swoop
Unlike other intercepts, debug register intercepts will be modified
in hot paths if the guest OS is bad or otherwise gets tricked into
doing so.

Avoid calling recalc_intercepts 16 times for debug registers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:04 +01:00
Paolo Bonzini
d16c293e4e KVM: nVMX: Allow nested guests to run with dirty debug registers
When preparing the VMCS02, the CPU-based execution controls is computed
by vmx_exec_control.  Turn off DR access exits there, too, if the
KVM_DEBUGREG_WONT_EXIT bit is set in switch_db_regs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:03 +01:00
Paolo Bonzini
81908bf443 KVM: vmx: Allow the guest to run with dirty debug registers
When not running in guest-debug mode (i.e. the guest controls the debug
registers, having to take an exit for each DR access is a waste of time.
If the guest gets into a state where each context switch causes DR to be
saved and restored, this can take away as much as 40% of the execution
time from the guest.

If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
can let it write freely to the debug registers and reload them on the
next exit.  We still need to exit on the first access, so that the
KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
accesses to the debug registers will not cause a vmexit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:03 +01:00
Paolo Bonzini
c77fb5fe6f KVM: x86: Allow the guest to run with dirty debug registers
When not running in guest-debug mode, the guest controls the debug
registers and having to take an exit for each DR access is a waste
of time.  If the guest gets into a state where each context switch
causes DR to be saved and restored, this can take away as much as 40%
of the execution time from the guest.

After this patch, VMX- and SVM-specific code can set a flag in
switch_db_regs, telling vcpu_enter_guest that on the next exit the debug
registers might be dirty and need to be reloaded (syncing will be taken
care of by a new callback in kvm_x86_ops).  This flag can be set on the
first access to a debug registers, so that multiple accesses to the
debug registers only cause one vmexit.

Note that since the guest will be able to read debug registers and
enable breakpoints in DR7, we need to ensure that they are synchronized
on entry to the guest---including DR6 that was not synced before.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:02 +01:00
Paolo Bonzini
360b948d88 KVM: x86: change vcpu->arch.switch_db_regs to a bit mask
The next patch will add another bit that we can test with the
same "if".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:02 +01:00
Paolo Bonzini
c845f9c646 KVM: vmx: we do rely on loading DR7 on entry
Currently, this works even if the bit is not in "min", because the bit is always
set in MSR_IA32_VMX_ENTRY_CTLS.  Mention it for the sake of documentation, and
to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:01 +01:00
Jan Kiszka
c9a7953f09 KVM: x86: Remove return code from enable_irq/nmi_window
It's no longer possible to enter enable_irq_window in guest mode when
L1 intercepts external interrupts and we are entering L2. This is now
caught in vcpu_enter_guest. So we can remove the check from the VMX
version of enable_irq_window, thus the need to return an error code from
both enable_irq_window and enable_nmi_window.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 08:41:47 +01:00
Jan Kiszka
220c567297 KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interrupt
According to SDM 27.2.3, IDT vectoring information will not be valid on
vmexits caused by external NMIs. So we have to avoid creating such
scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we
have a pending interrupt because that one would be migrated to L1's IDT
vectoring info on nested exit.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 08:41:46 +01:00
Jan Kiszka
f4124500c2 KVM: nVMX: Fully emulate preemption timer
We cannot rely on the hardware-provided preemption timer support because
we are holding L2 in HLT outside non-root mode. Furthermore, emulating
the preemption will resolve tick rate errata on older Intel CPUs.

The emulation is based on hrtimer which is started on L2 entry, stopped
on L2 exit and evaluated via the new check_nested_events hook. As we no
longer rely on hardware features, we can enable both the preemption
timer support and value saving unconditionally.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 08:41:45 +01:00
Jan Kiszka
b6b8a1451f KVM: nVMX: Rework interception of IRQs and NMIs
Move the check for leaving L2 on pending and intercepted IRQs or NMIs
from the *_allowed handler into a dedicated callback. Invoke this
callback at the relevant points before KVM checks if IRQs/NMIs can be
injected. The callback has the task to switch from L2 to L1 if needed
and inject the proper vmexit events.

The rework fixes L2 wakeups from HLT and provides the foundation for
preemption timer emulation.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 08:41:45 +01:00
Mathias Krause
6cce16f99d x86, threadinfo: Redo "x86: Use inline assembler to get sp"
This patch restores the changes of commit dff38e3e93 "x86: Use inline
assembler instead of global register variable to get sp". They got lost
in commit 198d208df4 "x86: Keep thread_info on thread stack in x86_32"
while moving the code to arch/x86/kernel/irq_32.c.

Quoting Andi from commit dff38e3e93:

"""
LTO in gcc 4.6/47. has trouble with global register variables. They were
used to read the stack pointer. Use a simple inline assembler statement
with a mov instead.

This also helps LLVM/clang, which does not support global register
variables.
"""

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/1394178752-18047-1-git-send-email-minipli@googlemail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-10 17:32:01 -07:00
Linus Torvalds
b01d4e6893 x86: fix compile error due to X86_TRAP_NMI use in asm files
It's an enum, not a #define, you can't use it in asm files.

Introduced in commit 5fa10196bd ("x86: Ignore NMIs that come in during
early boot"), and sadly I didn't compile-test things like I should have
before pushing out.

My weak excuse is that the x86 tree generally doesn't introduce stupid
things like this (and the ARM pull afterwards doesn't cause me to do a
compile-test either, since I don't cross-compile).

Cc: Don Zickus <dzickus@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-07 18:58:40 -08:00
H. Peter Anvin
5fa10196bd x86: Ignore NMIs that come in during early boot
Don Zickus reports:

A customer generated an external NMI using their iLO to test kdump
worked.  Unfortunately, the machine hung.  Disabling the nmi_watchdog
made things work.

I speculated the external NMI fired, caused the machine to panic (as
expected) and the perf NMI from the watchdog came in and was latched.
My guess was this somehow caused the hang.

   ----

It appears that the latched NMI stays latched until the early page
table generation on 64 bits, which causes exceptions to happen which
end in IRET, which re-enable NMI.  Therefore, ignore NMIs that come in
during early execution, until we have proper exception handling.

Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1394221143-29713-1-git-send-email-dzickus@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.5+, older with some backport effort
2014-03-07 15:08:14 -08:00
Steven Rostedt
2223f6f6ee x86: Clean up dumpstack_64.c code
The dump_trace() function in dumpstack_64.c is hard to follow.
The test for exception stack is processed differently than the
test for irq stack, and the normal stack is outside completely.

By restructuring this code to have all the stacks determined by
a single function that returns an enum of the following:

 STACK_IS_NORMAL
 STACK_IS_EXCEPTION
 STACK_IS_IRQ
 STACK_IS_UNKNOWN

and has the logic of each within a switch statement.
This should make the code much easier to read and understand.

Link: http://lkml.kernel.org/r/20110806012354.684598995@goodmis.org

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140206144322.086050042@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:55 -08:00
Steven Rostedt
198d208df4 x86: Keep thread_info on thread stack in x86_32
x86_64 uses a per_cpu variable kernel_stack to always point to
the thread stack of current. This is where the thread_info is stored
and is accessed from this location even when the irq or exception stack
is in use. This removes the complexity of having to maintain the
thread info on the stack when interrupts are running and having to
copy the preempt_count and other fields to the interrupt stack.

x86_32 uses the old method of copying the thread_info from the thread
stack to the exception stack just before executing the exception.

Having the two different requires #ifdefs and also the x86_32 way
is a bit of a pain to maintain. By converting x86_32 to the same
method of x86_64, we can remove #ifdefs, clean up the x86_32 code
a little, and remove the overhead of the copy.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:55 -08:00
Steven Rostedt
0788aa6a23 x86: Prepare removal of previous_esp from i386 thread_info structure
The i386 thread_info contains a previous_esp field that is used
to daisy chain the different stacks for dump_stack()
(ie. irq, softirq, thread stacks).

The goal is to eventual make i386 handling of thread_info the same
as x86_64, which means that the thread_info will not be in the stack
but as a per_cpu variable. We will no longer depend on thread_info
being able to daisy chain different stacks as it will only exist
in one location (the thread stack).

By moving previous_esp to the end of thread_info and referencing
it as an offset instead of using a thread_info field, this becomes
a stepping stone to moving the thread_info.

The offset to get to the previous stack is rather ugly in this
patch, but this is only temporary and the prev_esp will be changed
in the next commit. This commit is more for sanity checks of the
change.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.891757693@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.608754481@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:54 -08:00
Steven Rostedt (Red Hat)
b807902a88 x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
According to a git log -p, GET_THREAD_INFO_WITH_ESP() has only been defined
and never been used. Get rid of it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140206144321.409045251@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:54 -08:00
Steven Rostedt
2432e1364b x86: Nuke the supervisor_stack field in i386 thread_info
Nothing references the supervisor_stack in the thread_info field,
and it does not exist in x86_64. To make the two more the same,
it is being removed.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.546183789@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.203619611@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:54 -08:00
Peter Zijlstra
d4078e2322 x86, trace: Further robustify CR2 handling vs tracing
Building on commit 0ac09f9f8c ("x86, trace: Fix CR2 corruption when
tracing page faults") this patch addresses another few issues:

 - Now that read_cr2() is lifted into trace_do_page_fault(), we should
   pass the address to trace_page_fault_entries() to avoid it
   re-reading a potentially changed cr2.

 - Put both trace_do_page_fault() and trace_page_fault_entries() under
   CONFIG_TRACING.

 - Mark both fault entry functions {,trace_}do_page_fault() as notrace
   to avoid getting __mcount or other function entry trace callbacks
   before we've observed CR2.

 - Mark __do_page_fault() as noinline to guarantee the function tracer
   does get to see the fault.

Cc: <jolsa@redhat.com>
Cc: <vincent.weaver@maine.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140306145300.GO9987@twins.programming.kicks-ass.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 10:58:18 -08:00
Heiko Carstens
378a10f3ae fs/compat: optional preadv64/pwrite64 compat system calls
The preadv64/pwrite64 have been implemented for the x32 ABI, in order
to allow passing 64 bit arguments from user space without splitting
them into two 32 bit parameters, like it would be necessary for usual
compat tasks.
Howevert these two system calls are only being used for the x32 ABI,
so add __ARCH_WANT_COMPAT defines for these two compat syscalls and
make these two only visible for x86.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-06 15:35:09 +01:00
Thomas Gleixner
7ff42473eb x86: hardirq: Make irq_hv_callback_count available for CONFIG_HYPERV=m as well
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-06 12:08:37 +01:00
H. Peter Anvin
fb3bd7b19b x86, reboot: Only use CF9_COND automatically, not CF9
Only CF9_COND is appropriate for inclusion in the default chain, not
CF9; the latter will poke that register unconditionally, whereas
CF9_COND will at least look for PCI configuration method #1 or #2
first (a weak check, but better than nothing.)

CF9 should be used for explicit system configuration (command line or
DMI) only.

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/53130A46.1010801@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-05 15:41:15 -08:00
Li, Aubrey
a4f1987e4c x86, reboot: Add EFI and CF9 reboot methods into the default list
Reboot is the last service linux OS provides to the end user. We are
supposed to make this function more robust than today. This patch adds
all of the known reboot methods into the default attempt list. The
machines requiring reboot=efi or reboot=p or reboot=bios get a chance
to reboot automatically now.

If there is a new reboot method emerged, we are supposed to add it to
the default list as well, instead of adding the endless dmidecode entry.

If one method required is in the default list in this patch but the
machine reboot still hangs, that means some methods ahead of the
required method cause the system hangs, then reboot the machine by
passing reboot= arguments and submit the reboot dmidecode table quirk.

We are supposed to remove the reboot dmidecode table from the kernel,
but to be safe, we keep it. This patch prevents us from adding more.
If you happened to have a machine listed in the reboot dmidecode
table and this patch makes reboot work on your machine, please submit
a patch to remove the quirk.

The default reboot order with this patch is now:

    ACPI > KBD > ACPI > KBD > EFI > CF9_COND > BIOS

Because BIOS and TRIPLE are mutually exclusive (either will either
work or hang the machine) that method is not included.

[ hpa: as with any changes to the reboot order, this patch will have
  to be monitored carefully for regressions. ]

Signed-off-by: Aubrey Li <aubrey.li@intel.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/53130A46.1010801@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-05 15:27:07 -08:00
Matt Fleming
617b3c37da Merge branch 'mixed-mode' into efi-for-mingo 2014-03-05 18:18:50 +00:00
Matt Fleming
994448f1af Merge remote-tracking branch 'tip/x86/efi-mixed' into efi-for-mingo
Conflicts:
	arch/x86/kernel/setup.c
	arch/x86/platform/efi/efi.c
	arch/x86/platform/efi/efi_64.c
2014-03-05 18:15:37 +00:00
Matt Fleming
4fd69331ad Merge remote-tracking branch 'tip/x86/urgent' into efi-for-mingo
Conflicts:
	arch/x86/include/asm/efi.h
2014-03-05 17:31:41 +00:00
Thomas Gleixner
76d388cd72 x86: hyperv: Fixup the (brain) damage caused by the irq cleanup
Compiling last minute changes without setting the proper config
options is not really clever.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-05 13:42:14 +01:00
Matt Fleming
3db4cafdfd x86/boot: Fix non-EFI build
The kbuild test robot reported the following errors, introduced with
commit 54b52d8726 ("x86/efi: Build our own EFI services pointer
table"),

 arch/x86/boot/compressed/head_32.o: In function `efi32_config':
>> (.data+0x58): undefined reference to `efi_call_phys'

 arch/x86/boot/compressed/head_64.o: In function `efi64_config':
>> (.data+0x90): undefined reference to `efi_call6'

Wrap the efi*_config structures in #ifdef CONFIG_EFI_STUB so that we
don't make references to EFI functions if they're not compiled in.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-05 10:19:07 +00:00
Matt Fleming
b663a68583 x86, tools: Fix up compiler warnings
The kbuild test robot reported the following errors that were introduced
with commit 993c30a04e ("x86, tools: Consolidate #ifdef code"),

  arch/x86/boot/tools/build.c: In function 'update_pecoff_setup_and_reloc':
>> arch/x86/boot/tools/build.c:252:1: error: parameter name omitted
    static inline void update_pecoff_setup_and_reloc(unsigned int) {}
    ^
  arch/x86/boot/tools/build.c: In function 'update_pecoff_text':
>> arch/x86/boot/tools/build.c:253:1: error: parameter name omitted
    static inline void update_pecoff_text(unsigned int, unsigned int) {}
    ^
>> arch/x86/boot/tools/build.c:253:1: error: parameter name omitted

   arch/x86/boot/tools/build.c: In function 'main':
>> arch/x86/boot/tools/build.c:372:2: warning: implicit declaration of function 'efi_stub_entry_update' [-Wimplicit-function-declaration]
    efi_stub_entry_update();
    ^
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-05 10:12:39 +00:00
Jiri Olsa
0ac09f9f8c x86, trace: Fix CR2 corruption when tracing page faults
The trace_do_page_fault function trigger tracepoint
and then handles the actual page fault.

This could lead to error if the tracepoint caused page
fault. The original cr2 value gets lost and the original
page fault handler kills current process with SIGSEGV.

This happens if you record page faults with callchain
data, the user part of it will cause tracepoint handler
to page fault:

  # perf record -g -e exceptions:page_fault_user ls

Fixing this by saving the original cr2 value
and using it after tracepoint handler is done.

v2: Moving the cr2 read before exception_enter, because
    it could trigger tracepoint as well.

Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1402211701380.6395@vincent-weaver-1.um.maine.edu
Link: http://lkml.kernel.org/r/20140228160526.GD1133@krava.brq.redhat.com
2014-03-04 16:00:14 -08:00