Commit graph

19067 commits

Author SHA1 Message Date
Ingo Molnar
0066f3b93e Merge branch 'perf/urgent' into perf/core
Merge the latest fixes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:53:50 +01:00
Ingo Molnar
a02ed5e3e0 Merge branch 'sched/urgent' into sched/core
Pick up fixes before queueing up new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-03-11 11:34:27 +01:00
Paolo Bonzini
facb013969 KVM: svm: Allow the guest to run with dirty debug registers
When not running in guest-debug mode (i.e. the guest controls the debug
registers, having to take an exit for each DR access is a waste of time.
If the guest gets into a state where each context switch causes DR to be
saved and restored, this can take away as much as 40% of the execution
time from the guest.

If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
can let it write freely to the debug registers and reload them on the
next exit.  We still need to exit on the first access, so that the
KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
accesses to the debug registers will not cause a vmexit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:04 +01:00
Paolo Bonzini
5315c716b6 KVM: svm: set/clear all DR intercepts in one swoop
Unlike other intercepts, debug register intercepts will be modified
in hot paths if the guest OS is bad or otherwise gets tricked into
doing so.

Avoid calling recalc_intercepts 16 times for debug registers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:04 +01:00
Paolo Bonzini
d16c293e4e KVM: nVMX: Allow nested guests to run with dirty debug registers
When preparing the VMCS02, the CPU-based execution controls is computed
by vmx_exec_control.  Turn off DR access exits there, too, if the
KVM_DEBUGREG_WONT_EXIT bit is set in switch_db_regs.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:03 +01:00
Paolo Bonzini
81908bf443 KVM: vmx: Allow the guest to run with dirty debug registers
When not running in guest-debug mode (i.e. the guest controls the debug
registers, having to take an exit for each DR access is a waste of time.
If the guest gets into a state where each context switch causes DR to be
saved and restored, this can take away as much as 40% of the execution
time from the guest.

If the guest is running with vcpu->arch.db == vcpu->arch.eff_db, we
can let it write freely to the debug registers and reload them on the
next exit.  We still need to exit on the first access, so that the
KVM_DEBUGREG_WONT_EXIT flag is set in switch_db_regs; after that, further
accesses to the debug registers will not cause a vmexit.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:03 +01:00
Paolo Bonzini
c77fb5fe6f KVM: x86: Allow the guest to run with dirty debug registers
When not running in guest-debug mode, the guest controls the debug
registers and having to take an exit for each DR access is a waste
of time.  If the guest gets into a state where each context switch
causes DR to be saved and restored, this can take away as much as 40%
of the execution time from the guest.

After this patch, VMX- and SVM-specific code can set a flag in
switch_db_regs, telling vcpu_enter_guest that on the next exit the debug
registers might be dirty and need to be reloaded (syncing will be taken
care of by a new callback in kvm_x86_ops).  This flag can be set on the
first access to a debug registers, so that multiple accesses to the
debug registers only cause one vmexit.

Note that since the guest will be able to read debug registers and
enable breakpoints in DR7, we need to ensure that they are synchronized
on entry to the guest---including DR6 that was not synced before.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:02 +01:00
Paolo Bonzini
360b948d88 KVM: x86: change vcpu->arch.switch_db_regs to a bit mask
The next patch will add another bit that we can test with the
same "if".

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:02 +01:00
Paolo Bonzini
c845f9c646 KVM: vmx: we do rely on loading DR7 on entry
Currently, this works even if the bit is not in "min", because the bit is always
set in MSR_IA32_VMX_ENTRY_CTLS.  Mention it for the sake of documentation, and
to avoid surprises if we later switch to MSR_IA32_VMX_TRUE_ENTRY_CTLS.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 10:46:01 +01:00
Jan Kiszka
c9a7953f09 KVM: x86: Remove return code from enable_irq/nmi_window
It's no longer possible to enter enable_irq_window in guest mode when
L1 intercepts external interrupts and we are entering L2. This is now
caught in vcpu_enter_guest. So we can remove the check from the VMX
version of enable_irq_window, thus the need to return an error code from
both enable_irq_window and enable_nmi_window.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 08:41:47 +01:00
Jan Kiszka
220c567297 KVM: nVMX: Do not inject NMI vmexits when L2 has a pending interrupt
According to SDM 27.2.3, IDT vectoring information will not be valid on
vmexits caused by external NMIs. So we have to avoid creating such
scenarios by delaying EXIT_REASON_EXCEPTION_NMI injection as long as we
have a pending interrupt because that one would be migrated to L1's IDT
vectoring info on nested exit.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 08:41:46 +01:00
Jan Kiszka
f4124500c2 KVM: nVMX: Fully emulate preemption timer
We cannot rely on the hardware-provided preemption timer support because
we are holding L2 in HLT outside non-root mode. Furthermore, emulating
the preemption will resolve tick rate errata on older Intel CPUs.

The emulation is based on hrtimer which is started on L2 entry, stopped
on L2 exit and evaluated via the new check_nested_events hook. As we no
longer rely on hardware features, we can enable both the preemption
timer support and value saving unconditionally.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 08:41:45 +01:00
Jan Kiszka
b6b8a1451f KVM: nVMX: Rework interception of IRQs and NMIs
Move the check for leaving L2 on pending and intercepted IRQs or NMIs
from the *_allowed handler into a dedicated callback. Invoke this
callback at the relevant points before KVM checks if IRQs/NMIs can be
injected. The callback has the task to switch from L2 to L1 if needed
and inject the proper vmexit events.

The rework fixes L2 wakeups from HLT and provides the foundation for
preemption timer emulation.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-11 08:41:45 +01:00
Mathias Krause
6cce16f99d x86, threadinfo: Redo "x86: Use inline assembler to get sp"
This patch restores the changes of commit dff38e3e93 "x86: Use inline
assembler instead of global register variable to get sp". They got lost
in commit 198d208df4 "x86: Keep thread_info on thread stack in x86_32"
while moving the code to arch/x86/kernel/irq_32.c.

Quoting Andi from commit dff38e3e93:

"""
LTO in gcc 4.6/47. has trouble with global register variables. They were
used to read the stack pointer. Use a simple inline assembler statement
with a mov instead.

This also helps LLVM/clang, which does not support global register
variables.
"""

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Link: http://lkml.kernel.org/r/1394178752-18047-1-git-send-email-minipli@googlemail.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-10 17:32:01 -07:00
Linus Torvalds
b01d4e6893 x86: fix compile error due to X86_TRAP_NMI use in asm files
It's an enum, not a #define, you can't use it in asm files.

Introduced in commit 5fa10196bd ("x86: Ignore NMIs that come in during
early boot"), and sadly I didn't compile-test things like I should have
before pushing out.

My weak excuse is that the x86 tree generally doesn't introduce stupid
things like this (and the ARM pull afterwards doesn't cause me to do a
compile-test either, since I don't cross-compile).

Cc: Don Zickus <dzickus@redhat.com>
Cc: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-03-07 18:58:40 -08:00
H. Peter Anvin
5fa10196bd x86: Ignore NMIs that come in during early boot
Don Zickus reports:

A customer generated an external NMI using their iLO to test kdump
worked.  Unfortunately, the machine hung.  Disabling the nmi_watchdog
made things work.

I speculated the external NMI fired, caused the machine to panic (as
expected) and the perf NMI from the watchdog came in and was latched.
My guess was this somehow caused the hang.

   ----

It appears that the latched NMI stays latched until the early page
table generation on 64 bits, which causes exceptions to happen which
end in IRET, which re-enable NMI.  Therefore, ignore NMIs that come in
during early execution, until we have proper exception handling.

Reported-and-tested-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1394221143-29713-1-git-send-email-dzickus@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.5+, older with some backport effort
2014-03-07 15:08:14 -08:00
Petr Mladek
7f11f5ecf4 ftrace/x86: BUG when ftrace recovery fails
Ftrace modifies function calls using Int3 breakpoints on x86.
The breakpoints are handled only when the patching is in progress.
If something goes wrong, there is a recovery code that removes
the breakpoints. If this fails, the system might get silently
rebooted when a remaining break is not handled or an invalid
instruction is proceed.

We should BUG() when the breakpoint could not be removed. Otherwise,
the system silently crashes when the function finishes the Int3
handler is disabled.

Note that we need to modify remove_breakpoint() to return non-zero
value only when there is an error. The return value was ignored before,
so it does not cause any troubles.

Link: http://lkml.kernel.org/r/1393258342-29978-4-git-send-email-pmladek@suse.cz

Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:16 -05:00
Jiri Slaby
3a36cb11ca ftrace: Do not pass data to ftrace_dyn_arch_init
As the data parameter is not really used by any ftrace_dyn_arch_init,
remove that from ftrace_dyn_arch_init. This also removes the addr
local variable from ftrace_init which is now unused.

Note the documentation was imprecise as it did not suggest to set
(*data) to 0.

Link: http://lkml.kernel.org/r/1393268401-24379-4-git-send-email-jslaby@suse.cz

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:14 -05:00
Jiri Slaby
af64a7cb09 ftrace: Pass retval through return in ftrace_dyn_arch_init()
No architecture uses the "data" parameter in ftrace_dyn_arch_init() in any
way, it just sets the value to 0. And this is used as a return value
in the caller -- ftrace_init, which just checks the retval against
zero.

Note there is also "return 0" in every ftrace_dyn_arch_init.  So it is
enough to check the retval and remove all the indirect sets of data on
all archs.

Link: http://lkml.kernel.org/r/1393268401-24379-3-git-send-email-jslaby@suse.cz

Cc: linux-arch@vger.kernel.org
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:06:13 -05:00
Steven Rostedt (Red Hat)
92550405c4 ftrace/x86: Have ftrace_write() return -EPERM and clean up callers
Having ftrace_write() return -EPERM on failure, as that's what the callers
return, then we can clean up the code a bit. That is, instead of:

  if (ftrace_write(...))
     return -EPERM;
  return 0;

or

  if (ftrace_write(...)) {
     ret = -EPERM;
     goto_out;
  }

We can instead have:

  return ftrace_write(...);

or

  ret = ftrace_write(...);
  if (ret)
    goto out;

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-07 10:05:50 -05:00
Steven Rostedt
2223f6f6ee x86: Clean up dumpstack_64.c code
The dump_trace() function in dumpstack_64.c is hard to follow.
The test for exception stack is processed differently than the
test for irq stack, and the normal stack is outside completely.

By restructuring this code to have all the stacks determined by
a single function that returns an enum of the following:

 STACK_IS_NORMAL
 STACK_IS_EXCEPTION
 STACK_IS_IRQ
 STACK_IS_UNKNOWN

and has the logic of each within a switch statement.
This should make the code much easier to read and understand.

Link: http://lkml.kernel.org/r/20110806012354.684598995@goodmis.org

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140206144322.086050042@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:55 -08:00
Steven Rostedt
198d208df4 x86: Keep thread_info on thread stack in x86_32
x86_64 uses a per_cpu variable kernel_stack to always point to
the thread stack of current. This is where the thread_info is stored
and is accessed from this location even when the irq or exception stack
is in use. This removes the complexity of having to maintain the
thread info on the stack when interrupts are running and having to
copy the preempt_count and other fields to the interrupt stack.

x86_32 uses the old method of copying the thread_info from the thread
stack to the exception stack just before executing the exception.

Having the two different requires #ifdefs and also the x86_32 way
is a bit of a pain to maintain. By converting x86_32 to the same
method of x86_64, we can remove #ifdefs, clean up the x86_32 code
a little, and remove the overhead of the copy.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012354.263834829@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.852942014@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:55 -08:00
Steven Rostedt
0788aa6a23 x86: Prepare removal of previous_esp from i386 thread_info structure
The i386 thread_info contains a previous_esp field that is used
to daisy chain the different stacks for dump_stack()
(ie. irq, softirq, thread stacks).

The goal is to eventual make i386 handling of thread_info the same
as x86_64, which means that the thread_info will not be in the stack
but as a per_cpu variable. We will no longer depend on thread_info
being able to daisy chain different stacks as it will only exist
in one location (the thread stack).

By moving previous_esp to the end of thread_info and referencing
it as an offset instead of using a thread_info field, this becomes
a stepping stone to moving the thread_info.

The offset to get to the previous stack is rather ugly in this
patch, but this is only temporary and the prev_esp will be changed
in the next commit. This commit is more for sanity checks of the
change.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Robert Richter <rric@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.891757693@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.608754481@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:54 -08:00
Steven Rostedt (Red Hat)
b807902a88 x86: Nuke GET_THREAD_INFO_WITH_ESP() macro for i386
According to a git log -p, GET_THREAD_INFO_WITH_ESP() has only been defined
and never been used. Get rid of it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20140206144321.409045251@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:54 -08:00
Steven Rostedt
2432e1364b x86: Nuke the supervisor_stack field in i386 thread_info
Nothing references the supervisor_stack in the thread_info field,
and it does not exist in x86_64. To make the two more the same,
it is being removed.

Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20110806012353.546183789@goodmis.org
Link: http://lkml.kernel.org/r/20140206144321.203619611@goodmis.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 16:56:54 -08:00
Peter Zijlstra
d4078e2322 x86, trace: Further robustify CR2 handling vs tracing
Building on commit 0ac09f9f8c ("x86, trace: Fix CR2 corruption when
tracing page faults") this patch addresses another few issues:

 - Now that read_cr2() is lifted into trace_do_page_fault(), we should
   pass the address to trace_page_fault_entries() to avoid it
   re-reading a potentially changed cr2.

 - Put both trace_do_page_fault() and trace_page_fault_entries() under
   CONFIG_TRACING.

 - Mark both fault entry functions {,trace_}do_page_fault() as notrace
   to avoid getting __mcount or other function entry trace callbacks
   before we've observed CR2.

 - Mark __do_page_fault() as noinline to guarantee the function tracer
   does get to see the fault.

Cc: <jolsa@redhat.com>
Cc: <vincent.weaver@maine.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140306145300.GO9987@twins.programming.kicks-ass.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-06 10:58:18 -08:00
Heiko Carstens
378a10f3ae fs/compat: optional preadv64/pwrite64 compat system calls
The preadv64/pwrite64 have been implemented for the x32 ABI, in order
to allow passing 64 bit arguments from user space without splitting
them into two 32 bit parameters, like it would be necessary for usual
compat tasks.
Howevert these two system calls are only being used for the x32 ABI,
so add __ARCH_WANT_COMPAT defines for these two compat syscalls and
make these two only visible for x86.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-06 15:35:09 +01:00
Thomas Gleixner
7ff42473eb x86: hardirq: Make irq_hv_callback_count available for CONFIG_HYPERV=m as well
Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-06 12:08:37 +01:00
H. Peter Anvin
fb3bd7b19b x86, reboot: Only use CF9_COND automatically, not CF9
Only CF9_COND is appropriate for inclusion in the default chain, not
CF9; the latter will poke that register unconditionally, whereas
CF9_COND will at least look for PCI configuration method #1 or #2
first (a weak check, but better than nothing.)

CF9 should be used for explicit system configuration (command line or
DMI) only.

Cc: Aubrey Li <aubrey.li@intel.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/53130A46.1010801@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-05 15:41:15 -08:00
Li, Aubrey
a4f1987e4c x86, reboot: Add EFI and CF9 reboot methods into the default list
Reboot is the last service linux OS provides to the end user. We are
supposed to make this function more robust than today. This patch adds
all of the known reboot methods into the default attempt list. The
machines requiring reboot=efi or reboot=p or reboot=bios get a chance
to reboot automatically now.

If there is a new reboot method emerged, we are supposed to add it to
the default list as well, instead of adding the endless dmidecode entry.

If one method required is in the default list in this patch but the
machine reboot still hangs, that means some methods ahead of the
required method cause the system hangs, then reboot the machine by
passing reboot= arguments and submit the reboot dmidecode table quirk.

We are supposed to remove the reboot dmidecode table from the kernel,
but to be safe, we keep it. This patch prevents us from adding more.
If you happened to have a machine listed in the reboot dmidecode
table and this patch makes reboot work on your machine, please submit
a patch to remove the quirk.

The default reboot order with this patch is now:

    ACPI > KBD > ACPI > KBD > EFI > CF9_COND > BIOS

Because BIOS and TRIPLE are mutually exclusive (either will either
work or hang the machine) that method is not included.

[ hpa: as with any changes to the reboot order, this patch will have
  to be monitored carefully for regressions. ]

Signed-off-by: Aubrey Li <aubrey.li@intel.com>
Acked-by: Matthew Garrett <mjg59@srcf.ucam.org>
Link: http://lkml.kernel.org/r/53130A46.1010801@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-05 15:27:07 -08:00
Matt Fleming
617b3c37da Merge branch 'mixed-mode' into efi-for-mingo 2014-03-05 18:18:50 +00:00
Matt Fleming
994448f1af Merge remote-tracking branch 'tip/x86/efi-mixed' into efi-for-mingo
Conflicts:
	arch/x86/kernel/setup.c
	arch/x86/platform/efi/efi.c
	arch/x86/platform/efi/efi_64.c
2014-03-05 18:15:37 +00:00
Matt Fleming
4fd69331ad Merge remote-tracking branch 'tip/x86/urgent' into efi-for-mingo
Conflicts:
	arch/x86/include/asm/efi.h
2014-03-05 17:31:41 +00:00
Thomas Gleixner
76d388cd72 x86: hyperv: Fixup the (brain) damage caused by the irq cleanup
Compiling last minute changes without setting the proper config
options is not really clever.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-05 13:42:14 +01:00
Matt Fleming
3db4cafdfd x86/boot: Fix non-EFI build
The kbuild test robot reported the following errors, introduced with
commit 54b52d8726 ("x86/efi: Build our own EFI services pointer
table"),

 arch/x86/boot/compressed/head_32.o: In function `efi32_config':
>> (.data+0x58): undefined reference to `efi_call_phys'

 arch/x86/boot/compressed/head_64.o: In function `efi64_config':
>> (.data+0x90): undefined reference to `efi_call6'

Wrap the efi*_config structures in #ifdef CONFIG_EFI_STUB so that we
don't make references to EFI functions if they're not compiled in.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-05 10:19:07 +00:00
Matt Fleming
b663a68583 x86, tools: Fix up compiler warnings
The kbuild test robot reported the following errors that were introduced
with commit 993c30a04e ("x86, tools: Consolidate #ifdef code"),

  arch/x86/boot/tools/build.c: In function 'update_pecoff_setup_and_reloc':
>> arch/x86/boot/tools/build.c:252:1: error: parameter name omitted
    static inline void update_pecoff_setup_and_reloc(unsigned int) {}
    ^
  arch/x86/boot/tools/build.c: In function 'update_pecoff_text':
>> arch/x86/boot/tools/build.c:253:1: error: parameter name omitted
    static inline void update_pecoff_text(unsigned int, unsigned int) {}
    ^
>> arch/x86/boot/tools/build.c:253:1: error: parameter name omitted

   arch/x86/boot/tools/build.c: In function 'main':
>> arch/x86/boot/tools/build.c:372:2: warning: implicit declaration of function 'efi_stub_entry_update' [-Wimplicit-function-declaration]
    efi_stub_entry_update();
    ^
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-05 10:12:39 +00:00
Jiri Olsa
0ac09f9f8c x86, trace: Fix CR2 corruption when tracing page faults
The trace_do_page_fault function trigger tracepoint
and then handles the actual page fault.

This could lead to error if the tracepoint caused page
fault. The original cr2 value gets lost and the original
page fault handler kills current process with SIGSEGV.

This happens if you record page faults with callchain
data, the user part of it will cause tracepoint handler
to page fault:

  # perf record -g -e exceptions:page_fault_user ls

Fixing this by saving the original cr2 value
and using it after tracepoint handler is done.

v2: Moving the cr2 read before exception_enter, because
    it could trigger tracepoint as well.

Reported-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1402211701380.6395@vincent-weaver-1.um.maine.edu
Link: http://lkml.kernel.org/r/20140228160526.GD1133@krava.brq.redhat.com
2014-03-04 16:00:14 -08:00
H. Peter Anvin
3c0b566334 * Disable the new EFI 1:1 virtual mapping for SGI UV because using it
causes a crash during boot - Borislav Petkov
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJTFmV4AAoJEC84WcCNIz1VeSgP/1gykrBiH3Vr4H4c32la/arZ
 ktXRAT+RHdiebEXopt6+A1Pv6iyRYz3fBB1cKKb7q8fhDmVVmefGVkyO4qg4NbgL
 Ic1dP6uPgiFidcYML9/c6+UIovbwDD7f5wwJEjXxoZKg7b7P0TIykd8z8YPQ/6A6
 fIY5Z2L9A8eVt4k1m6Dg2PUJ2B+XKOLa5BbL7gXB6u9avgAVAoLM9oQStss+V9Pv
 JQu+BZxeEwuxgi0LOxk1sFWXaAoRtgVDNH6nPK93CRvO2H83voWFf+OcW9hQWsF5
 tLam6aMkvjuM5Dv1IA69PBAWrE/jxcMvjEdJyrGMWRKWtH/iTN4ez4EoFiO38o4R
 IgzXh9L5xbP3o+g3rntIu4h3/5yde9TRM18mER0lLTdNFZ8QzmLt4L0TptntRoBv
 bTXffILACq0uQU6T10P+EwseT472HphMeswaWVDkRxkN2hTCipFqQX0ekb0qbw3O
 yQeRyz1/t7DeA77iAGG96SfeziMdr44d6u6zDQPrTAJV+H1ZZ3XNIlJDi01CAg3b
 PDT6nHb/9V0tPZQebntZnczVRP+5EBdn5RbJaAMldEwVgjdjlFNY5slsF01KeCSF
 6Cx6UyAI9XKuZMoayC3QDHKKWi+BTOwbKcVAdtZ1c9AMLt9pyxsDfdoOAV4zBiQa
 PsVs9t/l7TZoL1MQHlEJ
 =YIab
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Disable the new EFI 1:1 virtual mapping for SGI UV because using it
   causes a crash during boot - Borislav Petkov

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-03-04 15:50:06 -08:00
Borislav Petkov
a5d90c923b x86/efi: Quirk out SGI UV
Alex reported hitting the following BUG after the EFI 1:1 virtual
mapping work was merged,

 kernel BUG at arch/x86/mm/init_64.c:351!
 invalid opcode: 0000 [#1] SMP
 Call Trace:
  [<ffffffff818aa71d>] init_extra_mapping_uc+0x13/0x15
  [<ffffffff818a5e20>] uv_system_init+0x22b/0x124b
  [<ffffffff8108b886>] ? clockevents_register_device+0x138/0x13d
  [<ffffffff81028dbb>] ? setup_APIC_timer+0xc5/0xc7
  [<ffffffff8108b620>] ? clockevent_delta2ns+0xb/0xd
  [<ffffffff818a3a92>] ? setup_boot_APIC_clock+0x4a8/0x4b7
  [<ffffffff8153d955>] ? printk+0x72/0x74
  [<ffffffff818a1757>] native_smp_prepare_cpus+0x389/0x3d6
  [<ffffffff818957bc>] kernel_init_freeable+0xb7/0x1fb
  [<ffffffff81535530>] ? rest_init+0x74/0x74
  [<ffffffff81535539>] kernel_init+0x9/0xff
  [<ffffffff81541dfc>] ret_from_fork+0x7c/0xb0
  [<ffffffff81535530>] ? rest_init+0x74/0x74

Getting this thing to work with the new mapping scheme would need more
work, so automatically switch to the old memmap layout for SGI UV.

Acked-by: Russ Anderson <rja@sgi.com>
Cc: Alex Thorlton <athorlton@sgi.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 23:43:33 +00:00
Thomas Gleixner
13b5be56d1 x86: hyperv: Fix brown paperbag typos reported by Fenguangs build robot
Reported-by: fengguang.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
2014-03-04 23:53:33 +01:00
Thomas Gleixner
3c433679ab x86: hyperv: Make it build with CONFIG_HYPERV=m again
Commit 1aec16967 (x86: Hyperv: Cleanup the irq mess) removed the
ability to build the hyperv stuff as a module. Bring it back.

Reported-by: fengguang.wu@intel.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
2014-03-04 23:41:44 +01:00
Matt Fleming
18c46461d9 x86/efi: Re-disable interrupts after calling firmware services
Some firmware appears to enable interrupts during boot service calls,
even if we've explicitly disabled them prior to the call. This is
actually allowed per the UEFI spec because boottime services expect to
be called with interrupts enabled.

So that's fine, we just need to ensure that we disable them again in
efi_enter32() before switching to a 64-bit GDT, otherwise an interrupt
may fire causing a 32-bit IRQ handler to run after we've left
compatibility mode.

Despite efi_enter32() being called both for boottime and runtime
services, this really only affects boottime because the runtime services
callchain is executed with interrupts disabled. See efi_thunk().

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:44:00 +00:00
Matt Fleming
108d3f44b1 x86/boot: Don't overwrite cr4 when enabling PAE
Some EFI firmware makes use of the FPU during boottime services and
clearing X86_CR4_OSFXSR by overwriting %cr4 causes the firmware to
crash.

Add the PAE bit explicitly instead of trashing the existing contents,
leaving the rest of the bits as the firmware set them.

Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:43:59 +00:00
Matt Fleming
7d453eee36 x86/efi: Wire up CONFIG_EFI_MIXED
Add the Kconfig option and bump the kernel header version so that boot
loaders can check whether the handover code is available if they want.

The xloadflags field in the bzImage header is also updated to reflect
that the kernel supports both entry points by setting both of
XLF_EFI_HANDOVER_32 and XLF_EFI_HANDOVER_64 when CONFIG_EFI_MIXED=y.
XLF_CAN_BE_LOADED_ABOVE_4G is disabled so that the kernel text is
guaranteed to be addressable with 32-bits.

Note that no boot loaders should be using the bits set in xloadflags to
decide which entry point to jump to. The entire scheme is based on the
concept that 32-bit bootloaders always jump to ->handover_offset and
64-bit loaders always jump to ->handover_offset + 512. We set both bits
merely to inform the boot loader that it's safe to use the native
handover offset even if the machine type in the PE/COFF header claims
otherwise.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:43:57 +00:00
Matt Fleming
4f9dbcfc40 x86/efi: Add mixed runtime services support
Setup the runtime services based on whether we're booting in EFI native
mode or not. For non-native mode we need to thunk from 64-bit into
32-bit mode before invoking the EFI runtime services.

Using the runtime services after SetVirtualAddressMap() is slightly more
complicated because we need to ensure that all the addresses we pass to
the firmware are below the 4GB boundary so that they can be addressed
with 32-bit pointers, see efi_setup_page_tables().

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:43:14 +00:00
Matt Fleming
b8ff87a615 x86/efi: Firmware agnostic handover entry points
The EFI handover code only works if the "bitness" of the firmware and
the kernel match, i.e. 64-bit firmware and 64-bit kernel - it is not
possible to mix the two. This goes against the tradition that a 32-bit
kernel can be loaded on a 64-bit BIOS platform without having to do
anything special in the boot loader. Linux distributions, for one thing,
regularly run only 32-bit kernels on their live media.

Despite having only one 'handover_offset' field in the kernel header,
EFI boot loaders use two separate entry points to enter the kernel based
on the architecture the boot loader was compiled for,

    (1) 32-bit loader: handover_offset
    (2) 64-bit loader: handover_offset + 512

Since we already have two entry points, we can leverage them to infer
the bitness of the firmware we're running on, without requiring any boot
loader modifications, by making (1) and (2) valid entry points for both
CONFIG_X86_32 and CONFIG_X86_64 kernels.

To be clear, a 32-bit boot loader will always use (1) and a 64-bit boot
loader will always use (2). It's just that, if a single kernel image
supports (1) and (2) that image can be used with both 32-bit and 64-bit
boot loaders, and hence both 32-bit and 64-bit EFI.

(1) and (2) must be 512 bytes apart at all times, but that is already
part of the boot ABI and we could never change that delta without
breaking existing boot loaders anyhow.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:06 +00:00
Matt Fleming
c116e8d60a x86/efi: Split the boot stub into 32/64 code paths
Make the decision which code path to take at runtime based on
efi_early->is64.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:05 +00:00
Matt Fleming
0154416a71 x86/efi: Add early thunk code to go from 64-bit to 32-bit
Implement the transition code to go from IA32e mode to protected mode in
the EFI boot stub. This is required to use 32-bit EFI services from a
64-bit kernel.

Since EFI boot stub is executed in an identity-mapped region, there's
not much we need to do before invoking the 32-bit EFI boot services.
However, we do reload the firmware's global descriptor table
(efi32_boot_gdt) in case things like timer events are still running in
the firmware.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:04 +00:00
Matt Fleming
54b52d8726 x86/efi: Build our own EFI services pointer table
It's not possible to dereference the EFI System table directly when
booting a 64-bit kernel on a 32-bit EFI firmware because the size of
pointers don't match.

In preparation for supporting the above use case, build a list of
function pointers on boot so that callers don't have to worry about
converting pointer sizes through multiple levels of indirection.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:03 +00:00
Matt Fleming
677703cef0 efi: Add separate 32-bit/64-bit definitions
The traditional approach of using machine-specific types such as
'unsigned long' does not allow the kernel to interact with firmware
running in a different CPU mode, e.g. 64-bit kernel with 32-bit EFI.

Add distinct EFI structure definitions for both 32-bit and 64-bit so
that we can use them in the 32-bit and 64-bit code paths.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:25:02 +00:00