Commit graph

19067 commits

Author SHA1 Message Date
Matt Fleming
099240ac11 x86/efi: Delete dead code when checking for non-native
Both efi_free_boot_services() and efi_enter_virtual_mode() are invoked
from init/main.c, but only if the EFI runtime services are available.
This is not the case for non-native boots, e.g. where a 64-bit kernel is
booted with 32-bit EFI firmware.

Delete the dead code.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:24:59 +00:00
Matt Fleming
426e34cc4f x86/mm/pageattr: Always dump the right page table in an oops
Now that we have EFI-specific page tables we need to lookup the pgd when
dumping those page tables, rather than assuming that swapper_pgdir is
the current pgdir.

Remove the double underscore prefix, which is usually reserved for
static functions.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:23:36 +00:00
Matt Fleming
993c30a04e x86, tools: Consolidate #ifdef code
Instead of littering main() with #ifdef CONFIG_EFI_STUB, move the logic
into separate functions that do nothing if the config option isn't set.
This makes main() much easier to read.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:23:35 +00:00
Matt Fleming
86134a1b39 x86/boot: Cleanup header.S by removing some #ifdefs
handover_offset is now filled out by build.c. Don't set a default value
as it will be overwritten anyway.

Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 21:23:34 +00:00
Michael Opdenacker
d20d2efbf2 x86: Remove deprecated IRQF_DISABLED
This patch removes the IRQF_DISABLED flag from x86 architecture
code. It's a NOOP since 2.6.35 and it will be removed one day.

Signed-off-by: Michael Opdenacker <michael.opdenacker@free-electrons.com>
Cc: venki@google.com
Link: http://lkml.kernel.org/r/1393965305-17248-1-git-send-email-michael.opdenacker@free-electrons.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-04 21:47:51 +01:00
Thomas Gleixner
1aec169673 x86: Hyperv: Cleanup the irq mess
The vmbus/hyperv interrupt handling is another complete trainwreck and
probably the worst of all currently in tree.

If CONFIG_HYPERV=y then the interrupt delivery to the vmbus happens
via the direct HYPERVISOR_CALLBACK_VECTOR. So far so good, but:

  The driver requests first a normal device interrupt. The only reason
  to do so is to increment the interrupt stats of that device
  interrupt. For no reason it also installs a private flow handler.

  We have proper accounting mechanisms for direct vectors, but of
  course it's too much effort to add that 5 lines of code.

  Aside of that the alloc_intr_gate() is not protected against
  reallocation which makes module reload impossible.

Solution to the problem is simple to rip out the whole mess and
implement it correctly.

First of all move all that code to arch/x86/kernel/cpu/mshyperv.c and
merily install the HYPERVISOR_CALLBACK_VECTOR with proper reallocation
protection and use the proper direct vector accounting mechanism.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linuxdrivers <devel@linuxdriverproject.org>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212739.028307673@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-04 17:37:54 +01:00
Thomas Gleixner
929320e4b4 x86: Add proper vector accounting for HYPERVISOR_CALLBACK_VECTOR
HyperV abuses a device interrupt to account for the
HYPERVISOR_CALLBACK_VECTOR.

Provide proper accounting as we have for the other vectors as well.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212738.681855582@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-04 17:37:54 +01:00
Thomas Gleixner
770144ea7b x86: Xen: Use the core irq stats function
Let the core do the irq_desc resolution.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Xen <xen-devel@lists.xenproject.org>
Cc: x86 <x86@kernel.org>
Link: http://lkml.kernel.org/r/20140223212737.869264085@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-03-04 17:37:53 +01:00
Borislav Petkov
fabb37c736 x86/efi: Split efi_enter_virtual_mode
... into a kexec flavor for better code readability and simplicity. The
original one was getting ugly with ifdeffery.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:17:19 +00:00
Borislav Petkov
b7b898ae0c x86/efi: Make efi virtual runtime map passing more robust
Currently, running SetVirtualAddressMap() and passing the physical
address of the virtual map array was working only by a lucky coincidence
because the memory was present in the EFI page table too. Until Toshi
went and booted this on a big HP box - the krealloc() manner of resizing
the memmap we're doing did allocate from such physical addresses which
were not mapped anymore and boom:

http://lkml.kernel.org/r/1386806463.1791.295.camel@misato.fc.hp.com

One way to take care of that issue is to reimplement the krealloc thing
but with pages. We start with contiguous pages of order 1, i.e. 2 pages,
and when we deplete that memory (shouldn't happen all that often but you
know firmware) we realloc the next power-of-two pages.

Having the pages, it is much more handy and easy to map them into the
EFI page table with the already existing mapping code which we're using
for building the virtual mappings.

Thanks to Toshi Kani and Matt for the great debugging help.

Reported-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:17:18 +00:00
Borislav Petkov
42a5477251 x86, pageattr: Export page unmapping interface
We will use it in efi so expose it.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:17:17 +00:00
Borislav Petkov
11cc851254 x86/efi: Dump the EFI page table
This is very useful for debugging issues with the recently added
pagetable switching code for EFI virtual mode.

Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:17:17 +00:00
Borislav Petkov
ef6bea6ddf x86, ptdump: Add the functionality to dump an arbitrary pagetable
With reusing the ->trampoline_pgd page table for mapping EFI regions in
order to use them after having switched to EFI virtual mode, it is very
useful to be able to dump aforementioned page table in dmesg. This adds
that functionality through the walk_pgd_level() interface which can be
called from somewhere else.

The original functionality of dumping to debugfs remains untouched.

Cc: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Tested-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:16:17 +00:00
Joe Perches
9b7d204982 x86/efi: Style neatening
Coalesce formats and remove spaces before tabs.
Move __initdata after the variable declaration.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:16:17 +00:00
Madper Xie
5db80c6514 x86/efi: Delete out-of-date comments of efi_query_variable_store
For now we only ensure about 5kb free space for avoiding our board
refusing boot. But the comment lies that we retain 50% space.

Signed-off-by: Madper Xie <cxie@redhat.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:16:17 +00:00
Matt Fleming
0f8093a92d efi: Set feature flags inside feature init functions
It makes more sense to set the feature flag in the success path of the
detection function than it does to rely on the caller doing it. Apart
from it being more logical to group the code and data together, it sets
a much better example for new EFI architectures.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:16:16 +00:00
Matt Fleming
3e90959921 efi: Move facility flags to struct efi
As we grow support for more EFI architectures they're going to want the
ability to query which EFI features are available on the running system.
Instead of storing this information in an architecture-specific place,
stick it in the global 'struct efi', which is already the central
location for EFI state.

While we're at it, let's change the return value of efi_enabled() to be
bool and replace all references to 'facility' with 'feature', which is
the usual word used to describe the attributes of the running system.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-03-04 16:16:16 +00:00
Paolo Bonzini
1c2af4968e Merge tag 'kvm-for-3.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into kvm-next 2014-03-04 15:58:00 +01:00
Andrew Jones
332967a3ea x86: kvm: introduce periodic global clock updates
commit 0061d53daf introduced a mechanism to execute a global clock
update for a vm. We can apply this periodically in order to propagate
host NTP corrections. Also, if all vcpus of a vm are pinned, then
without an additional trigger, no guest NTP corrections can propagate
either, as the current trigger is only vcpu cpu migration.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-04 11:50:54 +01:00
Andrew Jones
7e44e4495a x86: kvm: rate-limit global clock updates
When we update a vcpu's local clock it may pick up an NTP correction.
We can't wait an indeterminate amount of time for other vcpus to pick
up that correction, so commit 0061d53daf introduced a global clock
update. However, we can't request a global clock update on every vcpu
load either (which is what happens if the tsc is marked as unstable).
The solution is to rate-limit the global clock updates. Marcelo
calculated that we should delay the global clock updates no more
than 0.1s as follows:

Assume an NTP correction c is applied to one vcpu, but not the other,
then in n seconds the delta of the vcpu system_timestamps will be
c * n. If we assume a correction of 500ppm (worst-case), then the two
vcpus will diverge 50us in 0.1s, which is a considerable amount.

Signed-off-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-04 11:50:47 +01:00
Heiko Carstens
0473c9b5f0 compat: let architectures define __ARCH_WANT_COMPAT_SYS_GETDENTS64
For architecture dependent compat syscalls in common code an architecture
must define something like __ARCH_WANT_<WHATEVER> if it wants to use the
code.
This however is not true for compat_sys_getdents64 for which architectures
must define __ARCH_OMIT_COMPAT_SYS_GETDENTS64 if they do not want the code.

This leads to the situation where all architectures, except mips, get the
compat code but only x86_64, arm64 and the generic syscall architectures
actually use it.

So invert the logic, so that architectures actively must do something to
get the compat code.

This way a couple of architectures get rid of otherwise dead code.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
2014-03-04 09:05:33 +01:00
Petr Mladek
12729f14d8 ftrace/x86: One more missing sync after fixup of function modification failure
If a failure occurs while modifying ftrace function, it bails out and will
remove the tracepoints to be back to what the code originally was.

There is missing the final sync run across the CPUs after the fix up is done
and before the ftrace int3 handler flag is reset.

Here's the description of the problem:

	CPU0				CPU1
	----				----
  remove_breakpoint();
  modifying_ftrace_code = 0;

				[still sees breakpoint]
				<takes trap>
				[sees modifying_ftrace_code as zero]
				[no breakpoint handler]
				[goto failed case]
				[trap exception - kernel breakpoint, no
				 handler]
				BUG()

Link: http://lkml.kernel.org/r/1393258342-29978-2-git-send-email-pmladek@suse.cz

Fixes: 8a4d0a687a "ftrace: Use breakpoint method to update ftrace caller"
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Petr Mladek <pmladek@suse.cz>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-03 21:23:07 -05:00
Steven Rostedt (Red Hat)
c932c6b7c9 ftrace/x86: Run a sync after fixup on failure
If a failure occurs while enabling a trace, it bails out and will remove
the tracepoints to be back to what the code originally was. But the fix
up had some bugs in it. By injecting a failure in the code, the fix up
ran to completion, but shortly afterward the system rebooted.

There was two bugs here.

The first was that there was no final sync run across the CPUs after the
fix up was done, and before the ftrace int3 handler flag was reset. That
means that other CPUs could still see the breakpoint and trigger on it
long after the flag was cleared, and the int3 handler would think it was
a spurious interrupt. Worse yet, the int3 handler could hit other breakpoints
because the ftrace int3 handler flag would have prevented the int3 handler
from going further.

Here's a description of the issue:

	CPU0				CPU1
	----				----
  remove_breakpoint();
  modifying_ftrace_code = 0;

				[still sees breakpoint]
				<takes trap>
				[sees modifying_ftrace_code as zero]
				[no breakpoint handler]
				[goto failed case]
				[trap exception - kernel breakpoint, no
				 handler]
				BUG()

The second bug was that the removal of the breakpoints required the
"within()" logic updates instead of accessing the ip address directly.
As the kernel text is mapped read-only when CONFIG_DEBUG_RODATA is set, and
the removal of the breakpoint is a modification of the kernel text.
The ftrace_write() includes the "within()" logic, where as, the
probe_kernel_write() does not. This prevented the breakpoint from being
removed at all.

Link: http://lkml.kernel.org/r/1392650573-3390-1-git-send-email-pmladek@suse.cz

Reported-by: Petr Mladek <pmladek@suse.cz>
Tested-by: Petr Mladek <pmladek@suse.cz>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-03-03 21:23:06 -05:00
Paolo Bonzini
ccf9844e5d kvm, vmx: Really fix lazy FPU on nested guest
Commit e504c9098e (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13)
highlighted a real problem, but the fix was subtly wrong.

nested_read_cr0 is the CR0 as read by L2, but here we want to look at
the CR0 value reflecting L1's setup.  In other words, L2 might think
that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually
running it with TS=1, we should inject the fault into L1.

The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use
it.

Fixes: e504c9098e
Reported-by: Kashyap Chamarty <kchamart@redhat.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Tested-by: Kashyap Chamarty <kchamart@redhat.com>
Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-03 12:49:42 +01:00
Greg Kroah-Hartman
13df797743 Merge 3.14-rc5 into driver-core-next
We want the fixes in here.
2014-03-02 20:09:08 -08:00
Linus Torvalds
3154da34be Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Misc fixes, most of them on the tooling side"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools: Fix strict alias issue for find_first_bit
  perf tools: fix BFD detection on opensuse
  perf: Fix hotplug splat
  perf/x86: Fix event scheduling
  perf symbols: Destroy unused symsrcs
  perf annotate: Check availability of annotate when processing samples
2014-03-02 11:37:07 -06:00
Linus Torvalds
55de1ed2f5 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "The VMCOREINFO patch I'll pushing for this release to avoid having a
  release with kASLR and but without that information.

  I was hoping to include the FPU patches from Suresh, but ran into a
  problem (see other thread); will try to make them happen next week"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, kaslr: add missed "static" declarations
  x86, kaslr: export offset in VMCOREINFO ELF notes
2014-03-01 22:48:14 -06:00
Linus Torvalds
d8efcf38b1 Three x86 fixes and one for ARM/ARM64. In particular, nested
virtualization on Intel is broken in 3.13 and fixed by this
 pull request.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJTEIYBAAoJEBvWZb6bTYbyv7AP/iOE8ybTr7MSfZPTF4Ip13o+
 rvzlnzUtDMsZAN5dZAXhsR3lPeXygjTmeI6FWdiVwdalp9wpg2FLAZ0BDj8KIg0r
 cAINYOmJ0jhC1mTOfMJghsrE9b1aaXwWVSlXkivzyIrPJCZFlqDKOXleHJXyqNTY
 g259tPI8VWS7Efell9NclUNXCdD4g6wG/RdEjumjv9JLiXlDVviXvzvIEl/S3Ud1
 D2xxX7vvGHikyTwuls/bWzJzzRRlsb1VVOOQtBuC9NyaAY7bpQjGQvo6XzxowtIZ
 h4F4iU/umln5WcDiJU8XXiV/TOCVzqgdLk3Pr5Kgv3yO8/XbE/CcnyJmeaSgMoJB
 i7vJ6tUX5mfGsLNxfshXw0RsY/y9KMLnbt62eiPImWBxgpDZNxKpAqCA7GsOb87g
 Vjzl3poEwe+5eN6Usbpd78rRgfgbbZF+Pf2qsphtQhFQGaogz1Ltz0B0hY3MYxx3
 y9OJMJyt1MI4+hvvdjhSnmIo6APwuGSr+hhdKCPSlMiWJun2XRHXTHBNAS+dsjgs
 Sx2Bzao/lki5l7y9Ea1fR4yerigbFJF4L1iV04sSbsoh0I/nN5qjXFrc22Ju0i3i
 uIrVwfSSdX4HQwQYdBGKQQRGq/W0wOjEDoA5qZmxg3s4j8KSd7ooBtRk/VepVH7E
 kaUrekJ+KWs/sVNW2MtU
 =zQTn
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "Three x86 fixes and one for ARM/ARM64.

  In particular, nested virtualization on Intel is broken in 3.13 and
  fixed by this pull request"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  kvm, vmx: Really fix lazy FPU on nested guest
  kvm: x86: fix emulator buffer overflow (CVE-2014-0049)
  arm/arm64: KVM: detect CPU reset on CPU_PM_EXIT
  KVM: MMU: drop read-only large sptes when creating lower level sptes
2014-02-28 11:45:03 -08:00
Paolo Bonzini
1b385cbdd7 kvm, vmx: Really fix lazy FPU on nested guest
Commit e504c9098e (kvm, vmx: Fix lazy FPU on nested guest, 2013-11-13)
highlighted a real problem, but the fix was subtly wrong.

nested_read_cr0 is the CR0 as read by L2, but here we want to look at
the CR0 value reflecting L1's setup.  In other words, L2 might think
that TS=0 (so nested_read_cr0 has the bit clear); but if L1 is actually
running it with TS=1, we should inject the fault into L1.

The effective value of CR0 in L2 is contained in vmcs12->guest_cr0, use
it.

Fixes: e504c9098e
Reported-by: Kashyap Chamarty <kchamart@redhat.com>
Reported-by: Stefan Bader <stefan.bader@canonical.com>
Tested-by: Kashyap Chamarty <kchamart@redhat.com>
Tested-by: Anthoine Bourgeois <bourgeois@bertin.fr>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-27 22:54:11 +01:00
Andrew Honig
a08d3b3b99 kvm: x86: fix emulator buffer overflow (CVE-2014-0049)
The problem occurs when the guest performs a pusha with the stack
address pointing to an mmio address (or an invalid guest physical
address) to start with, but then extending into an ordinary guest
physical address.  When doing repeated emulated pushes
emulator_read_write sets mmio_needed to 1 on the first one.  On a
later push when the stack points to regular memory,
mmio_nr_fragments is set to 0, but mmio_is_needed is not set to 0.

As a result, KVM exits to userspace, and then returns to
complete_emulated_mmio.  In complete_emulated_mmio
vcpu->mmio_cur_fragment is incremented.  The termination condition of
vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments is never achieved.
The code bounces back and fourth to userspace incrementing
mmio_cur_fragment past it's buffer.  If the guest does nothing else it
eventually leads to a a crash on a memcpy from invalid memory address.

However if a guest code can cause the vm to be destroyed in another
vcpu with excellent timing, then kvm_clear_async_pf_completion_queue
can be used by the guest to control the data that's pointed to by the
call to cancel_work_item, which can be used to gain execution.

Fixes: f78146b0f9
Signed-off-by: Andrew Honig <ahonig@google.com>
Cc: stable@vger.kernel.org (3.5+)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-27 19:35:22 +01:00
Takuya Yoshikawa
684851a157 KVM: x86: Break kvm_for_each_vcpu loop after finding the VP_INDEX
No need to scan the entire VCPU array.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya_b1@lab.ntt.co.jp>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-27 19:25:39 +01:00
Aravind Gopalakrishnan
85a8885bd0 amd64_edac: Add support for newer F16h models
Extend ECC decoding support for F16h M30h. Tested on F16h M30h with ECC
turned on using mce_amd_inj module and the patch works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392913726-16961-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Tested-by: Arindam Nath <Arindam.Nath@amd.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
2014-02-27 18:03:16 +01:00
H. Peter Anvin
da4aaa7d86 x86, cpufeature: If we disable CLFLUSH, we should disable CLFLUSHOPT
If we explicitly disable the use of CLFLUSH, we should disable the use
of CLFLUSHOPT as well.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-jtdv7btppr4jgzxm3sxx1e74@git.kernel.org
2014-02-27 08:36:31 -08:00
H. Peter Anvin
840d2830e6 x86, cpufeature: Rename X86_FEATURE_CLFLSH to X86_FEATURE_CLFLUSH
We call this "clflush" in /proc/cpuinfo, and have
cpu_has_clflush()... let's be consistent and just call it that.

Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Alan Cox <alan@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-mlytfzjkvuf739okyn40p8a5@git.kernel.org
2014-02-27 08:31:30 -08:00
Ross Zwisler
8b80fd8b45 x86: Use clflushopt in clflush_cache_range
If clflushopt is available on the system, use it instead of clflush in
clflush_cache_range.

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Link: http://lkml.kernel.org/r/1393441612-19729-3-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-27 08:26:10 -08:00
Ross Zwisler
171699f763 x86: Add support for the clflushopt instruction
Add support for the new clflushopt instruction.  This instruction was
announced in the document "Intel Architecture Instruction Set Extensions
Programming Reference" with Ref # 319433-018.

http://download-software.intel.com/sites/default/files/managed/50/1a/319433-018.pdf

[ hpa: changed the feature flag to simply X86_FEATURE_CLFLUSHOPT - if
  that is what we want to report in /proc/cpuinfo anyway... ]

Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Link: http://lkml.kernel.org/r/1393441612-19729-2-git-send-email-ross.zwisler@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-27 08:23:28 -08:00
H. Peter Anvin
b5660ba76b x86, platforms: Remove NUMAQ
The NUMAQ support seems to be unmaintained, remove it.

Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/n/530CFD6C.7040705@zytor.com
2014-02-27 08:07:39 -08:00
H. Peter Anvin
c5f9ee3d66 x86, platforms: Remove SGI Visual Workstation
The SGI Visual Workstation seems to be dead; remove support so we
don't have to continue maintaining it.

Cc: Andrey Panin <pazke@donpac.ru>
Cc: Michael Reed <mdr@sgi.com>
Link: http://lkml.kernel.org/r/530CFD6C.7040705@zytor.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-27 08:07:39 -08:00
Peter Zijlstra
c347a2f179 perf/x86: Add a few more comments
Add a few comments on the ->add(), ->del() and ->*_txn()
implementation.

Requested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-he3819318c245j7t5e1e22tr@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27 12:43:25 +01:00
Ingo Molnar
ff5a7088f0 Merge branch 'perf/urgent' into perf/core
Merge the latest fixes before queueing up new changes.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27 12:41:17 +01:00
Peter Zijlstra
26e61e8939 perf/x86: Fix event scheduling
Vince "Super Tester" Weaver reported a new round of syscall fuzzing (Trinity) failures,
with perf WARN_ON()s triggering. He also provided traces of the failures.

This is I think the relevant bit:

	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_disable: x86_pmu_disable
	>    pec_1076_warn-2804  [000] d...   147.926153: x86_pmu_state: Events: {
	>    pec_1076_warn-2804  [000] d...   147.926156: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
	>    pec_1076_warn-2804  [000] d...   147.926158: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926159: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926160: x86_pmu_state: n_events: 1, n_added: 0, n_txn: 1
	>    pec_1076_warn-2804  [000] d...   147.926161: x86_pmu_state: Assignment: {
	>    pec_1076_warn-2804  [000] d...   147.926162: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926163: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926166: collect_events: Adding event: 1 (ffff880119ec8800)

So we add the insn:p event (fd[23]).

At this point we should have:

  n_events = 2, n_added = 1, n_txn = 1

	>    pec_1076_warn-2804  [000] d...   147.926170: collect_events: Adding event: 0 (ffff8800c9e01800)
	>    pec_1076_warn-2804  [000] d...   147.926172: collect_events: Adding event: 4 (ffff8800cbab2c00)

We try and add the {BP,cycles,br_insn} group (fd[3], fd[4], fd[15]).
These events are 0:cycles and 4:br_insn, the BP event isn't x86_pmu so
that's not visible.

	group_sched_in()
	  pmu->start_txn() /* nop - BP pmu */
	  event_sched_in()
	     event->pmu->add()

So here we should end up with:

  0: n_events = 3, n_added = 2, n_txn = 2
  4: n_events = 4, n_added = 3, n_txn = 3

But seeing the below state on x86_pmu_enable(), the must have failed,
because the 0 and 4 events aren't there anymore.

Looking at group_sched_in(), since the BP is the leader, its
event_sched_in() must have succeeded, for otherwise we would not have
seen the sibling adds.

But since neither 0 or 4 are in the below state; their event_sched_in()
must have failed; but I don't see why, the complete state: 0,0,1:p,4
fits perfectly fine on a core2.

However, since we try and schedule 4 it means the 0 event must have
succeeded!  Therefore the 4 event must have failed, its failure will
have put group_sched_in() into the fail path, which will call:

	event_sched_out()
	  event->pmu->del()

on 0 and the BP event.

Now x86_pmu_del() will reduce n_events; but it will not reduce n_added;
giving what we see below:

 n_event = 2, n_added = 2, n_txn = 2

	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_enable: x86_pmu_enable
	>    pec_1076_warn-2804  [000] d...   147.926177: x86_pmu_state: Events: {
	>    pec_1076_warn-2804  [000] d...   147.926179: x86_pmu_state:   0: state: .R config: ffffffffffffffff (          (null))
	>    pec_1076_warn-2804  [000] d...   147.926181: x86_pmu_state:   33: state: AR config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926182: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: n_events: 2, n_added: 2, n_txn: 2
	>    pec_1076_warn-2804  [000] d...   147.926184: x86_pmu_state: Assignment: {
	>    pec_1076_warn-2804  [000] d...   147.926186: x86_pmu_state:   0->33 tag: 1 config: 0 (ffff88011ac99800)
	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state:   1->0 tag: 1 config: 1 (ffff880119ec8800)
	>    pec_1076_warn-2804  [000] d...   147.926188: x86_pmu_state: }
	>    pec_1076_warn-2804  [000] d...   147.926190: x86_pmu_enable: S0: hwc->idx: 33, hwc->last_cpu: 0, hwc->last_tag: 1 hwc->state: 0

So the problem is that x86_pmu_del(), when called from a
group_sched_in() that fails (for whatever reason), and without x86_pmu
TXN support (because the leader is !x86_pmu), will corrupt the n_added
state.

Reported-and-Tested-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Dave Jones <davej@redhat.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20140221150312.GF3104@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-27 12:38:02 +01:00
Rafael J. Wysocki
38953d3945 Merge back earlier 'acpi-processor' material. 2014-02-27 00:22:42 +01:00
Dan Carpenter
b3bd5869fd crypto: remove a duplicate checks in __cbc_decrypt()
We checked "nbytes < bsize" before so it can't happen here.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Jussi Kivilinna <jussi.kivilinna@iki.fi>
Acked-by: Johannes Götzfried <johannes.goetzfried@cs.fau.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2014-02-27 05:56:54 +08:00
Marcelo Tosatti
404381c583 KVM: MMU: drop read-only large sptes when creating lower level sptes
Read-only large sptes can be created due to read-only faults as
follows:

- QEMU pagetable entry that maps guest memory is read-only
due to COW.
- Guest read faults such memory, COW is not broken, because
it is a read-only fault.
- Enable dirty logging, large spte not nuked because it is read-only.
- Write-fault on such memory causes guest to loop endlessly
(which must go down to level 1 because dirty logging is enabled).

Fix by dropping large spte when necessary.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-26 17:23:32 +01:00
Marcelo Tosatti
d3714010c3 KVM: x86: emulator_cmpxchg_emulated should mark_page_dirty
emulator_cmpxchg_emulated writes to guest memory, therefore it should
update the dirty bitmap accordingly.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-26 10:11:08 +01:00
Kees Cook
e2b32e6785 x86, kaslr: randomize module base load address
Randomize the load address of modules in the kernel to make kASLR
effective for modules.  Modules can only be loaded within a particular
range of virtual address space.  This patch adds 10 bits of entropy to
the load address by adding 1-1024 * PAGE_SIZE to the beginning range
where modules are loaded.

The single base offset was chosen because randomizing each module
load ends up wasting/fragmenting memory too much. Prior approaches to
minimizing fragmentation while doing randomization tend to result in
worse entropy than just doing a single base address offset.

Example kASLR boot without this change, with a single module loaded:
---[ Modules ]---
0xffffffffc0000000-0xffffffffc0001000           4K     ro     GLB x  pte
0xffffffffc0001000-0xffffffffc0002000           4K     ro     GLB NX pte
0xffffffffc0002000-0xffffffffc0004000           8K     RW     GLB NX pte
0xffffffffc0004000-0xffffffffc0200000        2032K                   pte
0xffffffffc0200000-0xffffffffff000000        1006M                   pmd
---[ End Modules ]---

Example kASLR boot after this change, same module loaded:
---[ Modules ]---
0xffffffffc0000000-0xffffffffc0200000           2M                   pmd
0xffffffffc0200000-0xffffffffc03bf000        1788K                   pte
0xffffffffc03bf000-0xffffffffc03c0000           4K     ro     GLB x  pte
0xffffffffc03c0000-0xffffffffc03c1000           4K     ro     GLB NX pte
0xffffffffc03c1000-0xffffffffc03c3000           8K     RW     GLB NX pte
0xffffffffc03c3000-0xffffffffc0400000         244K                   pte
0xffffffffc0400000-0xffffffffff000000        1004M                   pmd
---[ End Modules ]---

Signed-off-by: Andy Honig <ahonig@google.com>
Link: http://lkml.kernel.org/r/20140226005916.GA27083@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-25 17:07:26 -08:00
Kees Cook
e290e8c59d x86, kaslr: add missed "static" declarations
This silences build warnings about unexported variables and functions.

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20140209215644.GA30339@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-25 16:59:29 -08:00
Eugene Surovegin
b6085a8657 x86, kaslr: export offset in VMCOREINFO ELF notes
Include kASLR offset in VMCOREINFO ELF notes to assist in debugging.

[ hpa: pushing this for v3.14 to avoid having a kernel version with
  kASLR where we can't debug output. ]

Signed-off-by: Eugene Surovegin <surovegin@google.com>
Link: http://lkml.kernel.org/r/20140123173120.GA25474@www.outflux.net
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-25 16:57:47 -08:00
Liu, Jinsong
390bd528ae KVM: x86: Enable Intel MPX for guest
From 44c2abca2c2eadc6f2f752b66de4acc8131880c4 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Mon, 24 Feb 2014 18:12:31 +0800
Subject: [PATCH v5 3/3] KVM: x86: Enable Intel MPX for guest

This patch enable Intel MPX feature to guest.

Signed-off-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-25 20:17:12 +01:00
Liu, Jinsong
0dd376e709 KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save
From 5d5a80cd172ea6fb51786369bcc23356b1e9e956 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Mon, 24 Feb 2014 18:11:55 +0800
Subject: [PATCH v5 2/3] KVM: x86: add MSR_IA32_BNDCFGS to msrs_to_save

Add MSR_IA32_BNDCFGS to msrs_to_save, and corresponding logic
to kvm_get/set_msr().

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-25 20:17:09 +01:00