Commit graph

18327 commits

Author SHA1 Message Date
Borislav Petkov
2a929242ee lockdep, x86/alternatives: Drop ancient lockdep fixup message
It messes up the output of the nodes/cores bootup table and it
is obsolete anyway, see

  17abecfe65 x86: fix up alternatives with lockdep enabled

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: huawei.libin@huawei.com
Cc: wangyijing@huawei.com
Cc: fenghua.yu@intel.com
Cc: guohanjun@huawei.com
Cc: paul.gortmaker@windriver.com
Link: http://lkml.kernel.org/r/20130927143442.GE4422@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 10:09:41 +02:00
Peter Zijlstra
75f93fed50 sched: Revert need_resched() to look at TIF_NEED_RESCHED
Yuanhan reported a serious throughput regression in his pigz
benchmark. Using the ftrace patch I found that several idle
paths need more TLC before we can switch the generic
need_resched() over to preempt_need_resched.

The preemption paths benefit most from preempt_need_resched and
do indeed use it; all other need_resched() users don't really
care that much so reverting need_resched() back to
tif_need_resched() is the simple and safe solution.

Reported-by: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: lkp@linux.intel.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20130927153003.GF15690@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-28 10:04:47 +02:00
Konrad Rzeszutek Wilk
b1922a519e xen/mmu: Correct PAT MST setting.
Jan Beulich spotted that the PAT MSR settings in the Xen public
document that "the first (PAT6) column was wrong across the
board, and the column for PAT7 was missing altogether."

This updates it to be in sync.

CC: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
2013-09-27 09:04:45 -04:00
Suravee Suthikulpanit
accd1e823e x86/microcode/AMD: Fix patch level reporting for family 15h
On AMD family 14h, applying microcode patch on the a core (core0)
would also affect the other core (core1) in the same compute
unit. The driver would skip applying the patch on core1, but it
still need to update kernel structures to reflect the proper
patch level.

The current logic is not updating the struct
ucode_cpu_info.cpu_sig.rev of the skipped core. This causes the
/sys/devices/system/cpu/cpu1/microcode/version to report
incorrect patch level as shown below:

  $ grep . cpu?/microcode/version
  cpu0/microcode/version:0x600063d
  cpu1/microcode/version:0x6000626
  cpu2/microcode/version:0x600063d
  cpu3/microcode/version:0x6000626
  cpu4/microcode/version:0x600063d

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: <bp@alien8.de>
Cc: <jacob.w.shin@gmail.com>
Cc: <herrmann.der.user@googlemail.com>
Link: http://lkml.kernel.org/r/1285806432-1995-1-git-send-email-suravee.suthikulpanit@amd.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-27 09:29:27 +02:00
Kees Cook
809373e29a x86, build: move build output statistics away from stderr
When building on x86, the final image building step always emits stats
to stderr, even though this information is neither a warning nor an error:

  BUILD   arch/x86/boot/bzImage
Setup is 16188 bytes (padded to 16384 bytes).
System is 6368 kB
CRC cbe50c61

Validating automated builds would be cleaner if stderr did not have to
filter out these lines. Instead, change how tools/build is called, and
make the zoffset header unconditional, and write to a specified file
instead of to stdout, which can then be used for statistics, leaving
stderr open for legitimate warnings and errors, like the output from
die().

Signed-off-by: Kees Cook <keescook@chromium.org>
Link: http://lkml.kernel.org/r/20130906181532.GA31260@www.outflux.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-09-26 14:05:41 -07:00
Masoud Sharbiani
b5eafc6f07 x86/reboot: Remove the duplicate C6100 entry in the reboot quirks list
Two entries for the same system type were added, with two different vendor
names: 'Dell' and 'Dell, Inc.'.

Since a prefix match is being used by the DMI parsing code, we can eliminate
the latter as redundant.

Reported-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Masoud Sharbiani <msharbiani@twitter.com>
Cc: holt@sgi.com
Link: http://lkml.kernel.org/r/1380216643-4683-1-git-send-email-masoud.sharbiani@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-26 20:52:37 +02:00
Ingo Molnar
474a83b76f Merge branch 'linus'
Merge in the relevant upstream merge point to queue up dependent patch.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-26 20:49:53 +02:00
Linus Torvalds
4b97280675 Bug-fixes:
- Fix PV spinlocks triggering jump_label code bug
  - Remove extraneous code in the tpm front driver
  - Fix ballooning out of pages when non-preemptible
  - Fix deadlock when using a 32-bit initial domain with large amount of memory.
  - Add xen_nopvpsin parameter to the documentation
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQEcBAABAgAGBQJSQvzCAAoJEFjIrFwIi8fJyCIIAMENABapdLhrOiRdQ1Y7T5v1
 4bogPDLwpVxHzwo/vnHcNpl35/dUZrC6wQa51Bkoqq0V8o1XmjFy3SY/EBGjEAvw
 hh4qxGY0p0NNi6hKrWC8mH9u2TcluZGm1uecabkXUhl9mrAB5oBsfJdbBZ5N69gO
 QXXt0j7Xwv1APwH86T0e1Lz+lulhdw2ItXP4osYkEbRYNSaaGnuwsd0Jxcb4DeMk
 qhKgP7QMn3C7zDDaapJo1axeYQRBNEtv5M8+0wwMleX4yX1+IBRZeQTsRfMr7RB/
 8FhssWiH15xU6Gmzgi/VR8xhTEIbQh5GWsVReGf6pqIYSxGSYTvvyhm0bVRH4JI=
 =c+7u
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fixes from Konrad Rzeszutek Wilk:
 "Bug-fixes and one update to the kernel-paramters.txt documentation.

   - Fix PV spinlocks triggering jump_label code bug
   - Remove extraneous code in the tpm front driver
   - Fix ballooning out of pages when non-preemptible
   - Fix deadlock when using a 32-bit initial domain with large amount
     of memory
   - Add xen_nopvpsin parameter to the documentation"

* tag 'stable/for-linus-3.12-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/spinlock: Document the xen_nopvspin parameter.
  xen/p2m: check MFN is in range before using the m2p table
  xen/balloon: don't alloc page while non-preemptible
  xen: Do not enable spinlocks before jump_label_init() has executed
  tpm: xen-tpmfront: Remove the locality sysfs attribute
  tpm: xen-tpmfront: Fix default durations
2013-09-25 15:50:53 -07:00
Linus Torvalds
654fdd0412 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "An EFI fix and two reboot-quirk fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround
  x86/reboot: Add quirk to make Dell C6100 use reboot=pci automatically
  x86, efi: Don't map Boot Services on i386
2013-09-25 13:29:18 -07:00
Linus Torvalds
bdc5663fa1 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Assorted standalone fixes"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Add model number for Avoton Silvermont
  perf: Fix capabilities bitfield compatibility in 'struct perf_event_mmap_page'
  perf/x86/intel/uncore: Don't use smp_processor_id() in validate_group()
  perf: Update ABI comment
  tools lib lk: Uninclude linux/magic.h in debugfs.c
  perf tools: Fix old GCC build error in trace-event-parse.c:parse_proc_kallsyms()
  perf probe: Fix finder to find lines of given function
  perf session: Check for SIGINT in more loops
  perf tools: Fix compile with libelf without get_phdrnum
  perf tools: Fix buildid cache handling of kallsyms with kcore
  perf annotate: Fix objdump line parsing offset validation
  perf tools: Fill in new definitions for madvise()/mmap() flags
  perf tools: Sharpen the libaudit dependencies test
2013-09-25 13:28:08 -07:00
Li Fei
16c21ae5ca reboot: Allow specifying warm/cold reset for CF9 boot type
In current implementation for reboot type CF9 and CF9_COND,
warm and cold reset are not differentiated, and both are
performed by writing 0x06 to port 0xCF9.

This commit will differentiate warm and cold reset:

For warm reset, write 0x06 to port 0xCF9;
For cold reset, write 0x0E to port 0xCF9.

[ hpa: This meaning of "cold" and "warm" reset is different from other
  reboot types use, where "warm" means "bypass BIOS POST".  It is also
  not entirely clear that it actually solves any actual problem.  However,
  it would seem fairly harmless to offer this additional option.

  Also note that we do not mask bit 3 in the "warm reset" case.  This
  preserves the behavior on existing systems, including ones quirked
  to use CF9.  It seems reasonable that on any system where the
  warm/cold distinction actually matters that bit 3 would be read as
  zero. ]

From: Liu Chuansheng <chuansheng.liu@intel.com>
Signed-off-by: Li Fei <fei.li@intel.com>
Link: http://lkml.kernel.org/r/1377072837.24556.2.camel@fli24-HP-Compaq-8100-Elite-CMT-PC
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 19:33:22 +02:00
Yinghai Lu
bee7f9c83e ACPI / x86: Increase override tables number limit
Current ACPI tables in initrd is limited to 10, that is too small.
64 should be good enough as we have 35 sigs and could have several
SSDT.

Two problems in current code prevent us from increasing limit:
 1. The cpio file info array is put in stack, as every element is 32
    bytes, could run out of stack if we have that array size to 64.
    We can move it out from stack, make it global and put it into the
    __initdata section.
 2. early_ioremap() only can remap 256k one time. Current code maps
    10 tables at a time. If we increased that limit, the whole size
    could be more than 256k, so early_ioremap() would fail with that.
    We can map chunks one by one during copying, instead of mapping
    all of them together.

Signed-off-by: Yinghai <yinghai@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Thomas Renninger <trenn@suse.de>
Reviewed-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Tang Chen <tangchen@cn.fujitsu.com>
Acked-by: Toshi Kani <toshi.kani@hp.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 16:59:39 +02:00
David Vrabel
0160676bba xen/p2m: check MFN is in range before using the m2p table
On hosts with more than 168 GB of memory, a 32-bit guest may attempt
to grant map an MFN that is error cannot lookup in its mapping of the
m2p table.  There is an m2p lookup as part of m2p_add_override() and
m2p_remove_override().  The lookup falls off the end of the mapped
portion of the m2p and (because the mapping is at the highest virtual
address) wraps around and the lookup causes a fault on what appears to
be a user space address.

do_page_fault() (thinking it's a fault to a userspace address), tries
to lock mm->mmap_sem.  If the gntdev device is used for the grant map,
m2p_add_override() is called from from gnttab_mmap() with mm->mmap_sem
already locked.  do_page_fault() then deadlocks.

The deadlock would most commonly occur when a 64-bit guest is started
and xenconsoled attempts to grant map its console ring.

Introduce mfn_to_pfn_no_overrides() which checks the MFN is within the
mapped portion of the m2p table before accessing the table and use
this in m2p_add_override(), m2p_remove_override(), and mfn_to_pfn()
(which already had the correct range check).

All faults caused by accessing the non-existant parts of the m2p are
thus within the kernel address space and exception_fixup() is called
without trying to lock mm->mmap_sem.

This means that for MFNs that are outside the mapped range of the m2p
then mfn_to_pfn() will always look in the m2p overrides.  This is
correct because it must be a foreign MFN (and the PFN in the m2p in
this case is only relevant for the other domain).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Stefano Stabellini <stefano.stabellini@citrix.com>
Cc: Jan Beulich <JBeulich@suse.com>
--
v3: check for auto_translated_physmap in mfn_to_pfn_no_overrides()
v2: in mfn_to_pfn() look in m2p_overrides if the MFN is out of
    range as it's probably foreign.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
2013-09-25 09:00:03 -04:00
Peter Zijlstra
1a338ac32c sched, x86: Optimize the preempt_schedule() call
Remove the bloat of the C calling convention out of the
preempt_enable() sites by creating an ASM wrapper which allows us to
do an asm("call ___preempt_schedule") instead.

calling.h bits by Andi Kleen

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-tk7xdi1cvvxewixzke8t8le1@git.kernel.org
[ Fixed build error. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:23:07 +02:00
Peter Zijlstra
c2daa3bed5 sched, x86: Provide a per-cpu preempt_count implementation
Convert x86 to use a per-cpu preemption count. The reason for doing so
is that accessing per-cpu variables is a lot cheaper than accessing
thread_info variables.

We still need to save/restore the actual preemption count due to
PREEMPT_ACTIVE so we place the per-cpu __preempt_count variable in the
same cache-line as the other hot __switch_to() variables such as
current_task.

NOTE: this save/restore is required even for !PREEMPT kernels as
cond_resched() also relies on preempt_count's PREEMPT_ACTIVE to ignore
task_struct::state.

Also rename thread_info::preempt_count to ensure nobody is
'accidentally' still poking at it.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-gzn5rfsf8trgjoqx8hyayy3q@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:57 +02:00
Peter Zijlstra
bdb4380658 sched: Extract the basic add/sub preempt_count modifiers
Rewrite the preempt_count macros in order to extract the 3 basic
preempt_count value modifiers:

  __preempt_count_add()
  __preempt_count_sub()

and the new:

  __preempt_count_dec_and_test()

And since we're at it anyway, replace the unconventional
$op_preempt_count names with the more conventional preempt_count_$op.

Since these basic operators are equivalent to the previous _notrace()
variants, do away with the _notrace() versions.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-ewbpdbupy9xpsjhg960zwbv8@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:54 +02:00
Peter Zijlstra
a787870924 sched, arch: Create asm/preempt.h
In order to prepare to per-arch implementations of preempt_count move
the required bits into an asm-generic header and use this for all
archs.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-h5j0c1r3e3fk015m30h8f1zx@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 14:07:50 +02:00
Peter Zijlstra
ea81174789 sched, idle: Fix the idle polling state logic
Mike reported that commit 7d1a9417 ("x86: Use generic idle loop")
regressed several workloads and caused excessive reschedule
interrupts.

The patch in question failed to notice that the x86 code had an
inverted sense of the polling state versus the new generic code (x86:
default polling, generic: default !polling).

Fix the two prominent x86 mwait based idle drivers and introduce a few
new generic polling helpers (fixing the wrong smp_mb__after_clear_bit
usage).

Also switch the idle routines to using tif_need_resched() which is an
immediate TIF_NEED_RESCHED test as opposed to need_resched which will
end up being slightly different.

Reported-by: Mike Galbraith <bitbucket@online.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: lenb@kernel.org
Cc: tglx@linutronix.de
Link: http://lkml.kernel.org/n/tip-nc03imb0etuefmzybzj7sprf@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 13:53:10 +02:00
Peter Zijlstra
0c44c2d0f4 x86: Use asm goto to implement better modify_and_test() functions
Linus suggested using asm goto to get rid of the typical SETcc + TEST
instruction pair -- which also clobbers an extra register -- for our
typical modify_and_test() functions.

Because asm goto doesn't allow output fields it has to include an
unconditinal memory clobber when it changes a memory variable to force
a reload.

Luckily all atomic ops already imply a compiler barrier to go along
with their memory barrier semantics.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-0mtn9siwbeo1d33bap1422se@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 13:53:08 +02:00
Roy Franz
46f4582e7c efi: Generalize handle_ramdisks() and rename to handle_cmdline_files().
The handle_cmdline_files now takes the option to handle as a string,
and returns the loaded data through parameters, rather than taking
an x86 specific setup_header structure.  For ARM, this will be used
to load a device tree blob in addition to initrd images.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:43 +01:00
Roy Franz
0e1cadb05b efi: Allow efi_free() to be called with size of 0
Make efi_free() safely callable with size of 0, similar to free() being
callable with NULL pointers, and do nothing in that case.
Remove size checks that this makes redundant.  This also avoids some
size checks in the ARM EFI stub code that will be added as well.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:42 +01:00
Roy Franz
ae8e9060a3 efi: use efi_get_memory_map() to get final map for x86
Replace the open-coded memory map getting with the
efi_get_memory_map() that is now general enough to use.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:41 +01:00
Roy Franz
5fef3870c5 efi: Move unicode to ASCII conversion to shared function.
Move the open-coded conversion to a shared function for
use by all architectures.  Change the allocation to prefer
a high address for ARM, as this is required to avoid conflicts
with reserved regions in low memory.  We don't know the specifics
of these regions until after we process the command line and
device tree.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:38 +01:00
Roy Franz
4a9f3a7c33 efi: Generalize relocate_kernel() for use by other architectures.
Rename relocate_kernel() to efi_relocate_kernel(), and take
parameters rather than x86 specific structure.  Add max_addr
argument as for ARM we have some address constraints that we
need to enforce when relocating the kernel.  Add alloc_size
parameter for use by ARM64 which uses an uncompressed kernel,
and needs to allocate space for BSS.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:38 +01:00
Roy Franz
c6866d7238 efi: Move relocate_kernel() to shared file.
The relocate_kernel() function will be generalized and used
by all architectures, as they all have similar requirements.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:37 +01:00
Roy Franz
40e4530a00 efi: Rename memory allocation/free functions
Rename them to be more similar, as low_free() could be used to free
memory allocated by both high_alloc() and low_alloc().
high_alloc() -> efi_high_alloc()
low_alloc()  -> efi_low_alloc()
low_free()   -> efi_free()

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:36 +01:00
Roy Franz
876dc36ace efi: Add system table pointer argument to shared functions.
Add system table pointer argument to shared EFI stub related functions
so they no longer use a global system table pointer as they did when part
of eboot.c.  For the ARM EFI stub this allows us to avoid global
variables completely and thereby not have to deal with GOT fixups.
Not having the EFI stub fixup its GOT, which is shared with the
decompressor, simplifies the relocating of the zImage to a
bootable address.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:35 +01:00
Roy Franz
7721da4c1e efi: Move common EFI stub code from x86 arch code to common location
No code changes made, just moving functions and #define from x86 arch
directory to common location.  Code is shared using #include, similar
to how decompression code is shared among architectures.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:34 +01:00
Roy Franz
ed37ddffe2 efi: Add proper definitions for some EFI function pointers.
The x86/AMD64 EFI stubs must use a call wrapper to convert between
the Linux and EFI ABIs, so void pointers are sufficient.  For ARM,
the ABIs are compatible, so we can directly invoke the function
pointers.  The functions that are used by the ARM stub are updated
to match the EFI definitions.
Also add some EFI types used by EFI functions.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Acked-by: Mark Salter <msalter@redhat.com>
Reviewed-by: Grant Likely <grant.likely@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:33 +01:00
Roy Franz
4172fe2f8a EFI stub documentation updates
Move efi-stub.txt out of x86 directory and into common directory
in preparation for adding ARM EFI stub support.

Signed-off-by: Roy Franz <roy.franz@linaro.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-09-25 12:34:32 +01:00
Gleb Natapov
bcd1c29495 KVM: VMX: do not check bit 12 of EPT violation exit qualification when undefined
Bit 12 is undefined in any of the following cases:
- If the "NMI exiting" VM-execution control is 1 and the "virtual NMIs"
  VM-execution control is 0.
- If the VM exit sets the valid bit in the IDT-vectoring information field

Signed-off-by: Gleb Natapov <gleb@redhat.com>
[Add parentheses around & within && - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-25 11:38:26 +02:00
Toshi Kani
1cad5e9a39 hotplug / x86: Disable ARCH_CPU_PROBE_RELEASE on x86
Commit d7c53c9e enabled ARCH_CPU_PROBE_RELEASE on x86 in order to
serialize CPU online/offline operations.  Although it is the config
option to enable CPU hotplug test interfaces, probe & release, it is
also the option to enable cpu_hotplug_driver_lock() as well.  Therefore,
this option had to be enabled on x86 with dummy arch_cpu_probe() and
arch_cpu_release().

Since then, lock_device_hotplug() was introduced to serialize CPU
online/offline & hotplug operations.  Therefore, this config option
is no longer required for the serialization.  This patch disables
this config option on x86 and revert the changes made by commit
d7c53c9e.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:10 +02:00
Toshi Kani
574b851e99 hotplug / x86: Add hotplug lock to missing places
lock_device_hotplug[_sysfs]() serializes CPU & Memory online/offline
and hotplug operations.  However, this lock is not held in the debug
interfaces below that initiate CPU online/offline operations.

 - _debug_hotplug_cpu(), cpu0 hotplug test interface enabled by
   CONFIG_DEBUG_HOTPLUG_CPU0.
 - cpu_probe_store() and cpu_release_store(), cpu hotplug test interface
   enabled by CONFIG_ARCH_CPU_PROBE_RELEASE.

This patch changes the above interfaces to hold lock_device_hotplug().

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:09 +02:00
Toshi Kani
f6913f9902 hotplug / x86: Fix online state in cpu0 debug interface
_debug_hotplug_cpu() is a debug interface that puts cpu0 offline during
boot-up when CONFIG_DEBUG_HOTPLUG_CPU0 is set.  After cpu0 is put offline
in this interface, however, /sys/devices/system/cpu/cpu0/online still
shows 1 (online).

This patch fixes _debug_hotplug_cpu() to update dev->offline when CPU
online/offline operation succeeded.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-25 10:38:09 +02:00
Dave Jones
7a20c2fad6 x86/reboot: Fix apparent cut-n-paste mistake in Dell reboot workaround
This seems to have been copied from the Optiplex 990 entry
above, but somoene forgot to change the ident text.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Link: http://lkml.kernel.org/r/20130925001344.GA13554@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-25 08:41:10 +02:00
Konrad Rzeszutek Wilk
a945928ea2 xen: Do not enable spinlocks before jump_label_init() has executed
xen_init_spinlocks() currently calls static_key_slow_inc() before
jump_label_init() is invoked. When CONFIG_JUMP_LABEL is set (which usually is
the case) the effect of this static_key_slow_inc() is deferred until after
jump_label_init(). This is different from when CONFIG_JUMP_LABEL is not set, in
which case the key is set immediately. Thus, depending on the value of config
option, we may observe different behavior.

In addition, when we come to __jump_label_transform() from jump_label_init(),
the key (paravirt_ticketlocks_enabled) is already enabled. On processors where
ideal_nop is not the same as default_nop this will cause a BUG() since it is
expected that before a key is enabled the latter is replaced by the former
during initialization.

To address this problem we need to move
static_key_slow_inc(&paravirt_ticketlocks_enabled) so that it is called
after jump_label_init(). We also need to make sure that this is done before
other cpus start to boot. early_initcall appears to be  a good place to do so.
(Note that we cannot move whole xen_init_spinlocks() there since pv_lock_ops
need to be set before alternative_instructions() runs.)

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Added extra comments in the code]
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2013-09-24 16:22:26 -04:00
Jan Kiszka
92fbc7b195 KVM: nVMX: Enable unrestricted guest mode support
Now that we provide EPT support, there is no reason to torture our
guests by hiding the relieving unrestricted guest mode feature. We just
need to relax CR0 checks for always-on bits as PE and PG can now be
switched off.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-24 19:12:15 +02:00
Jan Kiszka
10ba54a589 KVM: nVMX: Implement support for EFER saving on VM-exit
Implement and advertise VM_EXIT_SAVE_IA32_EFER. L0 traps EFER writes
unconditionally, so we always find the current L2 value in the
architectural state.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-24 19:12:14 +02:00
Jan Kiszka
59ab5a8f44 KVM: nVMX: Do not set identity page map for L2
Fiddling with CR3 for L2 is L1's job. It may set its own, different
identity map or simple leave it alone if unrestricted guest mode is
enabled. This also fixes reading back the current CR3 on L2 exits for
reporting it to L1.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-24 19:12:14 +02:00
Jan Kiszka
9e3e4dbf44 KVM: nVMX: Replace kvm_set_cr0 with vmx_set_cr0 in load_vmcs12_host_state
kvm_set_cr0 performs checks on the state transition that may prevent
loading L1's cr0. For now we rely on the hardware to catch invalid
states loaded by L1 into its VMCS. Still, consistency checks on the host
state part of the VMCS on guest entry will have to be improved later on.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Reviewed-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2013-09-24 19:12:13 +02:00
Ingo Molnar
8a1f4653f2 x86/UV: Check for alloc_cpumask_var() failures properly in uv_nmi_setup()
GCC warned about:

   arch/x86/platform/uv/uv_nmi.c: In function ‘uv_nmi_setup’:
   arch/x86/platform/uv/uv_nmi.c:664:2: warning: the address of ‘uv_nmi_cpu_mask’ will always evaluate as ‘true’

The reason is this code:

        alloc_cpumask_var(&uv_nmi_cpu_mask, GFP_KERNEL);
        BUG_ON(!uv_nmi_cpu_mask);

which is not the way to check for alloc_cpumask_var() failures - its
return code should be checked instead.

Cc: Mike Travis <travis@sgi.com>
Link: http://lkml.kernel.org/n/tip-2pXRemsjupmvonbpmmnzleo1@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:52:40 +02:00
Mike Travis
8eba18428a x86/UV: Add uvtrace support
This patch adds support for the uvtrace module by providing a
skeleton call to the registered trace function.  It also
provides another separate 'NMI' tracer that is triggered by the
system wide 'power nmi' command.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212501.185052551@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:04 +02:00
Mike Travis
12ba6c990f x86/UV: Add kdump to UV NMI handler
If a system has hung and it no longer responds to external
events, this patch adds the capability of doing a standard kdump
and system reboot then triggered by the system NMI command.

It is enabled when the nmi action is changed to "kdump" and the
kernel is built with CONFIG_KEXEC enabled.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.660567460@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:03 +02:00
Mike Travis
3c121d9a21 x86/UV: Add summary of cpu activity to UV NMI handler
The standard NMI handler dumps the states of all the cpus.  This
includes a full register dump and stack trace.  This can be way
more information than what is needed.  This patch adds a
"summary" dump that is basically a form of the "ps" command.  It
includes the symbolic IP address as well as the command field
and basic process information.

It is enabled when the nmi action is changed to "ips".

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.507922930@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:03 +02:00
Mike Travis
0d12ef0c90 x86/UV: Update UV support for external NMI signals
The current UV NMI handler has not been updated for the changes
in the system NMI handler and the perf operations.  The UV NMI
handler reads an MMR in the UV Hub to check to see if the NMI
event was caused by the external 'system NMI' that the operator
can initiate on the System Mgmt Controller.

The problem arises when the perf tools are running, causing
millions of perf events per second on very large CPU count
systems.  Previously this was okay because the perf NMI handler
ran at a higher priority on the NMI call chain and if the NMI
was a perf event, it would stop calling other NMI handlers
remaining on the NMI call chain.

Now the system NMI handler calls all the handlers on the NMI
call chain including the UV NMI handler.  This causes the UV NMI
handler to read the MMRs at the same millions per second rate.
This can lead to significant performance loss and possible
system failures.  It also can cause thousands of 'Dazed and
Confused' messages being sent to the system console.  This
effectively makes perf tools unusable on UV systems.

To avoid this excessive overhead when perf tools are running,
this code has been optimized to minimize reading of the MMRs as
much as possible, by moving to the NMI_UNKNOWN notifier chain.
This chain is called only when all the users on the standard
NMI_LOCAL call chain have been called and none of them have
claimed this NMI.

There is an exception where the NMI_LOCAL notifier chain is
used.  When the perf tools are in use, it's possible that the UV
NMI was captured by some other NMI handler and then either
ignored or mistakenly processed as a perf event.  We set a
per_cpu ('ping') flag for those CPUs that ignored the initial
NMI, and then send them an IPI NMI signal.  The NMI_LOCAL
handler on each cpu does not need to read the MMR, but instead
checks the in memory flag indicating it was pinged.  There are
two module variables, 'ping_count' indicating how many requested
NMI events occurred, and 'ping_misses' indicating how many stray
NMI events.  These most likely are perf events so it shows the
overhead of the perf NMI interrupts and how many MMR reads were avoided.

This patch also minimizes the reads of the MMRs by having the
first cpu entering the NMI handler on each node set a per HUB
in-memory atomic value.  (Having a per HUB value avoids sending
lock traffic over NumaLink.)  Both types of UV NMIs from the SMI
layer are supported.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.353547733@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:02 +02:00
Mike Travis
1e019421bc x86/UV: Move NMI support
This patch moves the UV NMI support from the x2apic file to a
new separate uv_nmi.c file in preparation for the next sequence
of patches. It prevents upcoming bloat of the x2apic file, and
has the added benefit of putting the upcoming /sys/module
parameters under the name 'uv_nmi' instead of 'x2apic_uv_x',
which was obscure.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Dimitri Sivanich <sivanich@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Jason Wessel <jason.wessel@windriver.com>
Link: http://lkml.kernel.org/r/20130923212500.183295611@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-09-24 09:02:02 +02:00
Jiang Liu
7e1f85f96d x86 / ACPI: simplify _acpi_map_lsapic()
In acpi_register_lapic(), it will generates a new logical cpu
number and maps to the local APIC id, this logical cpu number
can be returned to simplify _acpi_map_lsapic() implementation.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:39:40 +02:00
Jiang Liu
d536bf3dc9 ACPI / processor: use apic_id and remove duplicated _MAT evaluation
Since APIC id is saved in processor struct, just use it and
remove the duplicated _MAT evaluation.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:39:39 +02:00
Zhang Rui
b408a05493 olpc_xo15_sci: convert acpi_evaluate_object() to acpi_execute_simple_method()
acpi_execute_simple_method() is a new ACPI API introduced to invoke
an ACPI control method that has single integer parameter and no return value.

Convert acpi_evaluate_object() to acpi_execute_simple_method()
in arch/x86/platform/olpc/olpc-xo15-sci.c

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
CC: Daniel Drake <dsd@laptop.org>
CC: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-09-24 01:37:53 +02:00
Yijing Wang
f8a26fe637 x86/pci: Use cached pci_dev->pcie_cap to simplify code
The PCI core caches the PCIe Capability offset in pci_dev->pcie_cap, so
use that instead of pci_find_capability().  Use pci_bus_set_ops() when
replacing the device pci_ops.  And use #defines instead of numeric
constants.

[bhelgaas: changelog, also use PCI_EXP_LNKCTL_ASPMC]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2013-09-23 17:30:03 -06:00