Commit graph

19082 commits

Author SHA1 Message Date
Linus Torvalds
161aa772f9 Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "A collection of small fixes:

   - There still seem to be problems with asm goto which requires the
     empty asm hack.
   - If SMAP is disabled at compile time, don't enable it nor try to
     interpret a page fault as an SMAP violation.
   - Fix a case of unbounded recursion while tracing"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, smap: smap_violation() is bogus if CONFIG_X86_SMAP is off
  x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabled
  compiler/gcc4: Make quirk for asm_volatile_goto() unconditional
  x86: Use preempt_disable_notrace() in cycles_2_ns()
2014-02-14 11:09:11 -08:00
Matt Fleming
09503379dc x86/efi: Check status field to validate BGRT header
Madper reported seeing the following crash,

  BUG: unable to handle kernel paging request at ffffffffff340003
  IP: [<ffffffff81d85ba4>] efi_bgrt_init+0x9d/0x133
  Call Trace:
   [<ffffffff81d8525d>] efi_late_init+0x9/0xb
   [<ffffffff81d68f59>] start_kernel+0x436/0x450
   [<ffffffff81d6892c>] ? repair_env_string+0x5c/0x5c
   [<ffffffff81d68120>] ? early_idt_handlers+0x120/0x120
   [<ffffffff81d685de>] x86_64_start_reservations+0x2a/0x2c
   [<ffffffff81d6871e>] x86_64_start_kernel+0x13e/0x14d

This is caused because the layout of the ACPI BGRT header on this system
doesn't match the definition from the ACPI spec, and so we get a bogus
physical address when dereferencing ->image_address in efi_bgrt_init().

Luckily the status field in the BGRT header clearly marks it as invalid,
so we can check that field and skip BGRT initialisation.

Reported-by: Madper Xie <cxie@redhat.com>
Suggested-by: Toshi Kani <toshi.kani@hp.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-02-14 10:07:15 +00:00
Borislav Petkov
c55d016f7a x86/efi: Fix 32-bit fallout
We do not enable the new efi memmap on 32-bit and thus we need to run
runtime_code_page_mkexec() unconditionally there. Fix that.

Reported-and-tested-by: Lejun Zhu <lejun.zhu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-02-14 09:30:19 +00:00
Andi Kleen
67424d5a22 x86, lto: Disable LTO for the x86 VDSO
The VDSO does not play well with LTO, so just disable LTO for it.
Also pass a 32bit linker flag for the 32bit version.

[ hpa: change braces to parens to match kernel Makefile style ]

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1391846481-31491-1-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13 20:21:57 -08:00
Andi Kleen
634676c203 initconst, x86: Fix initconst mistake in ts5500 code
const data must be initconst.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1391845930-28580-14-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13 18:14:54 -08:00
Andi Kleen
a9143296dd asmlinkage, x86: Fix 32bit memcpy for LTO
These functions can be called implicitely from gcc, and thus need to be
visible.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1391845930-28580-11-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13 18:14:46 -08:00
Andi Kleen
40747ffa5a asmlinkage: Make jiffies visible
Jiffies is referenced by the linker script, so it has to be visible.

Handled both the generic and the x86 version.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1391845930-28580-3-git-send-email-ak@linux.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-13 18:12:09 -08:00
H. Peter Anvin
4640c7ee9b x86, smap: smap_violation() is bogus if CONFIG_X86_SMAP is off
If CONFIG_X86_SMAP is disabled, smap_violation() tests for conditions
which are incorrect (as the AC flag doesn't matter), causing spurious
faults.

The dynamic disabling of SMAP (nosmap on the command line) is fine
because it disables X86_FEATURE_SMAP, therefore causing the
static_cpu_has() to return false.

Found by Fengguang Wu's test system.

[ v3: move all predicates into smap_violation() ]
[ v2: use IS_ENABLED() instead of #ifdef ]

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.7+
2014-02-13 08:40:52 -08:00
H. Peter Anvin
03bbd596ac x86, smap: Don't enable SMAP if CONFIG_X86_SMAP is disabled
If SMAP support is not compiled into the kernel, don't enable SMAP in
CR4 -- in fact, we should clear it, because the kernel doesn't contain
the proper STAC/CLAC instructions for SMAP support.

Found by Fengguang Wu's test system.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Link: http://lkml.kernel.org/r/20140213124550.GA30497@localhost
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org> # v3.7+
2014-02-13 07:50:25 -08:00
Bjorn Helgaas
da5d727c97 x86/PCI: Fix function definition whitespace
Consistently put the function type, name, and parameters on one line,
wrapping only as necessary.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-02-12 17:31:54 -07:00
Bjorn Helgaas
affbda86fe x86/PCI: Reword comments
Reword comments so they make sense.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-02-12 17:31:54 -07:00
Bjorn Helgaas
8928d5a66d x86/PCI: Remove unnecessary local variable initialization
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2014-02-12 17:31:54 -07:00
David Rientjes
7cf6c94591 x86, apic: Remove support for IBM Summit/EXA chipset
There should no longer be any IBM x440 systems or those using the
Summit/EXA chipset out in the wild, so remove support for it.

We've done our due diligence in reaching out to any contact information
listed for this chipset and no indication was given that it should be
kept around.

Signed-off-by: David Rientjes <rientjes@google.com>
2014-02-11 18:11:13 -08:00
David Rientjes
58f5d2d448 x86, apic: Remove support for ia32-based Unisys ES7000
There should no longer be any ia32-based Unisys ES7000 systems out in
the wild, so remove support for it.

We've done our due diligence in reaching out to any contact information
listed for this system and no indication was given that it should be
kept around.

Signed-off-by: David Rientjes <rientjes@google.com>
2014-02-11 17:47:48 -08:00
Steven Rostedt (Red Hat)
87fbb2ac60 ftrace/x86: Use breakpoints for converting function graph caller
When the conversion was made to remove stop machine and use the breakpoint
logic instead, the modification of the function graph caller is still
done directly as though it was being done under stop machine.

As it is not converted via stop machine anymore, there is a possibility
that the code could be layed across cache lines and if another CPU is
accessing that function graph call when it is being updated, it could
cause a General Protection Fault.

Convert the update of the function graph caller to use the breakpoint
method as well.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: stable@vger.kernel.org # 3.5+
Fixes: 08d636b6d4 "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2014-02-11 20:19:44 -05:00
Nicolas Pitre
16f8b05abe sched/idle, x86: Remove redundant cpuidle_idle_call()
The core idle loop now takes care of it.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-sh@vger.kernel.org
Cc: linux-pm@vger.kernel.org
Cc: Russell King <linux@arm.linux.org.uk>
Cc: linaro-kernel@lists.linaro.org
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-3ioazimg4j5iq6kdefks04i8@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-11 09:58:28 +01:00
Marek Szyprowski
c091c71ad2 x86: dma-mapping: fix GFP_ATOMIC macro usage
GFP_ATOMIC is not a single gfp flag, but a macro which expands to the other
flags, where meaningful is the LACK of __GFP_WAIT flag. To check if caller
wants to perform an atomic allocation, the code must test for a lack of the
__GFP_WAIT flag. This patch fixes the issue introduced in v3.5-rc1.

CC: stable@vger.kernel.org
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2014-02-11 09:40:15 +01:00
Mel Gorman
a9c8e4beee xen: properly account for _PAGE_NUMA during xen pte translations
Steven Noonan forwarded a users report where they had a problem starting
vsftpd on a Xen paravirtualized guest, with this in dmesg:

  BUG: Bad page map in process vsftpd  pte:8000000493b88165 pmd:e9cc01067
  page:ffffea00124ee200 count:0 mapcount:-1 mapping:     (null) index:0x0
  page flags: 0x2ffc0000000014(referenced|dirty)
  addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping:          (null) index:7f97eea74
  CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
  Call Trace:
    dump_stack+0x45/0x56
    print_bad_pte+0x22e/0x250
    unmap_single_vma+0x583/0x890
    unmap_vmas+0x65/0x90
    exit_mmap+0xc5/0x170
    mmput+0x65/0x100
    do_exit+0x393/0x9e0
    do_group_exit+0xcc/0x140
    SyS_exit_group+0x14/0x20
    system_call_fastpath+0x1a/0x1f
  Disabling lock debugging due to kernel taint
  BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
  BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1

The issue could not be reproduced under an HVM instance with the same
kernel, so it appears to be exclusive to paravirtual Xen guests.  He
bisected the problem to commit 1667918b64 ("mm: numa: clear numa
hinting information on mprotect") that was also included in 3.12-stable.

The problem was related to how xen translates ptes because it was not
accounting for the _PAGE_NUMA bit.  This patch splits pte_present to add
a pteval_present helper for use by xen so both bare metal and xen use
the same code when checking if a PTE is present.

[mgorman@suse.de: wrote changelog, proposed minor modifications]
[akpm@linux-foundation.org: fix typo in comment]
Reported-by: Steven Noonan <steven@uplinklabs.net>
Tested-by: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Elena Ufimtseva <ufimtseva@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>	[3.12+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-10 16:01:41 -08:00
Tim Chen
ddf1d169c0 locking/mcs: Allow architecture specific asm files to be used for contended case
This patch allows each architecture to add its specific assembly optimized
arch_mcs_spin_lock_contended and arch_mcs_spinlock_uncontended for
MCS lock and unlock functions.

Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: AswinChandramouleeswaran <aswin@hp.com>
Cc: George Spelvin <linux@horizon.com>
Cc: Rik vanRiel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: MichelLespinasse <walken@google.com>
Cc: Peter Hurley <peter@hurleysoftware.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Figo.zhang" <figo1802@gmail.com>
Cc: "Paul E.McKenney" <paulmck@linux.vnet.ibm.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Waiman Long <waiman.long@hp.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew R Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1390347382.3138.67.camel@schen9-DESK
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 21:18:52 +01:00
Yinghai Lu
ba6a328f7d x86/mm: Avoid duplicated pxm_to_node() calls
In slit init code, too many pxm_to_node() function calls are done.

We can store from_node/to_node instead of keep calling
pxm_to_node().

  - Before this patch: pxm_to_node() is called n*(1+n*3) times.
  - After  this patch: pxm_to_node() is called n*(1+n) times.

for  8 sockets, it will be   72 instead of  200.
for 32 sockets, it will be 1056 instead of 3104.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: David Rientjes <rientjes@google.com>
Link: http://lkml.kernel.org/r/1390770102-4007-1-git-send-email-yinghai@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:32:31 +01:00
David Rientjes
dc9788f40a x86/apic: Always define nox2apic and define it as initdata
The "nox2apic" variable can be defined as __initdata since it is
only used for bootstrap.  It can now unconditionally be defined
since it will later be freed.

At the same time, it is also better off as a bool.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354380.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:11 +01:00
David Rientjes
6d4989835e x86/apic: Remove unused function prototypes
Some function prototypes declared in asm/apic.h are never
defined, so remove them.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354210.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:09 +01:00
David Rientjes
465822cfc8 x86/apic: Switch wait_for_init_deassert() to a bool flag
Now that there is only a single wait_for_init_deassert()
function, just convert the member of struct apic to a bool to
determine whether we need to wait for init_deassert to become
non-zero.

There are no more callers of default_wait_for_init_deassert(),
so fold it into the caller.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042354010.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:08 +01:00
David Rientjes
d3c63ae1e2 x86/apic: Only use default_wait_for_init_deassert()
es7000_wait_for_init_deassert() is functionally equivalent to
default_wait_for_init_deassert(), so remove the duplicate code
and use only a single function.

Signed-off-by: David Rientjes <rientjes@google.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1402042353030.7839@chino.kir.corp.google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:15:07 +01:00
Ville Syrjälä
c71ef7b3c3 x86/gpu: Print the Intel graphics stolen memory range
Print an informative message when reserving the graphics stolen
memory region in the early quirk.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1391628540-23072-4-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:11:31 +01:00
Ville Syrjälä
a4dff76924 x86/gpu: Add Intel graphics stolen memory quirk for gen2 platforms
There isn't an explicit stolen memory base register on gen2.
Some old comment in the i915 code suggests we should get it via
max_low_pfn_mapped, but that's clearly a bad idea on my MGM.

The e820 map in said machine looks like this:

	BIOS-e820: [mem 0x0000000000000000-0x000000000009f7ff] usable
	BIOS-e820: [mem 0x000000000009f800-0x000000000009ffff] reserved
	BIOS-e820: [mem 0x00000000000ce000-0x00000000000cffff] reserved
	BIOS-e820: [mem 0x00000000000dc000-0x00000000000fffff] reserved
	BIOS-e820: [mem 0x0000000000100000-0x000000001f6effff] usable
	BIOS-e820: [mem 0x000000001f6f0000-0x000000001f6f7fff] ACPI data
	BIOS-e820: [mem 0x000000001f6f8000-0x000000001f6fffff] ACPI NVS
	BIOS-e820: [mem 0x000000001f700000-0x000000001fffffff] reserved
	BIOS-e820: [mem 0x00000000fec10000-0x00000000fec1ffff] reserved
	BIOS-e820: [mem 0x00000000ffb00000-0x00000000ffbfffff] reserved
	BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved

That makes max_low_pfn_mapped = 1f6f0000, so assuming our stolen
memory would start there would place it on top of some ACPI
memory regions. So not a good idea as already stated.

The 9MB region after the ACPI regions at 0x1f700000 however
looks promising given that the macine reports the stolen memory
size to be 8MB. Looking at the PGTBL_CTL register, the GTT
entries are at offset 0x1fee00000, and given that the GTT
entries occupy 128KB, it looks like the stolen memory could
start at 0x1f700000 and the GTT entries would occupy the last
128KB of the stolen memory.

After some more digging through chipset documentation, I've
determined the BIOS first allocates space for something called
TSEG (something to do with SMM) from the top of memory, and then
it allocates the graphics stolen memory below that. Accordind to
the chipset documentation TSEG has a fixed size of 1MB on 855.
So that explains the top 1MB in the e820 region. And it also
confirms that the GTT entries are in fact at the end of the the
stolen memory region.

Derive the stolen memory base address on gen2 the same as the
BIOS does (TOM-TSEG_SIZE-stolen_size). There are a few
differences between the registers on various gen2 chipsets, so a
few different codepaths are required.

865G is again bit more special since it seems to support enough
memory to hit 4GB address space issues. This means the PCI
allocations will also affect the location of the stolen memory.
Fortunately there appears to be the TOUD register which may give
us the correct answer directly. But the chipset docs are a bit
unclear, so I'm not 100% sure that the graphics stolen memory is
always the last thing the BIOS steals. Someone would need to
verify it on a real system.

I tested this on the my 830 and 855 machines, and so far
everything looks peachy.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1391628540-23072-3-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:11:30 +01:00
Ville Syrjälä
52ca70454e x86/gpu: Add vfunc for Intel graphics stolen memory base address
For gen2 devices we're going to need another way to determine
the stolen memory base address. Make that into a vfunc as well.

Also drop the bogus inline keyword from gen8_stolen_size().

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Link: http://lkml.kernel.org/r/1391628540-23072-2-git-send-email-ville.syrjala@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 15:11:30 +01:00
Don Zickus
90ed5b0fa5 perf/x86/p4: Block PMIs on init to prevent a stream of unkown NMIs
A bunch of unknown NMIs have popped up on a Pentium4 recently when booting
into a kdump kernel.  This was exposed because the watchdog timer went
from 60 seconds down to 10 seconds (increasing the ability to reproduce
this problem).

What is happening is on boot up of the second kernel (the kdump one),
the previous nmi_watchdogs were enabled on thread 0 and thread 1.  The
second kernel only initializes one cpu but the perf counter on thread 1
still counts.

Normally in a kdump scenario, the other cpus are blocking in an NMI loop,
but more importantly their local apics have the performance counters disabled
(iow LVTPC is masked).  So any counters that fire are masked and never get
through to the second kernel.

However, on a P4 the local apic is shared by both threads and thread1's PMI
(despite being configured to only interrupt thread1) will generate an NMI on
thread0.  Because thread0 knows nothing about this NMI, it is seen as an
unknown NMI.

This would be fine because it is a kdump kernel, strange things happen
what is the big deal about a single unknown NMI.

Unfortunately, the P4 comes with another quirk: clearing the overflow bit
to prevent a stream of NMIs.  This is the problem.

The kdump kernel can not execute because of the endless NMIs that happen.

To solve this, I instrumented the p4 perf init code, to walk all the counters
and zero them out (just like a normal reset would).

Now when the counters go off, they do not generate anything and no unknown
NMIs are seen.

I tested this on a P4 we have in our lab.  After two or three crashes, I could
normally reproduce the problem.  Now after 10 crashes, everything continues
to boot correctly.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140120154115.GZ25953@redhat.com
[ Fixed a stylistic detail. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:20:35 +01:00
Don Zickus
13beacee81 perf/x86/p4: Fix counter corruption when using lots of perf groups
On a P4 box stressing perf with:

   ./perf record -o perf.data ./perf stat -v ./perf bench all

it was noticed that a slew of unknown NMIs would pop out rather quickly.

Painfully debugging this ancient platform, led me to notice cross cpu counter
corruption.

The P4 machine is special in that it has 18 counters, half are used for cpu0
and the other half is for cpu1 (or all 18 if hyperthreading is disabled).  But
the splitting of the counters has to be actively managed by the software.

In this particular bug, one of the cpu0 specific counters was being used by
cpu1 and caused all sorts of random unknown nmis.

I am not entirely sure on the corruption path, but what happens is:

 o perf schedules a group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   but for a different cpu, so it 'swaps' the config bits and returns the
   updated 'assign' array with a _new_ index.
 o perf schedules another group with p4_pmu_schedule_events()
 o inside p4_pmu_schedule_events(), it notices an hwc pointer is being reused
   (the same one as above) but for the _same_ cpu [BUG!!], so it updates the
   'assign' array to use the _old_ (wrong cpu) index because the _new_ index is in
   an earlier part of the 'assign' array (and hasn't been committed yet).
 o perf commits the transaction using the wrong index and corrupts the other cpu

The [BUG!!] is because the 'hwc->config' is updated but not the 'hwc->idx'.  So
the check for 'p4_should_swap_ts()' is correct the first time around but
incorrect the second time around (because hwc->config was updated in between).

I think the spirit of perf was to not modify anything until all the
transactions had a chance to 'test' if they would succeed, and if so, commit
atomically.  However, P4 breaks this spirit by touching the hwc->config
element.

So my fix is to continue the un-perf like breakage, by assigning hwc->idx to -1
on swap to tell follow up group scheduling to find a new index.

Of course if the transaction fails rolling this back will be difficult, but
that is not different than how the current code works. :-)  And I wasn't sure
how much effort to cleanup the code I should do for a platform that is almost
10 years old by now.

Hence the lazy fix.

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1391024270-19469-1-git-send-email-dzickus@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:17:23 +01:00
Peter Zijlstra
e90c785352 x86/nmi: Push duration printk() to irq context
Calling printk() from NMI context is bad (TM), so move it to IRQ
context.

In doing so we slightly change (probably wreck) the debugfs
nmi_longest_ns thingy, in that it doesn't update to reflect the
longest, nor does writing to it reset the count.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://lkml.kernel.org/n/tip-rdw0au56a5ymis1u8p48c12d@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:17:22 +01:00
Ingo Molnar
3c3d7cb1db Merge branch 'linus' into perf/core
Refresh the branch to a v3.14-rc base before queueing up new devel patches.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:13:45 +01:00
Steven Rostedt
569d6557ab x86: Use preempt_disable_notrace() in cycles_2_ns()
When debug preempt is enabled, preempt_disable() can be traced by
function and function graph tracing.

There's a place in the function graph tracer that calls trace_clock()
which eventually calls cycles_2_ns() outside of the recursion
protection. When cycles_2_ns() calls preempt_disable() it gets traced
and the graph tracer will go into a recursive loop causing a crash or
worse, a triple fault.

Simple fix is to use preempt_disable_notrace() in cycles_2_ns, which
makes sense because the preempt_disable() tracing may use that code
too, and it tracing it, even with recursion protection is rather
pointless.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140204141315.2a968a72@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:09:08 +01:00
Peter Zijlstra
0e9f2204cf perf/x86: Fix Userspace RDPMC switch
The current code forgets to change the CR4 state on the current CPU.
Use on_each_cpu() instead of smp_call_function().

Reported-by: Mark Davies <junk@eslaf.co.uk>
Suggested-by: Mark Davies <junk@eslaf.co.uk>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
Link: http://lkml.kernel.org/n/tip-69efsat90ibhnd577zy3z9gh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:08:25 +01:00
Peter Zijlstra
e97df76377 perf/x86/intel/p6: Add userspace RDPMC quirk for PPro
PPro machines can die hard when PCE gets enabled due to a CPU erratum.
The safe way it so disable it by default and keep it disabled.

See erratum 26 in:

  http://download.intel.com/design/archives/processors/pro/docs/24268935.pdf

Reported-and-Tested-by: Mark Davies <junk@eslaf.co.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Stephane Eranian <eranian@google.com>
Cc: Vince Weaver <vince@deater.net>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140206170815.GW2936@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-02-09 13:08:24 +01:00
Linus Torvalds
c1ff84317f Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Peter Anvin:
 "Quite a varied little collection of fixes.  Most of them are
  relatively small or isolated; the biggest one is Mel Gorman's fixes
  for TLB range flushing.

  A couple of AMD-related fixes (including not crashing when given an
  invalid microcode image) and fix a crash when compiled with gcov"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Unify valid container checks
  x86, hweight: Fix BUG when booting with CONFIG_GCOV_PROFILE_ALL=y
  x86/efi: Allow mapping BGRT on x86-32
  x86: Fix the initialization of physnode_map
  x86, cpu hotplug: Fix stack frame warning in check_irq_vectors_for_cpu_disable()
  x86/intel/mid: Fix X86_INTEL_MID dependencies
  arch/x86/mm/srat: Skip NUMA_NO_NODE while parsing SLIT
  mm, x86: Revisit tlb_flushall_shift tuning for page flushes except on IvyBridge
  x86: mm: change tlb_flushall_shift for IvyBridge
  x86/mm: Eliminate redundant page table walk during TLB range flushing
  x86/mm: Clean up inconsistencies when flushing TLB ranges
  mm, x86: Account for TLB flushes only when debugging
  x86/AMD/NB: Fix amd_set_subcaches() parameter type
  x86/quirks: Add workaround for AMD F16h Erratum792
  x86, doc, kconfig: Fix dud URL for Microcode data
2014-02-08 11:54:43 -08:00
H. Peter Anvin
a3b072cd18 * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJS9P2pAAoJEC84WcCNIz1VABsP/j0fwiWdaoEn/9p9EUQcBcMn
 CILuiKI8GoBR+0nH7vm/jSgUKfUxzM6T0+qjoZPVkO/u/qup+JCT5JxoywUyEbfb
 +wPNIhBgKSwJdWXPd4ZvObA9jknrP6r5sJTsixpESVpVWmcC8egPgkyBFILjokKS
 BCJqqruHZcXfWaDQTocYErl3T217J7RfnCG1o7yP96g4jteX8EdvIkPjEf2mM83I
 GXacHLy8IhGhdyPKSXcRqZ0ivhZ2WX1iFQYIY19FhmzBs364WulyXve+oV+l/4h4
 Kx7ks4Ob5AlUIizchGNwVz7058F9o/v7m0CezTgi4Q/RrZi34samxbjk/95/Xc60
 JSvWkekm3/jOODubad5zj+ZnJmG83/ZlCUuTaqsE47ftbaedSUHBN9QSpm8iHEex
 n3d4J3AdP/3amcP33kZ5MRALDYIFKb4ZxtDkADqDcXhS56COivGAdZe5hnyCpb/9
 RPUDXTOlxfJQK/y2Atcdb400JjJ/Yr9Kew81LRIt0UMZU3dKSh05UZ+a7Ym0yCkt
 3k0NNkgsFCZbYTO/Z3aPDcprwU5Lq9UrwjB17U2ev/qK+qRYDzCzSR0XGPrLMRv7
 C5Bnov6uCn/0ZG/NlAx8UXK9wdWDsLhp1QkBz+daX3sGwRAS+OiKBv6+l8dqsdOc
 1L8PMkTX2rgtELiv4PJ/
 =hLCC
 -----END PGP SIGNATURE-----

Merge tag 'efi-urgent' into x86/urgent

 * Avoid WARN_ON() when mapping BGRT on Baytrail (EFI 32-bit).

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-02-07 11:27:30 -08:00
Tang Chen
7bc35fdde6 arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.
The following path will cause array out of bound.

memblock_add_region() will always set nid in memblock.reserved to
MAX_NUMNODES.  In numa_register_memblks(), after we set all nid to
correct valus in memblock.reserved, we called setup_node_data(), and
used memblock_alloc_nid() to allocate memory, with nid set to
MAX_NUMNODES.

The nodemask_t type can be seen as a bit array.  And the index is 0 ~
MAX_NUMNODES-1.

After that, when we call node_set() in numa_clear_kernel_node_hotplug(),
the nodemask_t got an index of value MAX_NUMNODES, which is out of [0 ~
MAX_NUMNODES-1].

See below:

numa_init()
 |---> numa_register_memblks()
 |      |---> memblock_set_node(memory)		set correct nid in memblock.memory
 |      |---> memblock_set_node(reserved)	set correct nid in memblock.reserved
 |      |......
 |      |---> setup_node_data()
 |             |---> memblock_alloc_nid()	here, nid is set to MAX_NUMNODES (1024)
 |......
 |---> numa_clear_kernel_node_hotplug()
        |---> node_set()			here, we have an index 1024, and overflowed

This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix
this problem.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06 13:48:51 -08:00
Tang Chen
017c217a26 arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()
On-stack variable numa_kernel_nodes in numa_clear_kernel_node_hotplug()
was not initialized.  So we need to initialize it.

[akpm@linux-foundation.org: use NODE_MASK_NONE, per David]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: David Rientjes <rientjes@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-06 13:48:51 -08:00
Borislav Petkov
75a1ba5b2c x86, microcode, AMD: Unify valid container checks
For additional coverage, BorisO and friends unknowlingly did swap AMD
microcode with Intel microcode blobs in order to see what happens. What
did happen on 32-bit was

[    5.722656] BUG: unable to handle kernel paging request at be3a6008
[    5.722693] IP: [<c106d6b4>] load_microcode_amd+0x24/0x3f0
[    5.722716] *pdpt = 0000000000000000 *pde = 0000000000000000

because there was a valid initrd there but without valid microcode in it
and the container check happened *after* the relocated ramdisk handling
on 32-bit, which was clearly wrong.

While at it, take care of the ramdisk relocation on both 32- and 64-bit
as it is done on both. Also, comment what we're doing because this code
is a bit tricky.

Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391460104-7261-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-02-06 11:11:19 -08:00
Linus Torvalds
1cd731df09 Bug-fixes:
- Revert "xen/grant-table: Avoid m2p_override during mapping" as it broke Xen ARM build.
  - Fix CR4 not being set on AP processors in Xen PVH mode.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJS8AyQAAoJEFjIrFwIi8fJbD4IAJssMuaLI5CRsSWBgDFHHDFt
 srVJpDOYQiDr/TxkwFCVcL4sFy9Htb3KMArU4eIBl6uMqQbGa+3rHyXcHYI219YY
 XH3D8RG+9JChwsxtaeUEzwx1C8ehcygD34vtdcoQXa7eBuEi4TL3HeLifR+HrXKO
 UdFrTA34FmvpVFbSuRXkZh5sd6ca9et9xHuQHM8SIY6pVokY6xaEYOp17tfPZpwM
 7A6LFjUjXeugHC2L3+/H8UOHA9nSZQvnMiZOWq2Cusc2Dt2V7emzgk2wcc2CHttf
 EA6GbtiJzHqMPmt5EjubI9hHdSMB31HpY4hnQE38+ucl+BwiSdRE9z2Rm4TYClg=
 =IX4M
 -----END PGP SIGNATURE-----

Merge tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull Xen fixes from Konrad Rzeszutek Wilk:
 "Bug-fixes:
   - Revert "xen/grant-table: Avoid m2p_override during mapping" as it
     broke Xen ARM build.
   - Fix CR4 not being set on AP processors in Xen PVH mode"

* tag 'stable/for-linus-3.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/pvh: set CR4 flags for APs
  Revert "xen/grant-table: Avoid m2p_override during mapping"
2014-02-05 16:01:11 -08:00
Matt Fleming
081cd62a01 x86/efi: Allow mapping BGRT on x86-32
CONFIG_X86_32 doesn't map the boot services regions into the EFI memory
map (see commit 700870119f ("x86, efi: Don't map Boot Services on
i386")), and so efi_lookup_mapped_addr() will fail to return a valid
address. Executing the ioremap() path in efi_bgrt_init() causes the
following warning on x86-32 because we're trying to ioremap() RAM,

 WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x2ad/0x2c0()
 Modules linked in:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-0.rc5.git0.1.2.fc21.i686 #1
 Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
  00000000 00000000 c0c0df08 c09a5196 00000000 c0c0df38 c0448c1e c0b41310
  00000000 00000000 c0b37bc1 00000066 c043bbfd c043bbfd 00e7dfe0 00073eff
  00073eff c0c0df48 c0448ce2 00000009 00000000 c0c0df9c c043bbfd 00078d88
 Call Trace:
  [<c09a5196>] dump_stack+0x41/0x52
  [<c0448c1e>] warn_slowpath_common+0x7e/0xa0
  [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
  [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
  [<c0448ce2>] warn_slowpath_null+0x22/0x30
  [<c043bbfd>] __ioremap_caller+0x2ad/0x2c0
  [<c0718f92>] ? acpi_tb_verify_table+0x1c/0x43
  [<c0719c78>] ? acpi_get_table_with_size+0x63/0xb5
  [<c087cd5e>] ? efi_lookup_mapped_addr+0xe/0xf0
  [<c043bc2b>] ioremap_nocache+0x1b/0x20
  [<c0cb01c8>] ? efi_bgrt_init+0x83/0x10c
  [<c0cb01c8>] efi_bgrt_init+0x83/0x10c
  [<c0cafd82>] efi_late_init+0x8/0xa
  [<c0c9bab2>] start_kernel+0x3ae/0x3c3
  [<c0c9b53b>] ? repair_env_string+0x51/0x51
  [<c0c9b378>] i386_start_kernel+0x12e/0x131

Switch to using early_memremap(), which won't trigger this warning, and
has the added benefit of more accurately conveying what we're trying to
do - map a chunk of memory.

This patch addresses the following bug report,

  https://bugzilla.kernel.org/show_bug.cgi?id=67911

Reported-by: Adam Williamson <awilliam@redhat.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-02-05 23:39:34 +00:00
Ingo Molnar
f8f2023482 x86: Disable CONFIG_X86_DECODER_SELFTEST in allmod/allyesconfigs
It can take some time to validate the image, make sure
{allyes|allmod}config doesn't enable it.

I'd say randconfig will cover it often enough, and the failure is also
borderline build coverage related: you cannot really make the decoder
test fail via source level changes, only with changes in the build
environment, so I agree with Andi that we can disable this one too.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Paul Gortmaker paul.gortmaker@windriver.com>
Suggested-and-acked-by: Andi Kleen andi@firstfloor.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-05 14:10:30 -08:00
Borislav Petkov
b399fe355b x86: Disable generation of traditional x87 instructions
We recently had the case where wrongly used floating-constant 'E' caused
the generation of traditional x87 instructions in kernel code and
wreaking all kinds of havoc.

Disable the generation of those too. This will save people a lot of time
when trying to debug such issues by erroring out of the build instead of
let them manifest themselves in very spectacular and happy-crappy ways
at runtime.

We're using -mno-fp-ret-in-387 in addition to -mno-80387 (which is ==
-msoft-float) because, as the gcc manpage says:

  On machines where a function returns floating-point results in the
  80387 register stack, some floating-point opcodes may be emitted even
  if -msoft-float is used.

so we want to turn off *all* non-integer instructions involving any
architectural FPU state, unless it is absolutely necessary (and those
cases need special handling anyway).

Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Michael Matz <matz@suse.de>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391561711-3023-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2014-02-04 20:00:35 -08:00
Paolo Bonzini
f244d910ea Two new features are added by this patch set:
- The floating interrupt controller (flic) that allows us to inject,
   clear and inspect non-vcpu local interrupts. This also gives us an
   opportunity to fix deficiencies in our existing interrupt definitions.
 - Support for asynchronous page faults via the pfault mechanism. Testing
   show significant guest performance improvements under host swap.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJS6kZMAAoJEBF7vIC1phx8OOcP/jbMTi3pbCPGfyL4tCyyIL6d
 FYmoVqJMg1lTxLzhdqhli81ahT683I0/jrH17oz1O+PdgxP9annZnNfNs1bO8YzG
 F4XvmrgQ/JnJLl2xKq0MVPOUbLivMiVkCemhPnzg39JKJDTGPSKBLBTEso+dMVKb
 5qxo8UDijJ/TJS4CLYG3AcNWIMtMBlQWCmFMTvVUmaDe8iFEeMoWZRgBJ3X43Hxj
 uQr6zBvG7zd4+IgfrXGaExLJw9EsLIs8MwkqTvYbuDdxoU/jjyvo3xSR5T+3u3fp
 raK2RHirfb31XWNC33hTm4VvTHxBQ0gjUZgWb8AZe8UnhpHgnMS1QWGbU9Tj3RKm
 d5p+AWMT2l+2TzKQRp85NSJCySnQQIgI9Vh3A6MTAsanRpLncpHzOVaBvYKmphWV
 RG94pjxa3NNDvZ3m8V2htWKArlB+rr1S0nz7R5IHM/1IwmPj5VfianhfT3qmjexC
 0CvPlS3FXTbSiTC8xnGET4wvKNUBMThq6alP2/MArqM+28NjDpwOG8/hDpAAIksL
 COho3/csBHH5mpxI0odkfoyIJLDDdIPOWE3kIQBKkg/Qt3nPUcPOKamKTsP7s7RA
 T3K1Kz6bljkvY1eDeJ2pWB85XPGxSig2UVxSdTs+ywQ5UcQnEF3aN5dFKjewk8Xv
 bUgdLiW7ABkj+rCOGIJ8
 =z8I9
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140130' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

Two new features are added by this patch set:
- The floating interrupt controller (flic) that allows us to inject,
  clear and inspect non-vcpu local interrupts. This also gives us an
  opportunity to fix deficiencies in our existing interrupt definitions.
- Support for asynchronous page faults via the pfault mechanism. Testing
  show significant guest performance improvements under host swap.
2014-02-04 04:23:37 +01:00
Marcelo Tosatti
4f34d683e5 KVM: x86: remove unused last_kernel_ns variable
Remove unused last_kernel_ns variable.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-02-04 04:20:42 +01:00
Mukesh Rathor
afca50132c xen/pvh: set CR4 flags for APs
During bootup in the 'probe_page_size_mask' these CR4 flags are
set in there. But for AP processors they are not set as we do not
use 'secondary_startup_64' which the baremetal kernels uses.
Instead do it in this function which we use in Xen PVH during our
startup for AP processors.

As such fix it up to make sure we have that flag set.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2014-02-03 15:44:18 -05:00
Bjorn Helgaas
ab6ffce35b x86/PCI: Remove acpi_get_pxm() usage
The PCI host bridge code doesn't care about _PXM values directly; it only
needs to know what NUMA node the hardware is on.

This uses acpi_get_node() directly and removes the _PXM stuff.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-03 10:39:03 -07:00
Bjorn Helgaas
8a3d01c740 x86/PCI: Use NUMA_NO_NODE, not -1, for unknown node
NUMA_NO_NODE is the usual value for "we don't know what node this is on,"
e.g., it is the error return from acpi_get_node().  This changes uses of -1
to NUMA_NO_NODE.  NUMA_NO_NODE is #defined to be -1 already, so this is not
a functional change.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-03 10:38:57 -07:00
Bjorn Helgaas
3323ab8f7a x86/PCI: Remove unnecessary list_empty(&pci_root_infos) check
list_for_each_entry() handles empty lists, so there's no need to check
whether the list is empty first.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-03 10:38:52 -07:00
Bjorn Helgaas
25453e9e52 x86/PCI: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node()
There are no callers of get_mp_bus_to_node(), so we no longer need
mp_bus_to_node[], get_mp_bus_to_node(), or set_mp_bus_to_node().
This removes them.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2014-02-03 10:38:46 -07:00