* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf probe: Fix to support libdwfl older than 0.148
perf tools: Fix lazy wildcard matching
perf buildid-list: Fix error return for success
perf buildid-cache: Fix symbolic link handling
perf symbols: Stop using vmlinux files with no symbols
perf probe: Fix use of kernel image path given by 'k' option
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, kexec: Limit the crashkernel address appropriately
NUMA boot code assumes that physical node ids start at 0, but the DIMMs
that the apic id represents may not be reachable. If this is the case,
node 0 is never online and cpus never end up getting appropriately
assigned to a node. This causes the cpumask of all online nodes to be
empty and machines crash with kernel code assuming online nodes have
valid cpus.
The fix is to appropriately map all the address ranges for physical nodes
and ensure the cpu to node mapping function checks all possible nodes (up
to MAX_NUMNODES) instead of simply checking nodes 0-N, where N is the
number of physical nodes, for valid address ranges.
This requires no longer "compressing" the address ranges of nodes in the
physical node map from 0-N, but rather leave indices in physnodes[] to
represent the actual node id of the physical node. Accordingly, the
topology exported by both amd_get_nodes() and acpi_get_nodes() no longer
must return the number of nodes to iterate through; all such iterations
will now be to MAX_NUMNODES.
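A minimal sketch of the lookup this implies, assuming a sparse table indexed by the real node id. The struct node_range type, the addr_to_physnode() helper and the small MAX_NUMNODES value are illustrative stand-ins, not the kernel's definitions:

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NUMNODES 8	/* illustrative; the kernel derives this from its config */

/* Hypothetical stand-in for a physical node's address range. */
struct node_range {
	uint64_t start;
	uint64_t end;	/* exclusive; start == end means "no such node" */
};

/*
 * Return the physical node id whose range contains addr, or -1.
 * The index into physnodes[] is the actual node id, so the table can
 * have holes and every slot up to MAX_NUMNODES must be checked.
 */
static int addr_to_physnode(const struct node_range *physnodes, uint64_t addr)
{
	assert(physnodes);

	for (int nid = 0; nid < MAX_NUMNODES; nid++) {
		if (physnodes[nid].start == physnodes[nid].end)
			continue;	/* hole: this node id is not populated */
		if (addr >= physnodes[nid].start && addr < physnodes[nid].end)
			return nid;
	}
	return -1;
}
```

With only nodes 1 and 3 populated, a loop over 0-N with N = 2 would never reach node 3; scanning to MAX_NUMNODES does.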
This change also passes the end address of system RAM (which may be
different from normal operation if mem= is specified on the command line)
before the physnodes[] array is populated. ACPI parsed nodes are
truncated to fit within the address range that respects the mem=
boundaries and even some physical nodes may become unreachable in such
cases.
When NUMA emulation does succeed, any apicid to node mappings that exist
for unreachable nodes are given default values so that proximity domains
can still be assigned. This is important for node_distance() to
function as desired.
Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1012221702090.3701@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
It's necessary to fake the node-to-cpumask mapping so that an emulated
node ID returns a cpumask that includes all cpus that have affinity to
the memory it represents.
This is a little intrusive because it requires knowledge of the physical
topology of the system. setup_physnodes() gives us that information, but
since NUMA emulation ends up altering the physnodes array, it's necessary
to reset it before cpus are brought online.
Accordingly, the physnodes array is moved out of init.data and into
cpuinit.data since it will be needed on cpuup callbacks.
This works regardless of whether numa=fake is used on the command line,
or the setup of the fake node succeeds or fails. The physnodes array
always contains the physical topology of the machine if CONFIG_NUMA_EMU
is enabled and can be used to setup the correct node-to-cpumask mappings
in all cases since setup_physnodes() is called whenever the array needs
to be repopulated with the correct data.
To fake the actual mappings, numa_add_cpu() and numa_remove_cpu() are
rewritten for CONFIG_NUMA_EMU so that we first find the physical node to
which each cpu has local affinity, then iterate through all online nodes
to find the emulated nodes that have local affinity to that physical
node, and then finally map the cpu to each of those emulated nodes.
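The mapping step can be sketched roughly as below. This is an illustration under assumptions, not the code in the patch: the cpu's physical node is passed in rather than looked up, a 64-bit word stands in for a cpumask, and the names fake_add_cpu(), struct mem_range and MAX_NODES are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 8	/* illustrative bound standing in for MAX_NUMNODES */

struct mem_range {
	uint64_t start;
	uint64_t end;	/* exclusive; start == end means the node is absent */
};

/*
 * Add one cpu to the mask of every emulated node that is backed by the
 * cpu's physical node. Returns the number of emulated nodes mapped.
 */
static int fake_add_cpu(int cpu, int phys_nid,
			const struct mem_range *phys,
			const struct mem_range *emu,
			uint64_t *cpumask)
{
	int mapped = 0;

	assert(cpu >= 0 && cpu < 64);
	assert(phys_nid >= 0 && phys_nid < MAX_NODES);

	for (int nid = 0; nid < MAX_NODES; nid++) {
		if (emu[nid].start == emu[nid].end)
			continue;	/* emulated node is not online */
		/* Local affinity: the emulated node lies inside the physical node. */
		if (emu[nid].start >= phys[phys_nid].start &&
		    emu[nid].end <= phys[phys_nid].end) {
			cpumask[nid] |= UINT64_C(1) << cpu;
			mapped++;
		}
	}
	return mapped;
}
```

Removing a cpu would be the same walk with the bit cleared instead of set.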
Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1012221701520.3701@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch adds the equivalent of acpi_fake_nodes() for AMD Northbridge
platforms. The goal is to fake the apicid-to-node mappings for NUMA
emulation so the physical topology of the machine is correctly maintained
within the kernel.
This change also fakes proximity domains for both ACPI and k8 code so the
physical distance between emulated nodes is maintained via
node_distance(). This exports the correct distances via
/sys/devices/system/node/.../distance based on the underlying topology.
A new helper function, fake_physnodes(), is introduced to invoke the
correct NUMA code to fake these two mappings based on the
system type. If there is no underlying NUMA configuration, all cpus are
mapped to node 0 for local distance.
Since acpi_fake_nodes() is no longer called with CONFIG_ACPI_NUMA, its
prototype can be removed from the header file for such a configuration.
Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1012221701360.3701@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Both acpi_get_nodes() and amd_get_nodes() are only necessary when
CONFIG_NUMA_EMU is enabled, so avoid compiling them when the option is
disabled.
Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1012221701210.3701@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This patch changes the minimum fake node size from 64MB to 32MB so it is
possible to test NUMA code at a greater scale on smaller machines
(64 nodes on a 2G machine, 1024 nodes on 32G machine with
CONFIG_NODES_SHIFT=10).
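The node counts quoted above follow from dividing RAM by the minimum node size and capping at 1 << CONFIG_NODES_SHIFT. A small sketch of that arithmetic; max_fake_nodes() and the macro name are illustrative, only the 32MB figure comes from this patch:

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_NODE_MIN_SIZE (UINT64_C(32) << 20)	/* 32MB, the new minimum */

/* Upper bound on emulated nodes for a given RAM size and node shift. */
static uint64_t max_fake_nodes(uint64_t ram_bytes, unsigned int nodes_shift)
{
	uint64_t by_size, by_config;

	assert(nodes_shift < 64);
	by_size = ram_bytes / FAKE_NODE_MIN_SIZE;
	by_config = UINT64_C(1) << nodes_shift;
	return by_size < by_config ? by_size : by_config;
}
```

For 2G of RAM this gives 2048MB / 32MB = 64, and for 32G it gives 32768MB / 32MB = 1024, which is exactly the cap for a shift of 10.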
Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1012221700590.3701@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Recent Intel systems use a different order in the MADT: all thread0
entries are listed first, then all thread1 entries.
The SRAT table still uses the old order and lists the cpus of one socket together.
If the user has compiled with a limited NR_CPUS or boots with nr_cpus=, some
apic id to node mappings may never be stored in apicid_to_node[].
For example, a 4-socket system with 64 cpus booted with nr_cpus=32 crashes:
[ 9.106288] Total of 32 processors activated (136190.88 BogoMIPS).
[ 9.235021] divide error: 0000 [#1] SMP
[ 9.235315] last sysfs file:
[ 9.235481] CPU 1
[ 9.235592] Modules linked in:
[ 9.245398]
[ 9.245478] Pid: 2, comm: kthreadd Not tainted 2.6.37-rc1-tip-yh-01782-ge92ef79-dirty #274 /Sun Fire x4800
[ 9.265415] RIP: 0010:[<ffffffff81075a8f>] [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
...
[ 9.645938] RIP [<ffffffff81075a8f>] select_task_rq_fair+0x4f0/0x623
[ 9.665356] RSP <ffff88103f8d1c40>
[ 9.665568] ---[ end trace 2296156d35fdfc87 ]---
So just parse all cpu entries in SRAT.
Also check the apicid against MAX_LOCAL_APIC, so that we cannot index beyond
the bounds of apicid_to_node[].
This also fixes the following bug:
https://bugzilla.kernel.org/show_bug.cgi?id=22662
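A rough sketch of the idea, assuming a flat table indexed by apic id. The value of MAX_LOCAL_APIC, the NO_NODE marker and the helper names are illustrative and not taken from the patch:

```c
#include <assert.h>

#define MAX_LOCAL_APIC 256	/* illustrative; the real value depends on the config */
#define NO_NODE (-1)		/* hypothetical marker for "no mapping recorded" */

static int apicid_to_node[MAX_LOCAL_APIC];

static void init_apicid_table(void)
{
	for (int i = 0; i < MAX_LOCAL_APIC; i++)
		apicid_to_node[i] = NO_NODE;
}

/*
 * Record one SRAT cpu affinity entry. Every entry is stored, whether or
 * not the cpu will be brought up, but an apic id outside the table is
 * rejected instead of being written out of bounds.
 * Returns 0 on success, -1 if the apic id is out of range.
 */
static int record_srat_cpu(int apic_id, int node)
{
	assert(node >= 0);

	if (apic_id < 0 || apic_id >= MAX_LOCAL_APIC)
		return -1;
	apicid_to_node[apic_id] = node;
	return 0;
}
```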
-v2: expand to 32bit according to hpa
need to add MAX_LOCAL_APIC for 32bit
Reported-and-Tested-by: Wu Fengguang <fengguang.wu@intel.com>
Reported-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Tested-by: Myron Stowe <myron.stowe@hp.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD486.9020704@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
We should use MAX_LOCAL_APIC for max apic ids and MAX_APICS as number
of local apics.
Also, the apic_version[] array should be sized with MAX_LOCAL_APIC.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD464.2020408@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The x86 arch has shifted its use of the nmi_watchdog from a
local implementation to the global one provided by
kernel/watchdog.c. This shift has caused a whole bunch of
compile problems under different config options. I attempt to
simplify things with the patch below.
In order to simplify things, I had to come to terms with the
meaning of two terms ARCH_HAS_NMI_WATCHDOG and
CONFIG_HARDLOCKUP_DETECTOR. Basically they mean the same thing,
the former on a local level and the latter on a global level.
With the old x86 nmi watchdog gone, there is no need to rely on
defining the ARCH_HAS_NMI_WATCHDOG variable because it doesn't
make sense any more. x86 will now use the global
implementation.
The changes below do a few things. First it changes the few
places that relied on ARCH_HAS_NMI_WATCHDOG to use
CONFIG_X86_LOCAL_APIC (the former was an alias for the latter
anyway, so nothing unusual here). Those pieces of code were
relying more on local apic functionality than the nmi watchdog
functionality, so the change should make sense.
Second, I removed the x86 implementation of
touch_nmi_watchdog(). It isn't needed now; instead x86 will rely
on kernel/watchdog.c's implementation.
Third, I removed the #define ARCH_HAS_NMI_WATCHDOG itself from
x86. And tweaked the include/linux/nmi.h file to tell users to
look for an externally defined touch_nmi_watchdog in the case of
ARCH_HAS_NMI_WATCHDOG _or_ CONFIG_HARDLOCKUP_DETECTOR. This
change removes some of the ugliness in that file.
Finally, I added a Kconfig dependency for
CONFIG_HARDLOCKUP_DETECTOR that said you can't have
ARCH_HAS_NMI_WATCHDOG _and_ CONFIG_HARDLOCKUP_DETECTOR. You can
only have one nmi_watchdog.
Tested with
ARCH=i386: allnoconfig, defconfig, allyesconfig, (various broken configs)
ARCH=x86_64: allnoconfig, defconfig, allyesconfig, (various broken configs)
Hopefully, after this patch I won't get any more compile broken
emails. :-)
v3:
changed a couple of 'linux/nmi.h' -> 'asm/nmi.h' to pick-up correct function
prototypes when CONFIG_HARDLOCKUP_DETECTOR is not set.
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1293044403-14117-1-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Conflicts:
MAINTAINERS
arch/arm/mach-omap2/pm24xx.c
drivers/scsi/bfa/bfa_fcpim.c
Needed to update to apply fixes for which the old branch was too
outdated.
UV systems can be partitioned into multiple independent SSIs.
Large partitioned systems may have extra bits in the node_id
register. These bits are used when the total memory on all SSIs
exceeds 16TB. These extra bits need to be ignored when
calculating x2apic_extra_bits.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20101130195926.972776133@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Early in boot, reading MMRs from the UV hub controller requires
calls to early_ioremap()/early_iounmap(). Rather than
duplicating code, add a common function to do the
map/read/unmap.
Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20101130195926.834804371@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32: Make sure we can map all of lowmem if we need to
x86, vt-d: Handle previous faults after enabling fault handling
x86: Enable the intr-remap fault handling after local APIC setup
x86, vt-d: Fix the vt-d fault handling irq migration in the x2apic mode
x86, vt-d: Quirk for masking vtd spec errors to platform error handling logic
x86, xsave: Use alloc_bootmem_align() instead of alloc_bootmem()
bootmem: Add alloc_bootmem_align()
x86, gcc-4.6: Use gcc -m options when building vdso
x86: HPET: Chose a paranoid safe value for the ETIME check
x86: io_apic: Avoid unused variable warning when CONFIG_GENERIC_PENDING_IRQ=n
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
perf: Fix off by one in perf_swevent_init()
perf: Fix duplicate events with multiple-pmu vs software events
ftrace: Have recordmcount honor endianness in fn_ELF_R_INFO
scripts/tags.sh: Add magic for trace-events
tracing: Fix panic when lseek() called on "trace" opened for writing
This patch adds support for up to 6 hardware counters for AMD family
15h cpus. There is a new MSR range for hardware counters beginning at
MSRC001_0200 Performance Event Select (PERF_CTL0).
Signed-off-by: Robert Richter <robert.richter@amd.com>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
x86: avoid high BIOS area when allocating address space
x86: avoid E820 regions when allocating address space
x86: avoid low BIOS area when allocating address space
resources: add arch hook for preventing allocation in reserved areas
Revert "resources: support allocating space within a region from the top down"
Revert "PCI: allocate bus resources from the top down"
Revert "x86/PCI: allocate space from the end of a region, not the beginning"
Revert "x86: allocate space within a region top-down"
Revert "PCI: fix pci_bus_alloc_resource() hang, prefer positive decode"
PCI: Update MCP55 quirk to not affect non HyperTransport variants
Use cmpxchg instead of xchg to realize this_cpu_xchg.
xchg will cause LOCK overhead since LOCK is always implied but cmpxchg
will not.
Baselines:
xchg() = 18 cycles (no segment prefix, LOCK semantics)
__this_cpu_xchg = 1 cycle
(simulated using this_cpu_read/write, two prefixes. Looks like the
cpu can use loop optimization to get rid of most of the overhead)
Cycles before:
this_cpu_xchg = 37 cycles (segment prefix and LOCK (implied by xchg))
After:
this_cpu_xchg = 11 cycles (using cmpxchg without lock semantics)
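The shape of an exchange built from compare-and-exchange can be sketched in user space as below. This is only an illustration: it relies on the GCC/Clang __atomic builtins, which on x86 emit a LOCK-prefixed cmpxchg, whereas the per-cpu version described here uses a segment-prefixed cmpxchg without LOCK. The function name is hypothetical.

```c
#include <assert.h>

/* Store newval in *ptr and return the value that was there before. */
static long xchg_via_cmpxchg(long *ptr, long newval)
{
	long old;

	assert(ptr);
	old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
	/*
	 * If *ptr no longer equals old, the builtin refreshes old with the
	 * current value and returns false, so simply retry.
	 */
	while (!__atomic_compare_exchange_n(ptr, &old, newval, 0,
					    __ATOMIC_RELAXED, __ATOMIC_RELAXED))
		;
	return old;
}
```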
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Provide support as far as the hardware capabilities of the x86 cpus
allow.
Define CONFIG_CMPXCHG_LOCAL in Kconfig.cpu to allow core code to test for
fast cpuops implementations.
V1->V2:
- Take out the definition for this_cpu_cmpxchg_8 and move it into
a separate patch.
tj: - Reordered ops to better follow this_cpu_* organization.
- Renamed macro temp variables similar to their existing
neighbours.
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Keep the crash kernel address below 512 MiB for 32 bits and 896 MiB
for 64 bits. For 32 bits, this retains compatibility with earlier
kernel releases, and makes it work even if the vmalloc= setting is
adjusted.
For 64 bits, we should be able to increase this substantially once a
hard-coded limit in kexec-tools is fixed.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Stanislaw Gruszka <sgruszka@redhat.com>
Cc: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20101217195035.GE14502@redhat.com>
This prevents allocation of the last 2MB before 4GB.
The experiment described here shows Windows 7 ignoring the last 1MB:
https://bugzilla.kernel.org/show_bug.cgi?id=23542#c27
This patch ignores the top 2MB instead of just 1MB because H. Peter Anvin
says "There will be ROM at the top of the 32-bit address space; it's a fact
of the architecture, and on at least older systems it was common to have a
shadow 1 MiB below."
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
When we allocate address space, e.g., to assign it to a PCI device, don't
allocate anything mentioned in the BIOS E820 memory map.
On recent machines (2008 and newer), we assign PCI resources from the
windows described by the ACPI PCI host bridge _CRS. On many Dell
machines, these windows overlap some E820 reserved areas, e.g.,
BIOS-e820: 00000000bfe4dc00 - 00000000c0000000 (reserved)
pci_root PNP0A03:00: host bridge window [mem 0xbff00000-0xdfffffff]
If we put devices at 0xbff00000, they don't work, probably because
that's really RAM, not I/O memory. This patch prevents that by removing
the 0xbfe4dc00-0xbfffffff area from the "available" resource.
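The trimming can be sketched as a simple range clip. This is an illustration with hypothetical names (struct span, remove_reserved()), not the resource code itself; it uses exclusive end addresses, so the window above is written as 0xbff00000-0xe0000000 and the reserved area as 0xbfe4dc00-0xc0000000.

```c
#include <assert.h>
#include <stdint.h>

struct span {
	uint64_t start;
	uint64_t end;	/* exclusive; start == end means empty */
};

/*
 * Trim avail so it no longer overlaps reserved. If the reservation
 * splits the window, keep the larger remaining piece.
 */
static struct span remove_reserved(struct span avail, struct span reserved)
{
	uint64_t low, high;

	assert(avail.start <= avail.end && reserved.start <= reserved.end);

	/* No overlap: nothing to trim. */
	if (reserved.end <= avail.start || reserved.start >= avail.end)
		return avail;

	/* Sizes of the usable pieces below and above the reservation. */
	low = reserved.start > avail.start ? reserved.start - avail.start : 0;
	high = reserved.end < avail.end ? avail.end - reserved.end : 0;

	if (low >= high)
		avail.end = avail.start + low;	/* keep the lower piece */
	else
		avail.start = avail.end - high;	/* keep the upper piece */
	return avail;
}
```

For the Dell example the usable window then begins at 0xc0000000.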
I'm not very happy with this solution because Windows solves the problem
differently (it seems to ignore E820 reserved areas and it allocates
top-down instead of bottom-up; details at comment 45 of the bugzilla
below). That means we're vulnerable to BIOS defects that Windows would not
trip over. For example, if BIOS described a device in ACPI but didn't
mention it in E820, Windows would work fine but Linux would fail.
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=16228
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
This implements arch_remove_reservations() so allocate_resource() can
avoid any arch-specific reserved areas. This currently just avoids the
BIOS area (the first 1MB), but could be used for E820 reserved areas if
that turns out to be necessary.
We previously avoided this area in pcibios_align_resource(). This patch
moves the test from that PCI-specific path to a generic path, so *all*
resource allocations will avoid this area.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* 'kvm-updates/2.6.37' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Fix preemption counter leak in kvm_timer_init()
KVM: enlarge number of possible CPUID leaves
KVM: SVM: Do not report xsave in supported cpuid
KVM: Fix OSXSAVE after migration
- include/linux/percpu.h: this_cpu_add_return() and friends were
located next to __this_cpu_add_return(). However, the overall
organization is to first group by preemption safeness. Relocate
this_cpu_add_return() and friends to preemption-safe area.
- arch/x86/include/asm/percpu.h: Relocate percpu_add_return_op() after
other more basic operations. Relocate [__]this_cpu_add_return_8()
so that they're first grouped by preemption safeness.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Supply an implementation for x86 in order to generate more efficient code.
V2->V3:
- Cleanup
- Remove strange type checking from percpu_add_return_op.
tj: - Dropped unused typedef from percpu_add_return_op().
- Renamed ret__ to paro_ret__ in percpu_add_return_op().
- Minor indentation adjustments.
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Use this_cpu_ops to reduce code size and simplify things in various places.
V3->V4:
Move instance of this_cpu_inc_return to a later patchset so that
this patch can be applied without infrastructure changes.
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Use this_cpu ops in various places to optimize per cpu data access.
Cc: Jason Baron <jbaron@redhat.com>
Cc: Namhyung Kim <namhyung@gmail.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
A relocatable kernel can be anywhere in lowmem -- and in the case of a
kdump kernel, is likely to be fairly high. Since the early page
tables map everything from address zero up we need to make sure we
allocate enough brk that we can map all of lowmem if we need to.
Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Tested-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4D0AD3ED.8070607@kernel.org>
Extend the perf_pmu_register() interface to allow for named and
dynamic pmu types.
Because we need to support the existing static types we cannot use
dynamic types for everything, hence provide a type argument.
If we want to enumerate the PMUs they need a name, provide one.
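One way to picture the type argument is sketched below, under the assumption that a negative type requests a dynamically assigned id and that static ids occupy the low numbers. The table, STATIC_TYPE_MAX and pmu_register() are hypothetical and do not reproduce the kernel interface:

```c
#include <assert.h>
#include <stddef.h>

#define STATIC_TYPE_MAX 6	/* hypothetical: ids below this are the fixed types */
#define MAX_PMUS 16		/* hypothetical table size */

struct pmu_entry {
	const char *name;
	int used;
};

static struct pmu_entry pmu_table[MAX_PMUS];

/*
 * Register a pmu under a name. type >= 0 asks for that exact id, while
 * type < 0 asks for the first free dynamic id above the static range.
 * Returns the id assigned, or -1 if none is available.
 */
static int pmu_register(const char *name, int type)
{
	assert(name != NULL);

	if (type < 0) {
		for (type = STATIC_TYPE_MAX; type < MAX_PMUS; type++)
			if (!pmu_table[type].used)
				break;
	}
	if (type >= MAX_PMUS || pmu_table[type].used)
		return -1;

	pmu_table[type].name = name;
	pmu_table[type].used = 1;
	return type;
}
```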
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20101117222056.259707703@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Some BIOSes use PMU resources, which can cause various bugs:
- Non-working or erratic PMU based statistics - the PMU can end up
counting the wrong thing, resulting in misleading statistics
- Profiling can stop working or it can profile the wrong thing
- A non-working or erratic NMI watchdog that cannot be relied on
- The kernel may disturb whatever thing the BIOS tries to use the
PMU for - possibly causing hardware malfunction in extreme cases.
- ... and other forms of potential misbehavior
Various forms of such misbehavior have been observed in practice - there are
BIOSes that just corrupt the PMU state, consequences be damned.
The PMU is a CPU resource that is handled by the kernel and the BIOS
stealing+corrupting it is neither acceptable nor robust, so we detect it,
warn about it and further refuse to touch the PMU ourselves.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Two x86 patches broke lguest:
1) v2.6.35-492-g72d7c3b, which changed x86 to use the memblock allocator.
In lguest, the host places linear page tables at the top of mem, which
used to be enough to get us up to the swapper_pg_dir page tables. With
the first patch, the direct mapping tables used that memory:
Before: kernel direct mapping tables up to 4000000 @ 7000-1a000
After: kernel direct mapping tables up to 4000000 @ 3fed000-4000000
I initially fixed this by lying about the amount of memory we had, so
the kernel wouldn't blatt the lguest boot pagetables (yuk!), but then...
2) v2.6.36-rc8-54-gb40827f, which made x86 boot use initial_page_table.
This was initialized in a part of head_32.S which isn't executed by
lguest; it is then copied into swapper_pg_dir. So we have to initialize
it; and anyway we switch to it before we blatt the old tables, so that
fixes the previous damage as well.
For the moment, I cut & pasted the code into lguest's boot code, but
next merge window I will merge them.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: x86@kernel.org
lguest is dumb and drops *all* the pagetables for set_pte (which is
only used for kernel mapping manipulation, so it's OK without highmem).
But it's used a lot in boot, too. As a guest optimization, we
suppressed this flushing until the first page switch. Now we have
initial_page_table, that happens much earlier, so extend the heuristic
to wait until we switch to something other than the swapper_pg_dir or
initial_page_table.
As measured on my laptop under kvm, this dropped the time-to-mount-root
from 48 seconds to 4.3 seconds.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
fe25c7fc2e "x86: lguest: Convert to new irq chip functions" converted
enable_lguest_irq() to take a struct irq_data *, but didn't fix the one
internal caller.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
To: x86@kernel.org
Calling alloc_bootmem() for tiny chunks of memory over and over is really
slow; on an XO-1, it caused the time between when the kernel started
booting and when the display came alive (post-lxfb probe) to increase
to 44s. This patch optimizes the prom_early_alloc function by
calling alloc_bootmem for 4k-sized blocks of memory, and handing out
chunks of that to callers. With this patch, the time between kernel load
and display initialization decreased to 23s. If there's a better way to
do this early in the boot process, please let me know.
(Note: increasing the chunk size to 16k didn't noticeably affect boot time,
and wasted 9k.)
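The chunking approach can be sketched as follows. This is a user-space illustration, not the patch: malloc() stands in for alloc_bootmem(), early_alloc() and backing_calls are hypothetical names, and memory is never freed, as with boot-time allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096	/* block size used in the description above */

static char *chunk_mem;			/* next unused byte in the current block */
static size_t chunk_free;		/* bytes left in the current block */
static unsigned int backing_calls;	/* how often the slow allocator ran */

/* Hand out small pieces of a larger block, refilling only when needed. */
static void *early_alloc(size_t size)
{
	void *res;

	assert(size > 0);
	/* Round up so every returned pointer stays pointer-aligned. */
	size = (size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

	if (size > chunk_free) {
		size_t block = size > CHUNK_SIZE ? size : CHUNK_SIZE;

		/* The leftover tail of the old block is abandoned here. */
		chunk_mem = malloc(block);
		if (!chunk_mem) {
			chunk_free = 0;
			return NULL;
		}
		chunk_free = block;
		backing_calls++;
	}
	res = chunk_mem;
	chunk_mem += size;
	chunk_free -= size;
	return res;
}
```

Many tiny requests then cost one call to the slow allocator per 4k block instead of one call each.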
v4: clarify comment, requested by hpa
v3: fix wasted memory buglet found by Milton Miller, and style fix.
v2: reorder prom_early_alloc as suggested by Grant.
Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20101129153951.74202a84@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Make use of PROC_DEVICETREE to export the tree, and sparc's PROMTREE code to
call into OLPC's Open Firmware to build the tree.
v5: fix buglet with root node check (introduced in v4)
v4: address some minor style issues pointed out by Grant, and explicitly cast
negative phandle checks to s32.
v3: rename olpc_prom to olpc_dt
- rework Kconfig entries
- drop devtree build hook from proc, instead adding a call to x86's
paging_init (similarly to how sparc64 does it)
- switch allocation from using slab to alloc_bootmem. this allows
the DT to be built earlier during boot (during setup_arch); the
downside is that there are some 1200 bootmem reservations that are
done during boot. Not ideal..
- add a helper olpc_ofw_is_installed function to test for the
existence and successful detection of OLPC's OFW.
Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20101116220952.26526a80@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
- Define a stub irq_create_of_mapping for x86 as a stop-gap solution until
drivers/of/irq is further along.
- Define irq_dispose_mapping for x86 to appease of_i2c.c
These are needed to allow stuff in drivers/of/ to build on x86. This stuff
will eventually get replaced; quoting Grant,
"The long term plan is to have the drivers/of/ code handling the mapping
intelligently like powerpc currently does." But for now, just provide
these functions.
Signed-off-by: Andres Salomon <dilinger@queued.net>
LKML-Reference: <20101111214526.5de7121b@queued.net>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Add missing header file:
arch/x86/crypto/ghash-clmulni-intel_glue.c:256: error: implicit declaration of function 'IS_ERR'
arch/x86/crypto/ghash-clmulni-intel_glue.c:257: error: implicit declaration of function 'PTR_ERR'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Interrupt-remapping gets enabled very early in the boot, as it determines the
apic mode that the processor can use. The current code enables the vt-d
fault handling before setup_local_APIC(), so the APIC LDR registers
and the data structures in memory may not be initialized yet. As a result,
vt-d fault handling in logical xapic/x2apic modes was broken.
Fix this by enabling the vt-d fault handling in end_local_APIC_setup().
A cleaner fix of enabling fault handling while enabling intr-remapping
will be addressed for v2.6.38. [ Enabling intr-remapping determines the
usage of x2apic mode and the apic mode determines the fault-handling
configuration. ]
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <20101201062244.541996375@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
In x2apic mode, we need to set the upper address register of the fault
handling interrupt register of the vt-d hardware. Without this,
irq migration of the vt-d fault handling interrupt is broken.
Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>
LKML-Reference: <1291225233.2648.39.camel@sbsiddha-MOBL3>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: stable@kernel.org [v2.6.32+]
Acked-by: Chris Wright <chrisw@sous-sol.org>
Tested-by: Takao Indoh <indou.takao@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
During suspend, we disable all the non boot cpus. And during resume we bring
them all back again. So there is no need to do alternatives_smp_switch() in between.
On my core 2 based laptop, this speeds up the suspend path by 15msec and the
resume path by 5 msec (suspend/resume speed up differences can be attributed
to the different P-states that the cpu is in during suspend/resume).
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1290557500.4946.8.camel@sbsiddha-MOBL3.sc.intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>