Commit graph

10213 commits

Author SHA1 Message Date
Masami Hiramatsu
d498f76395 kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE
Change RELATIVEJUMP_INSTRUCTION macro to RELATIVEJUMP_OPCODE
since it represents just the opcode byte.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133349.6725.99302.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-25 17:49:24 +01:00
Thomas Gleixner
bb8d41330c x86/PCI: Prevent mmconfig memory corruption
commit ff097ddd4 (x86/PCI: MMCONFIG: manage pci_mmcfg_region as a
list, not a table) introduced a nasty memory corruption when
pci_mmcfg_list is empty.

pci_mmcfg_check_end_bus_number() dereferences pci_mmcfg_list.prev even
when the list is empty. The following write hits some variable near to
pci_mmcfg_list.

Further down a similar problem exists, where cfg->list.next is
dereferenced unconditionally and a comparison with some variable near
to pci_mmcfg_list happens.

Add a check for the last element into the for_each_entry() loop and
remove all the other crappy logic which is just a leftover of the old
array based code which was replaced by the list conversion.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: stable@kernel.org
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-25 08:30:58 -08:00
Steven Rostedt
0c54dd341f ftrace: Remove memory barriers from NMI code when not needed
The code in stop_machine that modifies the kernel text has a bit
of logic to handle the case of NMIs. stop_machine does not prevent
NMIs from executing, and if an NMI were to trigger on another CPU
as the modifying CPU is changing the NMI text, a GPF could result.

To prevent the GPF, the NMI calls ftrace_nmi_enter() which may
modify the code first, then any other NMIs will just change the
text to the same content which will do no harm. The code that
stop_machine called must wait for NMIs to finish while it changes
each location in the kernel. That code may also change the text
to what the NMI changed it to. The key is that the text will never
change content while another CPU is executing it.

To make the above work, the call to ftrace_nmi_enter() must also
do a smp_mb() as well as atomic_inc().  But for applications like
perf that require a high number of NMIs for profiling, this can have
a dramatic effect on the system. Not only is it doing a full memory
barrier on both nmi_enter() as well as nmi_exit() it is also
modifying a global variable with an atomic operation. This kills
performance on large SMP machines.

Since the memory barriers are only needed when ftrace is in the
process of modifying the text (which is seldom), this patch
adds a "modifying_code" variable that gets set before stop machine
is executed and cleared afterwards.

The NMIs will check this variable and store it in a per CPU
"save_modifying_code" variable that it will use to check if it
needs to do the memory barriers and atomic dec on NMI exit.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-02-25 08:42:06 -05:00
Ian Campbell
1431559200 x86, mm: Allow highmem user page tables to be disabled at boot time
Distros generally (I looked at Debian, RHEL5 and SLES11) seem to
enable CONFIG_HIGHPTE for any x86 configuration which has highmem
enabled. This means that the overhead applies even to machines which
have a fairly modest amount of high memory and which therefore do not
really benefit from allocating PTEs in high memory but still pay the
price of the additional mapping operations.

Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but
no actual highptes being allocated there was a reduction in system
time used from 59.737s to 55.9s.

With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
  Average Optimal load -j 4 Run (std deviation):
  Elapsed Time 175.396 (0.238914)
  User Time 515.983 (5.85019)
  System Time 59.737 (1.26727)
  Percent CPU 263.8 (71.6796)
  Context Switches 39989.7 (4672.64)
  Sleeps 42617.7 (246.307)

With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
  Average Optimal load -j 4 Run (std deviation):
  Elapsed Time 174.278 (0.831968)
  User Time 515.659 (6.07012)
  System Time 55.9 (1.07799)
  Percent CPU 263.8 (71.266)
  Context Switches 39929.6 (4485.13)
  Sleeps 42583.7 (373.039)

This patch allows the user to control the allocation of PTEs in
highmem from the command line ("userpte=nohigh") but retains the
status-quo as the default.

It is possible that some simple heuristic could be developed which
allows auto-tuning of this option however I don't have a sufficiently
large machine available to me to perform any particularly meaningful
experiments. We could probably handwave up an argument for a threshold
at 16G of total RAM.

Assuming 768M of lowmem we have 196608 potential lowmem PTE
pages. Each page can map 2M of RAM in a PAE-enabled configuration,
meaning a maximum of 384G of RAM could potentially be mapped using
lowmem PTEs.

Even allowing generous factor of 10 to account for other required
lowmem allocations, generous slop to account for page sharing (which
reduces the total amount of RAM mappable by a given number of PT
pages) and other innacuracies in the estimations it would seem that
even a 32G machine would not have a particularly pressing need for
highmem PTEs. I think 32G could be considered to be at the upper bound
of what might be sensible on a 32 bit machine (although I think in
practice 64G is still supported).

It's seems questionable if HIGHPTE is even a win for any amount of RAM
you would sensibly run a 32 bit kernel on rather than going 64 bit.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-25 10:28:19 +01:00
Thadeu Lima de Souza Cascardo
e808bae240 x86: Do not reserve brk for DMI if it's not going to be used
This will save 64K bytes from memory when loading linux if DMI is
disabled, which is good for embedded systems.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
LKML-Reference: <1265758732-19320-1-git-send-email-cascardo@holoscopio.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-25 10:28:18 +01:00
Jacob Pan
c54113823c x86, pci: Add sanity check for PCI fixed bar probing
While probing for the PCI fixed BAR capability in the extended PCI
configuration space we need to make sure raw_pci_ext_ops is
actually initialized.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A321E8F7@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-24 11:01:34 -08:00
Yinghai Lu
9eeeb09edb x86, legacy_irq: Remove duplicate vector assigment
Remove duplicated cfg[i].vector assignment.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B8493A0.6080501@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-24 11:01:34 -08:00
Yinghai Lu
28c6a0ba30 x86, legacy_irq: Remove left over nr_legacy_irqs
nr_legacy_irqs and its ilk have moved to legacy_pic.

-v2: there is one in ioapic_.c

Singed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B84AAC4.2020204@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-24 11:01:34 -08:00
Jacob Pan
3746c6b6e2 x86, mrst: Platform clock setup code
Add Moorestown platform clock setup code to the x86_init abstraction.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318D2D4@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-24 11:01:33 -08:00
Jacob Pan
bb24c47161 x86, apbt: Moorestown APB system timer driver
Moorestown platform does not have PIT or HPET platform timers.  Instead it
has a bank of eight APB timers.  The number of available timers to the os
is exposed via SFI mtmr tables.  All APB timer interrupts are routed via
ioapic rtes and delivered as MSI.
Currently, we use timer 0 and 1 for per cpu clockevent devices, timer 2
for clocksource.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318D2D2@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-24 11:01:21 -08:00
Feng Tang
cf08945596 x86, mrst: Add vrtc platform data setup code
vRTC information is obtained from SFI tables on Moorestown, this patch parses
these tables and assign the information.

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0D@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:15:19 -08:00
Jacob Pan
16ab539585 x86, mrst: Add platform timer info parsing code
Moorestown platform timer information is obtained from SFI FW tables.
This patch parses SFI table then assign the irq information to mp_irqs.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0B@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:15:19 -08:00
Jacob Pan
af2730f6ee x86, mrst: Fill in PCI functions in x86_init layer
This patch added Moorestown platform specific PCI init functions.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0A@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:15:19 -08:00
Jacob Pan
5b78b6724a x86, mrst: Add dummy legacy pic to platform setup
Moorestown has no legacy PIC; point it to the null legacy PIC.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D09@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:15:19 -08:00
Jesse Barnes
a712ffbc19 x86/PCI: Moorestown PCI support
The Moorestown platform only has a few devices that actually support
PCI config cycles.  The rest of the devices use an in-RAM MCFG space
for the purposes of device enumeration and initialization.

There are a few uglies in the fake support, like BAR sizes that aren't
a power of two, sizing detection, and writes to the real devices, but
other than that it's pretty straightforward.

Another way to think of this is not really as PCI at all, but just a
table in RAM describing which devices are present, their capabilities
and their offsets in MMIO space.  This could have been done with a
special new firmware table on this platform, but given that we do have
some real PCI devices too, simply describing things in an MCFG type
space was pretty simple.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D08@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:14:47 -08:00
Jacob Pan
4966e1affb x86, ioapic: Add dummy ioapic functions
Some ioapic extern functions are used when CONFIG_X86_IO_APIC is not
defined.  We need the dummy functions to avoid a compile time error.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318DA07@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:14:07 -08:00
Jacob Pan
05ddafb17a x86, ioapic: Early enable ioapic for timer irq
Moorestown platform needs apic ready early for the system timer irq
which is delievered via ioapic.  Should not impact other platforms.

In the longer term, once ioapic setup is moved before late time init,
we will not need this patch to do early apic enabling.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D07@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:13:19 -08:00
Jacob Pan
28a3c93d11 x86, pic: Fix section mismatch in legacy pic
Move legacy_pic chip dummy functions out of init section as they might
be referenced at run time.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318D3AA@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:13:19 -08:00
Suresh Siddha
6dbbe14f21 x86, ptrace: Remove set_stopped_child_used_math() in [x]fpregs_set
init_fpu() already ensures that the used_math() is set for the stopped child.
Remove the redundant set_stopped_child_used_math() in [x]fpregs_set()

Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100222225240.642169080@sbs-t61.sc.intel.com>
Acked-by: Rolan McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 13:45:27 -08:00
Suresh Siddha
ff7fbc72e0 x86, ptrace: Simplify xstateregs_get()
48 bytes (bytes 464..511) of the xstateregs payload come from the
kernel defined structure (xstate_fx_sw_bytes). Rest comes from the
xstate regs structure in the thread struct. Instead of having multiple
user_regset_copyout()'s, simplify the xstateregs_get() by first
copying the SW bytes into the xstate regs structure in the thread structure
and then using one user_regset_copyout() to copyout the xstateregs.

Requested-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100222225240.494688491@sbs-t61.sc.intel.com>
Acked-by: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Oleg Nesterov <oleg@redhat.com>
2010-02-23 13:45:27 -08:00
Bjorn Helgaas
7bc5e3f2be x86/PCI: use host bridge _CRS info by default on 2008 and newer machines
The main benefit of using ACPI host bridge window information is that
we can do better resource allocation in systems with multiple host bridges,
e.g., http://bugzilla.kernel.org/show_bug.cgi?id=14183

Sometimes we need _CRS information even if we only have one host bridge,
e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/341681

Most of these systems are relatively new, so this patch turns on
"pci=use_crs" only on machines with a BIOS date of 2008 or newer.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23 09:43:42 -08:00
Bjorn Helgaas
2fe2abf896 PCI: augment bus resource table with a list
Previously we used a table of size PCI_BUS_NUM_RESOURCES (16) for resources
forwarded to a bus by its upstream bridge.  We've increased this size
several times when the table overflowed.

But there's no good limit on the number of resources because host bridges
and subtractive decode bridges can forward any number of ranges to their
secondary buses.

This patch reduces the table to only PCI_BRIDGE_RESOURCE_NUM (4) entries,
which corresponds to the number of windows a PCI-to-PCI (3) or CardBus (4)
bridge can positively decode.  Any additional resources, e.g., PCI host
bridge windows or subtractively-decoded regions, are kept in a list.

I'd prefer a single list rather than this split table/list approach, but
that requires simultaneous changes to every architecture.  This approach
only requires immediate changes where we set up (a) host bridges with more
than four windows and (b) subtractive-decode P2P bridges, and we can
incrementally change other architectures to use the list.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23 09:43:37 -08:00
H. Peter Anvin
54b56170e4 Merge remote branch 'origin/x86/apic' into x86/mrst
Conflicts:
	arch/x86/kernel/apic/io_apic.c
2010-02-22 16:25:18 -08:00
H. Peter Anvin
d02e30c31c Merge branch 'x86/irq' into x86/apic
Merge reason:
	Conflicts in arch/x86/kernel/apic/io_apic.c

Resolved Conflicts:
	arch/x86/kernel/apic/io_apic.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-22 16:20:34 -08:00
Dominik Brodowski
3b7a17fcda resource/PCI: mark struct resource as const
Now that we return the new resource start position, there is no
need to update "struct resource" inside the align function.
Therefore, mark the struct resource as const.

Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:16:57 -08:00
Dominik Brodowski
b26b2d494b resource/PCI: align functions now return start of resource
As suggested by Linus, align functions should return the start
of a resource, not void. An update of "res->start" is no longer
necessary.

Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yinghai Lu <yhlu.kernel@gmail.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:16:56 -08:00
Seth Heasley
93da620226 x86/PCI: irq and pci_ids patch for Intel Cougar Point DeviceIDs
This patch adds the Intel Cougar Point (PCH) LPC and SMBus Controller DeviceIDs.

Signed-off-by: Seth Heasley <seth.heasley@intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-22 16:16:55 -08:00
Suresh Siddha
281ff33b7c x86_64, cpa: Don't work hard in preserving kernel 2M mappings when using 4K already
We currently enforce the !RW mapping for the kernel mapping that maps
holes between different text, rodata and data sections. However, kernel
identity mappings will have different RWX permissions to the pages mapping to
text and to the pages padding (which are freed) the text, rodata sections.
Hence kernel identity mappings will be broken to smaller pages. For 64-bit,
kernel text and kernel identity mappings are different, so we can enable
protection checks that come with CONFIG_DEBUG_RODATA, as well as retain 2MB
large page mappings for kernel text.

Konrad reported a boot failure with the Linux Xen paravirt guest because of
this. In this paravirt guest case, the kernel text mapping and the kernel
identity mapping share the same page-table pages. Thus forcing the !RW mapping
for some of the kernel mappings also cause the kernel identity mappings to be
read-only resulting in the boot failure. Linux Xen paravirt guest also
uses 4k mappings and don't use 2M mapping.

Fix this issue and retain large page performance advantage for native kernels
by not working hard and not enforcing !RW for the kernel text mapping,
if the current mapping is already using small page mapping.

Reported-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1266522700.2909.34.camel@sbs-t61.sc.intel.com>
Tested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: stable@kernel.org	[2.6.32, 2.6.33]
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-22 15:09:31 -08:00
Linus Torvalds
bee415ce42 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf probe: Init struct probe_point and set counter correctly
  hw-breakpoint: Keep track of dr7 local enable bits
  hw-breakpoints: Accept breakpoints on NULL address
  perf_events: Fix FORK events
2010-02-22 08:55:32 -08:00
H. Peter Anvin
aef55d4922 Merge branch 'x86/urgent' into x86/irq
Merge reason: conflict in arch/x86/kernel/apic/io_apic.c

Resolved Conflicts:
	arch/x86/kernel/apic/io_apic.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-20 22:54:05 -08:00
Russell King
4b3073e1c5 MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself
On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies.  We do this via make_coherent() by making the pages
uncacheable.

This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().

Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():

  On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
  to construct a pointer to the pte again.  Passing a pte_t * is much
  more elegant.  Maybe we might even replace the pte argument with the
  pte_t?

Ben Herrenschmidt would also like the pte pointer for PowerPC:

  Passing the ptep in there is exactly what I want.  I want that
  -instead- of the PTE value, because I have issue on some ppc cases,
  for I$/D$ coherency, where set_pte_at() may decide to mask out the
  _PAGE_EXEC.

So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.

Includes a fix from Stephen Rothwell:

  sparc: fix fallout from update_mmu_cache API change

  Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-02-20 16:41:46 +00:00
Jacob Pan
1f91233c26 x86, apic: Remove ioapic_disable_legacy()
The ioapic_disable_legacy() call is no longer needed for platforms do
not have legacy pic. the legacy pic abstraction has taken care it
automatically.

This patch also initialize irq-related static variables based on
information obtained from legacy_pic.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A30A7660@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 17:16:38 -08:00
Jacob Pan
b81bb373a7 x86, pic: Make use of legacy_pic abstraction
This patch replaces legacy PIC-related global variable and functions
with the new legacy_pic abstraction.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D04@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:25:17 -08:00
Jacob Pan
ef3548668c x86, pic: Introduce legacy_pic abstraction
This patch makes i8259A like legacy programmable interrupt controller
code into a driver so that legacy pic functions can be selected at
runtime based on platform information, such as HW subarchitecure ID.
Default structure of legacy_pic maintains the current code path for
x86pc.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D03@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:25:17 -08:00
Jacob Pan
35f720c593 x86: Initialize stack canary in secondary start
Some secondary clockevent setup code needs to call request_irq, which
will cause fake stack check failure in schedule() if voluntary
preemption model is chosen.  It is safe to have stack canary
initialized here early, since start_secondary() does not return.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D02@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:25:17 -08:00
Alek Du
d39f6495f6 x86, ioapic: Improve handling of i8259A irq init
Since we already track the number of legacy vectors by nr_legacy_irqs, we
can avoid use static vector allocations -- we can use dynamic one.

Signed-off-by: Alek Du <alek.du@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D01@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:25:17 -08:00
Thomas Gleixner
9325a28ce2 x86: Add pcibios_fixup_irqs to x86_init
Platforms like Moorestown want to override the pcibios_fixup_irqs
default function. Add it to x86_init.pci.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D00@orsmsx508.amr.corp.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:12:39 -08:00
Thomas Gleixner
ab3b37937e x86: Add pci_init_irq to x86_init
Moorestown wants to reuse pcibios_init_irq but needs to provide its
own implementation of pci_enable_irq. After we distangled the init we
can move the init_irq call to x86_init and remove the pci_enable_irq
!= NULL check in pcibios_init_irq. pci_enable_irq is compile time
initialized to pirq_enable_irq and the special cases which override it
(visws and acpi) set the x86_init function pointer to noop. That
allows MSRT to override pci_enable_irq and otherwise run
pcibios_init_irq unmodified.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80CFF@orsmsx508.amr.corp.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:12:33 -08:00
Thomas Gleixner
b72d0db9dd x86: Move pci init function to x86_init
The PCI initialization in pci_subsys_init() is a mess. pci_numaq_init,
pci_acpi_init, pci_visws_init and pci_legacy_init are called and each
implementation checks and eventually modifies the global variable
pcibios_scanned.

x86_init functions allow us to do this more elegant. The pci.init
function pointer is preset to pci_legacy_init. numaq, acpi and visws
can modify the pointer in their early setup functions. The functions
return 0 when they did the full initialization including bus scan. A
non zero return value indicates that pci_legacy_init needs to be
called either because the selected function failed or wants the
generic bus scan in pci_legacy_init to happen (e.g. visws).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80CFE@orsmsx508.amr.corp.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:12:29 -08:00
H. Peter Anvin
8e92dc767a x86, setup: Don't skip mode setting for the standard VGA modes
The code for setting standard VGA modes probes for the current mode,
and skips the mode setting if the mode is 3 (color text 80x25) or 7
(mono text 80x25).  Unfortunately, there are BIOSes, including the
VMware BIOS, which report the previous mode if function 0F is queried
while the screen is in a VESA mode, and of course, nothing can help a
mode poked directly into the hardware.

As such, the safe option is to set the mode anyway, and only query to
see if we should be using mode 7 rather than mode 3.  People who don't
want any mode setting at all should probably use vga=0x0f04
(VIDEO_CURRENT_MODE).  It's possible that should be the kernel
default.

Reported-by Rene Arends <R.R.Arends@hro.nl>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-*@git.kernel.org>
2010-02-19 13:21:38 -08:00
Frederic Weisbecker
326264a024 hw-breakpoint: Keep track of dr7 local enable bits
When the user enables breakpoints through dr7, he can choose
between "local" or "global" enable bits but given how linux is
implemented, both have the same effect.

That said we don't keep track how the user enabled the breakpoints
so when the user requests the dr7 value, we only translate the
"enabled" status using the global enabled bits. It means that if
the user enabled a breakpoint using the local enabled bit, reading
back dr7 will set the global bit and clear the local one.

Apps like Wine expect a full dr7 POKEUSER/PEEKUSER match for emulated
softwares that implement old reverse engineering protection schemes.

We fix that by keeping track of the whole dr7 value given by the user
in the thread structure to drop this bug. We'll think about
something more proper later.

This fixes a 2.6.32 - 2.6.33-x ptrace regression.

Reported-and-tested-by: Michael Stefaniuc <mstefani@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Maneesh Soni <maneesh@linux.vnet.ibm.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
2010-02-19 19:06:48 +01:00
Frederic Weisbecker
84d7109267 hw-breakpoints: Accept breakpoints on NULL address
Before we had a generic breakpoint API, ptrace was accepting
breakpoints on NULL address in x86. The new API refuse them,
without given strong reasons. We need to follow the previous
behaviour as some userspace apps like Wine need such NULL
breakpoints to ensure old emulated software protections
are still working.

This fixes a 2.6.32 - 2.6.33-x ptrace regression.

Reported-and-tested-by: Michael Stefaniuc <mstefani@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: K.Prasad <prasad@linux.vnet.ibm.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Maneesh Soni <maneesh@linux.vnet.ibm.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
2010-02-19 18:35:14 +01:00
H. Peter Anvin
eb572a5c79 x86-64, setup: Inhibit decompressor output if video info is invalid
Inhibit output from the kernel decompressor if the video information
is invalid.  This was already the case for 32 bits, make 64 bits
match.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <tip-*@git.kernel.org>
2010-02-18 22:15:04 -08:00
Borislav Petkov
cb19060abf x86, cacheinfo: Enable L3 CID only on AMD
Final stage linking can fail with

 arch/x86/built-in.o: In function `store_cache_disable':
 intel_cacheinfo.c:(.text+0xc509): undefined reference to `amd_get_nb_id'
 arch/x86/built-in.o: In function `show_cache_disable':
 intel_cacheinfo.c:(.text+0xc7d3): undefined reference to `amd_get_nb_id'

when CONFIG_CPU_SUP_AMD is not enabled because the amd_get_nb_id
helper is defined in AMD-specific code but also used in generic code
(intel_cacheinfo.c). Reorganize the L3 cache index disable code under
CONFIG_CPU_SUP_AMD since it is AMD-only anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100218184210.GF20473@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-18 21:59:07 -08:00
Borislav Petkov
f619b3d842 x86, cacheinfo: Remove NUMA dependency, fix for AMD Fam10h rev D1
The show/store_cache_disable routines depend unnecessarily on NUMA's
cpu_to_node and the disabling of cache indices broke when !CONFIG_NUMA.
Remove that dependency by using a helper which is always correct.

While at it, enable L3 Cache Index disable on rev D1 Istanbuls which
sport the feature too.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100218184339.GG20473@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-18 21:58:57 -08:00
Brandon Philips
eb5b379406 x86, irq: Keep chip_data in create_irq_nr and destroy_irq
Version 4: use get_irq_chip_data() in destroy_irq() to get rid of some
local vars.

When two drivers are setting up MSI-X at the same time via
pci_enable_msix() there is a race.  See this dmesg excerpt:

[   85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X
[   85.170611]   alloc irq_desc for 99 on node -1
[   85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X
[   85.170614]   alloc kstat_irqs on node -1
[   85.170616] alloc irq_2_iommu on node -1
[   85.170617]   alloc irq_desc for 100 on node -1
[   85.170619]   alloc kstat_irqs on node -1
[   85.170621] alloc irq_2_iommu on node -1
[   85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X
[   85.170626]   alloc irq_desc for 101 on node -1
[   85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X
[   85.170630]   alloc kstat_irqs on node -1
[   85.170631] alloc irq_2_iommu on node -1
[   85.170635]   alloc irq_desc for 102 on node -1
[   85.170636]   alloc kstat_irqs on node -1
[   85.170639] alloc irq_2_iommu on node -1
[   85.170646] BUG: unable to handle kernel NULL pointer dereference
at 0000000000000088

As you can see igb and ixgbe are both alternating on create_irq_nr()
via pci_enable_msix() in their probe function.

ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe
choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and
calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data =
NULL via dynamic_irq_init().

igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[]
via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this:

	cfg_new = irq_desc_ptrs[102]->chip_data;
	if (cfg_new->vector != 0)
		continue;

This hits the NULL deref.

Another possible race exists via pci_disable_msix() in a driver or in
the number of error paths that call free_msi_irqs():

destroy_irq()
dynamic_irq_cleanup() which sets desc->chip_data = NULL
...race window...
desc->chip_data = cfg;

Remove the save and restore code for cfg in create_irq_nr() and
destroy_irq() and take the desc->lock when checking the irq_cfg.

Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <20100207210250.GB8256@jenkins.home.ifup.org>
Signed-off-by: Brandon Phiilps <bphilips@suse.de>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-18 21:53:15 -08:00
Len Brown
0e2ecbaefd Merge branches 'bugzilla-14886', 'bugzilla-15000', 'bugzilla-15040', 'bugzilla-15108', 'pdc', 'hotplug-null-ref' and 'thinkpad' into release 2010-02-18 03:51:04 -05:00
H. Peter Anvin
f1f6baf8f1 x86, setup: When restoring the screen, update boot_params.screen_info
When we restore the screen content after a mode change, we return the
cursor to its former position.  However, we need to also update
boot_params.screen_info accordingly, so that the decompression code
knows where on the screen the cursor is.  Just in case the video BIOS
does something extra screwy, read the cursor position back from the
BIOS instead of relying on it doing the right thing.

While we're at it, make sure we cap the cursor position to the new
screen coordinates.

Reported-by: Wim Osterholt <wim@djo.tudelft.nl>
Bugzilla-Reference: http://bugzilla.kernel.org/show_bug.cgi?id=15329
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-17 18:32:06 -08:00
Yinghai Lu
2b633e3fac smp: Use nr_cpus= to set nr_cpu_ids early
On x86, before prefill_possible_map(), nr_cpu_ids will be NR_CPUS aka
CONFIG_NR_CPUS.

Add nr_cpus= to set nr_cpu_ids. so we can simulate cpus <=8 are installed on
normal config.

-v2: accordging to Christoph, acpi_numa_init should use nr_cpu_ids in stead of
     NR_CPUS.
-v3: add doc in kernel-parameters.txt according to Andrew.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-34-git-send-email-yinghai@kernel.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Tony Luck <tony.luck@intel.com>
2010-02-17 17:30:22 -08:00
Yinghai Lu
6738762d73 x86, irq: Remove arch_probe_nr_irqs
So keep nr_irqs == NR_IRQS.  With radix trees is matters less.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-33-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-17 17:29:21 -08:00