Commit graph

31995 commits

Author SHA1 Message Date
Alex Chiang
66db2e6331 [IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs"
This reverts commit e7b140365b.

Commit e7b14036 removes the targetted disabled CPU from the
cpu_online_map after calls to migrate_platform_irqs and fixup_irqs.

Paul McKenney states that the reasoning behind the patch was to
prevent irq handlers from running on CPUs marked offline because:

	RCU happily ignores CPUs that don't have their bits set in
	cpu_online_map, so if there are RCU read-side critical sections
	in the irq handlers being run, RCU will ignore them.  If the
	other CPUs were running, they might sequence through the RCU
	state machine, which could result in data structures being
	yanked out from under those irq handlers, which in turn could
	result in oopses or worse.

Unfortunately, both ia64 functions above look at cpu_online_map to find
a new CPU to migrate interrupts onto. This means we can potentially
migrate an interrupt off ourself back to... ourself. Uh oh.

This causes an oops when we finally try to process pending interrupts on
the CPU we want to disable. The oops results from calling __do_IRQ with
a NULL pt_regs:

Unable to handle kernel NULL pointer dereference (address 0000000000000040)
Call Trace:
 [<a000000100016930>] show_stack+0x50/0xa0
                                sp=e0000009c922fa00 bsp=e0000009c92214d0
 [<a0000001000171a0>] show_regs+0x820/0x860
                                sp=e0000009c922fbd0 bsp=e0000009c9221478
 [<a00000010003c700>] die+0x1a0/0x2e0
                                sp=e0000009c922fbd0 bsp=e0000009c9221438
 [<a0000001006e92f0>] ia64_do_page_fault+0x950/0xa80
                                sp=e0000009c922fbd0 bsp=e0000009c92213d8
 [<a00000010000c7a0>] ia64_native_leave_kernel+0x0/0x270
                                sp=e0000009c922fc60 bsp=e0000009c92213d8
 [<a0000001000ecdb0>] profile_tick+0xd0/0x1c0
                                sp=e0000009c922fe30 bsp=e0000009c9221398
 [<a00000010003bb90>] timer_interrupt+0x170/0x3e0
                                sp=e0000009c922fe30 bsp=e0000009c9221330
 [<a00000010013a800>] handle_IRQ_event+0x80/0x120
                                sp=e0000009c922fe30 bsp=e0000009c92212f8
 [<a00000010013aa00>] __do_IRQ+0x160/0x4a0
                                sp=e0000009c922fe30 bsp=e0000009c9221290
 [<a000000100012290>] ia64_process_pending_intr+0x2b0/0x360
                                sp=e0000009c922fe30 bsp=e0000009c9221208
 [<a0000001000112d0>] fixup_irqs+0xf0/0x2a0
                                sp=e0000009c922fe30 bsp=e0000009c92211a8
 [<a00000010005bd80>] __cpu_disable+0x140/0x240
                                sp=e0000009c922fe30 bsp=e0000009c9221168
 [<a0000001006c5870>] take_cpu_down+0x50/0xa0
                                sp=e0000009c922fe30 bsp=e0000009c9221148
 [<a000000100122610>] stop_cpu+0xd0/0x200
                                sp=e0000009c922fe30 bsp=e0000009c92210f0
 [<a0000001000e0440>] kthread+0xc0/0x140
                                sp=e0000009c922fe30 bsp=e0000009c92210c8
 [<a000000100014ab0>] kernel_thread_helper+0xd0/0x100
                                sp=e0000009c922fe30 bsp=e0000009c92210a0
 [<a00000010000a4c0>] start_kernel_thread+0x20/0x40
                                sp=e0000009c922fe30 bsp=e0000009c92210a0

I don't like this revert because it is fragile. ia64 is getting lucky
because we seem to only ever process timer interrupts in this path, but
if we ever race with an IPI here, we definitely use RCU and have the
potential of hitting an oops that Paul describes above.

Patching ia64's timer_interrupt() to check for NULL pt_regs is
insufficient though, as we still hit the above oops.

As a short term solution, I do think that this revert is the right
answer. The revert hold up under repeated testing (24+ hour test runs)
with this setup:

	- 8-way rx6600
	- randomly toggling CPU online/offline state every 2 seconds
	- running CPU exercisers, memory hog, disk exercisers, and
	  network stressors
	- average system load around ~160

In the long term, we really need to figure out why we set pt_regs = NULL
in ia64_process_pending_intr(). If it turns out that it is unnecessary
to do so, then we could safely re-introduce e7b14036 (along with some
other logic to be smarter about migrating interrupts).

One final note: x86 also removes the disabled CPU from cpu_online_map
and then re-enables interrupts for 1ms, presumably to handle any pending
interrupts:

arch/x86/kernel/irq_32.c (and irq_64.c):
cpu_disable_common:
	[remove cpu from cpu_online_map]

	fixup_irqs():
		for_each_irq:
			[break CPU affinities]

		local_irq_enable();
		mdelay(1);
		local_irq_disable();

So they are doing implicitly what ia64 is doing explicitly.

Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:32:26 -08:00
Robin Holt
39d481cba2 [IA64] bte_copy of BTE_MAX_XFER trips BUG_ON.
BTE_MAX_XFER is wrong.  It is one greater than the number of cache
lines the BTE is actually able to transfer.  If you request a transfer
of exactly BTE_MAX_XFER size, you trip a very cryptic BUG_ON() which
should certainly be made more clear.

This patch fixes that constant and also cleans up the BUG_ON()s in
arch/ia64/sn/kernel/bte.c to test one condition per line.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Tony Luck <aegl@agluck-desktop.(none)>
2009-02-19 11:29:31 -08:00
Tony Luck
334f85b647 [IA64] Build fix for __early_pfn_to_nid() undefined link error
ia64 only defines __early_pfn_to_nid() for SPARSEMEM && NUMA configurations,
so the recent:

	commit: f2dbcfa738
	mm: clean up for early_pfn_to_nid()

ends up with some link problems for certain configuration files.

Fix arch/ia64/Kconfig to only define HAVE_ARCH_EARLY_PFN_TO_NID in the
cases where we do provide this function.

Signed-off-by: Tony Luck <tony.luck@intel.com>
2009-02-19 11:22:36 -08:00
Linus Torvalds
402a917aca Merge master.kernel.org:/home/rmk/linux-2.6-arm
* master.kernel.org:/home/rmk/linux-2.6-arm:
  [ARM] 5405/1: ep93xx: remove unused gesbc9312.h header
  [ARM] 5404/1: Fix condition in arm_elf_read_implies_exec() to set READ_IMPLIES_EXEC
  [ARM] omap: fix clock reparenting in omap2_clk_set_parent()
  [ARM] 5403/1: pxa25x_ep_fifo_flush() *ep->reg_udccs always set to 0
  [ARM] 5402/1: fix a case of wrap-around in sanity_check_meminfo()
  [ARM] 5401/1: Orion: fix edge triggered GPIO interrupt support
  [ARM] 5400/1: Add support for inverted rdy_busy pin for Atmel nand device controller
  [ARM] 5391/1: AT91: Enable GPIO clocks earlier
  [ARM] 5390/1: AT91: Watchdog fixes
  [ARM] 5398/1: Add Wan ZongShun to MAINTAINERS for W90P910
  [ARM] omap: fix _omap2_clksel_get_src_field()
  [ARM] omap: fix omap2_divisor_to_clksel() error return value
2009-02-19 09:52:12 -08:00
Ingo Molnar
e9ce0c37c2 Merge branch 'x86/untangle2' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen into x86/headers 2009-02-19 18:15:01 +01:00
Linus Torvalds
bcf8951fc2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown
  x86, mce: use force_sig_info to kill process in machine check
  x86, mce: reinitialize per cpu features on resume
  x86, rcu: fix strange load average and ksoftirqd behavior
2009-02-19 09:14:35 -08:00
Hartley Sweeten
9dd446f657 [ARM] 5405/1: ep93xx: remove unused gesbc9312.h header
Remove the gesbc9312.h header since it is unused.

Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 16:13:02 +00:00
Cyrill Gorcunov
cb425afd21 x86: compressed head_32 - use ENTRY,ENDPROC macros
Impact: clenaup

Linker script will put startup_32 at predefined
address so using startup_32 will not bloat the
code size.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:01 +01:00
Cyrill Gorcunov
2d4eeecb98 x86: compressed head_64 - use ENTRY,ENDPROC macros
Impact: clenaup

Linker script will put startup_32 at predefined
address so using ENTRY will not bloat the code
size.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:01 +01:00
Cyrill Gorcunov
324bda9e47 x86: pmjump - use GLOBAL,ENDPROC macros
Impact: cleanup

We are in setup stage so we use GLOBAL
instead of ENTRY and do not increase code
size.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:00 +01:00
Cyrill Gorcunov
2f79555097 x86: copy.S - use GLOBAL,ENDPROC macros
Impact: cleanup

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:13:00 +01:00
Cyrill Gorcunov
1b25f3b4e1 x86: linkage - get rid of _X86 macros
Impact: cleanup

There was an attempt to bring build-time checking for
missed ENTRY_X86/END_X86 and KPROBE... pairs. Using
them will add messy in code. Get just rid of them.
This commit could be easily restored if the need appear
in future.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:12:59 +01:00
Cyrill Gorcunov
95695547a7 x86: asm linkage - introduce GLOBAL macro
If the code is time critical and this entry is called
from other places we use ENTRY to have it globally defined
and especially aligned.

Contrary we have some snippets which are size
critical. So we use plane ".globl name; name:"
directive. Introduce GLOBAL macro for this.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 17:12:59 +01:00
Makito SHIOKAWA
9da616fb99 [ARM] 5404/1: Fix condition in arm_elf_read_implies_exec() to set READ_IMPLIES_EXEC
READ_IMPLIES_EXEC must be set when:
o binary _is_ an executable stack (i.e. not EXSTACK_DISABLE_X)
o processor architecture is _under_ ARMv6 (XN bit is supported from ARMv6)

Signed-off-by: Makito SHIOKAWA <lkhmkt@gmail.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 14:45:27 +00:00
Heiko Carstens
23d75d9cad [S390] fix "mem=" handling in case of standby memory
Standby memory detected with the sclp interface gets always registered
with add_memory calls without considering the limitationt that the
"mem=" kernel paramater implies.
So fix this and only register standby memory that is below the specified
limit.
This fixes zfcpdump since it uses "mem=32M". In case there is appr.
2GB standby memory present all of usable memory would be used for the
struct pages needed for standby memory.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-02-19 15:19:19 +01:00
Christian Borntraeger
d5cd0343d2 [S390] Fix timeval regression on s390
commit aa5e97ce4b
[PATCH] improve precision of process accounting.

Introduced a timing regression:
-bash-3.2# time ls
real    0m0.006s
user    0m1.754s
sys     0m1.094s

The problem was introduced by an error in cputime_to_timeval.
Cputime is now 1/4096 microsecond, therefore, we have to divide
the remainder with 4096 to get the microseconds.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2009-02-19 15:19:19 +01:00
Russell King
41f3103fcf [ARM] omap: fix clock reparenting in omap2_clk_set_parent()
When changing the parent of a clock, it is necessary to keep the
clock use counts balanced otherwise things the parent state will
get corrupted.  Since we already disable and re-enable the clock,
we might as well use the recursive versions instead.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 13:25:16 +00:00
Hiroshi Shimamoto
71d8f9784a x86: syscalls.h: remove asmlinkage from declaration of sys_rt_sigreturn()
Impact: cleanup

asmlinkage for sys_rt_sigreturn() no longer exists in arch/x86/kernel/signal.c.

Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 12:18:54 +01:00
Nicolas Pitre
3fd9825c42 [ARM] 5402/1: fix a case of wrap-around in sanity_check_meminfo()
In the non highmem case, if two memory banks of 1GB each are provided,
the second bank would evade suppression since its virtual base would
be 0.  Fix this by disallowing any memory bank which virtual base
address is found to be lower than PAGE_OFFSET.

Reported-by: Lennert Buytenhek <buytenh@marvell.com>

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-19 09:49:45 +00:00
Jaswinder Singh Rajput
de5483029b x86: include/asm/processor.h remove double declaration of print_cpu_info
Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-19 10:12:18 +01:00
KAMEZAWA Hiroyuki
cc2559bccc mm: fix memmap init for handling memory hole
Now, early_pfn_in_nid(PFN, NID) may returns false if PFN is a hole.
and memmap initialization was not done. This was a trouble for
sparc boot.

To fix this, the PFN should be initialized and marked as PG_reserved.
This patch changes early_pfn_in_nid() return true if PFN is a hole.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:55 -08:00
KAMEZAWA Hiroyuki
f2dbcfa738 mm: clean up for early_pfn_to_nid()
What's happening is that the assertion in mm/page_alloc.c:move_freepages()
is triggering:

	BUG_ON(page_zone(start_page) != page_zone(end_page));

Once I knew this is what was happening, I added some annotations:

	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
		printk(KERN_ERR "move_freepages: Bogus zones: "
		       "start_page[%p] end_page[%p] zone[%p]\n",
		       start_page, end_page, zone);
		printk(KERN_ERR "move_freepages: "
		       "start_zone[%p] end_zone[%p]\n",
		       page_zone(start_page), page_zone(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
		       page_to_pfn(start_page), page_to_pfn(end_page));
		printk(KERN_ERR "move_freepages: "
		       "start_nid[%d] end_nid[%d]\n",
		       page_to_nid(start_page), page_to_nid(end_page));
 ...

And here's what I got:

	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
	move_freepages: start_nid[1] end_nid[0]

My memory layout on this box is:

[    0.000000] Zone PFN ranges:
[    0.000000]   Normal   0x00000000 -> 0x0081ff5d
[    0.000000] Movable zone start PFN for each node
[    0.000000] early_node_map[8] active PFN ranges
[    0.000000]     0: 0x00000000 -> 0x00020000
[    0.000000]     1: 0x00800000 -> 0x0081f7ff
[    0.000000]     1: 0x0081f800 -> 0x0081fe50
[    0.000000]     1: 0x0081fed1 -> 0x0081fed8
[    0.000000]     1: 0x0081feda -> 0x0081fedb
[    0.000000]     1: 0x0081fedd -> 0x0081fee5
[    0.000000]     1: 0x0081fee7 -> 0x0081ff51
[    0.000000]     1: 0x0081ff59 -> 0x0081ff5d

So it's a block move in that 0x81f600-->0x81f7ff region which triggers
the problem.

This patch:

Declaration of early_pfn_to_nid() is scattered over per-arch include
files, and it seems it's complicated to know when the declaration is used.
 I think it makes fix-for-memmap-init not easy.

This patch moves all declaration to include/linux/mm.h

After this,
  if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use static definition in include/linux/mm.h
  else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
     -> Use generic definition in mm/page_alloc.c
  else
     -> per-arch back end function will be called.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reported-by: David Miller <davem@davemlloft.net>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-02-18 15:37:55 -08:00
Huang Ying
ef41df4344 x86, mce: fix a race condition in mce_read()
Impact: bugfix

Considering the situation as follow:

before: mcelog.next == 1, mcelog.entry[0].finished = 1

+--------------------------------------------------------------------------
R                   W1                  W2                  W3

read mcelog.next (1)
                    mcelog.next++ (2)
                    (working on entry 1,
                    finished == 0)

mcelog.next = 0
                                        mcelog.next++ (1)
                                        (working on entry 0)
                                                           mcelog.next++ (2)
                                                           (working on entry 1)
                        <----------------- race ---------------->
                    (done on entry 1,
                    finished = 1)
                                                           (done on entry 1,
                                                           finished = 1)

To fix the race condition, a cmpxchg loop is added to mce_read() to
ensure no new MCE record can be added between mcelog.next reading and
mcelog.next = 0.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:33:05 -08:00
Andi Kleen
d6b75584a3 x86, mce: disable machine checks on offlined CPUs
Impact: Lower priority bug fix

Offlined CPUs could still get machine checks, but the machine check handler
cannot handle them properly, leading to an unconditional crash. Disable
machine checks on CPUs that are going down.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:56 -08:00
Andi Kleen
5b4408fdaa x86, mce: don't set up mce sysdev devices with mce=off
Impact: bug fix, in this case the resume handler shouldn't run which
	avoids incorrectly reenabling machine checks on resume

When MCEs are completely disabled on the command line don't set
up the sysdev devices for them either.

Includes a comment fix from Thomas Gleixner.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:50 -08:00
Andi Kleen
52d168e28b x86, mce: switch machine check polling to per CPU timer
Impact: Higher priority bug fix

The machine check poller runs a single timer and then broadcasted an
IPI to all CPUs to check them. This leads to unnecessary
synchronization between CPUs. The original CPU running the timer has
to wait potentially a long time for all other CPUs answering. This is
also real time unfriendly and in general inefficient.

This was especially a problem on systems with a lot of events where
the poller run with a higher frequency after processing some events.
There could be more and more CPU time wasted with this, to
the point of significantly slowing down machines.

The machine check polling is actually fully independent per CPU, so
there's no reason to not just do this all with per CPU timers.  This
patch implements that.

Also switch the poller also to use standard timers instead of work
queues. It was using work queues to be able to execute a user program
on a event, but mce_notify_user() handles this case now with a
separate callback. So instead always run the poll code in in a
standard per CPU timer, which means that in the common case of not
having to execute a trigger there will be less overhead.

This allows to clean up the initialization significantly, because
standard timers are already up when machine checks get init'ed.  No
multiple initialization functions.

Thanks to Thomas Gleixner for some help.

Cc: thockin@google.com
v2: Use del_timer_sync() on cpu shutdown and don't try to handle
migrated timers.
v3: Add WARN_ON for timer running on unexpected CPU

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:44 -08:00
Andi Kleen
9bd9840580 x86, mce: always use separate work queue to run trigger
Impact: Needed for bug fix in next patch

This relaxes the requirement that mce_notify_user has to run in process
context. Useful for future changes, but also leads to cleaner
behaviour now. Now instead mce_notify_user can be called directly
from interrupt (but not NMI) context.

The work queue only uses a single global work struct, which can be done safely
because it is always free to reuse before the trigger function is executed.
This way no events can be lost.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:41 -08:00
Andi Kleen
123aa76ec0 x86, mce: don't disable machine checks during code patching
Impact: low priority bug fix

This removes part of a a patch I added myself some time ago. After some
consideration the patch was a bad idea. In particular it stopped machine check
exceptions during code patching.

To quote the comment:

        * MCEs only happen when something got corrupted and in this
        * case we must do something about the corruption.
        * Ignoring it is worse than a unlikely patching race.
        * Also machine checks tend to be broadcast and if one CPU
        * goes into machine check the others follow quickly, so we don't
        * expect a machine check to cause undue problems during to code
        * patching.

So undo the machine check related parts of
8f4e956b31 NMIs are still disabled.

This only removes code, the only additions are a new comment.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:38 -08:00
Andi Kleen
973a2dd1d5 x86, mce: disable machine checks on suspend
Impact: Bug fix

During suspend it is not reliable to process machine check
exceptions, because CPUs disappear but can still get machine check
broadcasts.  Also the system is slightly more likely to
machine check them, but the handler is typically not a position
to handle them in a meaningfull way.

So disable them during suspend and enable them during resume.

Also make sure they are always disabled on hot-unplugged CPUs.

This new code assumes that suspend always hotunplugs all
non BP CPUs.

v2: Remove the WARN_ONs Thomas objected to.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:32:14 -08:00
Andi Kleen
07db1c140e x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown
Impact: Bugfix

The ifdef for the apic clear on shutdown for the 64bit intel thermal
vector was incorrect and never triggered. Fix that.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:24:34 -08:00
Andi Kleen
380851bc6b x86, mce: use force_sig_info to kill process in machine check
Impact: bug fix (with tolerant == 3)

do_exit cannot be called directly from the exception handler because
it can sleep and the exception handler runs on the exception stack.
Use force_sig() instead.

Based on a earlier patch by Ying Huang who debugged the problem.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:24:31 -08:00
Andi Kleen
6ec68bff3c x86, mce: reinitialize per cpu features on resume
Impact: Bug fix

This fixes a long standing bug in the machine check code. On resume the
boot CPU wouldn't get its vendor specific state like thermal handling
reinitialized. This means the boot cpu wouldn't ever get any thermal
events reported again.

Call the respective initialization functions on resume

v2: Remove ancient init because they don't have a resume device anyways.
    Pointed out by Thomas Gleixner.
v3: Now fix the Subject too to reflect v2 change

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-17 15:24:28 -08:00
Nicolas Pitre
fd4b9b3650 [ARM] 5401/1: Orion: fix edge triggered GPIO interrupt support
The GPIO interrupts can be configured as either level triggered or edge
triggered, with a default of level triggered.  When an edge triggered
interrupt is requested, the gpio_irq_set_type method is called which
currently switches the given IRQ descriptor between two struct irq_chip
instances: orion_gpio_irq_level_chip and orion_gpio_irq_edge_chip. This
happens via __setup_irq() which also calls irq_chip_set_defaults() to
assign default methods to uninitialized ones.  The problem is that
irq_chip_set_defaults() is called before the irq_chip reference is
switched, leaving the new irq_chip (orion_gpio_irq_edge_chip in this
case) with uninitialized methods such as chip->startup() causing a kernel
oops.

Many solutions are possible, such as making irq_chip_set_defaults() global
and calling it from gpio_irq_set_type(), or calling __irq_set_trigger()
before irq_chip_set_defaults() in __setup_irq().  But those require
modifications to the generic IRQ code which might have adverse effect on
other architectures, and that would still be a fragile arrangement.
Manually copying the missing methods from within gpio_irq_set_type()
would be really ugly and it would break again the day new methods with
automatic defaults are added.

A better solution is to have a single irq_chip instance which can deal
with both edge and level triggered interrupts.  It is also a good idea
to switch the IRQ handler instead, as the edge IRQ handler allows for
one edge IRQ event to be queued as the IRQ is actually masked only when
that second IRQ is received, at which point the hardware can queue an
additional IRQ event, making edge triggered interrupts a bit more
reliable.

Tested-by: Martin Michlmayr <tbm@cyrius.com>

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2009-02-17 22:37:09 +00:00
Linus Torvalds
f8effd1a4a Merge branch 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  doc: mmiotrace.txt, buffer size control change
  trace: mmiotrace to the tracer menu in Kconfig
  mmiotrace: count events lost due to not recording
2009-02-17 14:29:15 -08:00
Linus Torvalds
35010334aa Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, vm86: fix preemption bug
  x86, olpc: fix model detection without OFW
  x86, hpet: fix for LS21 + HPET = boot hang
  x86: CPA avoid repeated lazy mmu flush
  x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context
  x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption
  x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem
  x86/cpa: make sure cpa is safe to call in lazy mmu mode
  x86, ptrace, mm: fix double-free on race
2009-02-17 14:27:39 -08:00
Linus Torvalds
b30b774930 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/vsx: Fix VSX alignment handler for regs 32-63
  powerpc/ps3: Move ps3_mm_add_memory to device_initcall
  powerpc/mm: Fix numa reserve bootmem page selection
  powerpc/mm: Fix _PAGE_CHG_MASK to protect _PAGE_SPECIAL
2009-02-17 14:23:49 -08:00
Ingo Molnar
9be1b56a3e x86, apic: separate 32-bit setup functionality out of apic_32.c
Impact: build fix, cleanup

A couple of arch setup callbacks were mistakenly in apic_32.c, breaking
the build.

Also simplify the code a bit.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 23:12:48 +01:00
Linus Torvalds
39a65762d4 Merge branch 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: VMX: Flush volatile msrs before emulating rdmsr
  KVM: Fix assigned devices circular locking dependency
  KVM: x86: fix LAPIC pending count calculation
  KVM: Fix INTx for device assignment
  KVM: MMU: Map device MMIO as UC in EPT
  KVM: x86: disable kvmclock on non constant TSC hosts
  KVM: PIT: fix i8254 pending count read
  KVM: Fix racy in kvm_free_assigned_irq
  KVM: Add kvm_arch_sync_events to sync with asynchronize events
  KVM: mmu_notifiers release method
  KVM: Avoid using CONFIG_ in userspace visible headers
  KVM: ia64: fix fp fault/trap handler
2009-02-17 14:04:32 -08:00
Paul E. McKenney
bf51935f3e x86, rcu: fix strange load average and ksoftirqd behavior
Damien Wyart reported high ksoftirqd CPU usage (20%) on an
otherwise idle system.

The function-graph trace Damien provided:

>   799.521187 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.521371 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.521555 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.521738 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.521934 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.522068 |   1)  ksoftir-2324  |               |                rcu_check_callbacks() {
>   799.522208 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.522392 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.522575 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.522759 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.522956 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.523074 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
>   799.523214 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.523397 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.523579 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.523762 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.523960 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.524079 |   1)  ksoftir-2324  |               |                  rcu_check_callbacks() {
>   799.524220 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.524403 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.524587 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
>   799.524770 |   1)    <idle>-0    |               |  rcu_check_callbacks() {
> [ . . . ]

Shows rcu_check_callbacks() being invoked way too often. It should be called
once per jiffy, and here it is called no less than 22 times in about
3.5 milliseconds, meaning one call every 160 microseconds or so.

Why do we need to call rcu_pending() and rcu_check_callbacks() from the
idle loop of 32-bit x86, especially given that no other architecture does
this?

The following patch removes the call to rcu_pending() and
rcu_check_callbacks() from the x86 32-bit idle loop in order to
reduce the softirq load on idle systems.

Reported-by: Damien Wyart <damien.wyart@free.fr>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 22:47:45 +01:00
H. Peter Anvin
a7eb518998 x86: truncate ISA addresses to unsigned int
Impact: Cleanup; fix inappropriate macro use

ISA addresses on x86 are mapped 1:1 with the physical address space.
Since the ISA address space is only 24 bits (32 for VLB or LPC) it
will always fit in an unsigned int, and at least in the aha1542 driver
using a wider type would cause an undesirable promotion.  Hence
explicitly cast the ISA bus addresses to unsigned int.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
2009-02-17 13:01:51 -08:00
Ingo Molnar
2a05180fe2 x86, apic: move remaining APIC drivers to arch/x86/kernel/apic/*
Move the 32-bit extended-arch APIC drivers to arch/x86/kernel/apic/
too, and rename apic_64.c to probe_64.c.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 20:35:47 +01:00
Ingo Molnar
f62bae5009 x86, apic: move APIC drivers to arch/x86/kernel/apic/*
arch/x86/kernel/ is getting a bit crowded, and the APIC
drivers are scattered into various different files.

Move them to arch/x86/kernel/apic/*, and also remove
the 'gen' prefix from those which had it.

Also move APIC related functionality: the IO-APIC driver,
the NMI and the IPI code.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 18:17:36 +01:00
Ingo Molnar
be163a159b x86, apic: rename 'genapic' to 'apic'
Impact: cleanup

Now that all APIC code is consolidated there's nothing 'gen' about
apics anymore - so rename 'struct genapic' to 'struct apic'.

This shortens the code and is nicer to read as well.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 17:53:57 +01:00
Ingo Molnar
ab6fb7c0b0 x86, apic: remove ->store_NMI_vector()
Impact: cleanup

It's not used by anything anymore.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 17:53:56 +01:00
Ingo Molnar
cb81eaedf1 x86, numaq_32: clean up, misc
Impact: cleanup

 - misc other cleanups that change the md5 signature
 - consolidate global variables
 - remove unnecessary __numaq_mps_oem_check() wrapper
 - make numaq_mps_oem_check static
 - update copyrights
 - misc other cleanups pointed out by checkpatch

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 17:53:54 +01:00
Ingo Molnar
36afc3af04 x86, numaq_32: clean up
Impact: cleanup

- refactor smp_dump_qct()
- tidy up include files, remove duplicates
- misc other cleanups, pointed out by checkpatch

No code changed:

md5:
   9c0bc01a53558c77df0f2ebcda7e11a9  numaq_32.o.before.asm
   9c0bc01a53558c77df0f2ebcda7e11a9  numaq_32.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 17:52:51 +01:00
Ingo Molnar
7da18ed924 x86, es7000: misc cleanups
These are cleanups that change the md5 signature:

 - asm/ => linux/ include conversion
 - simplify the code flow of find_unisys_acpi_oem_table()
 - move ACPI methods into one #ifdef block
 - remove 0/NULL initialization of statics
 - simplify/standardize printouts
 - update copyrights
 - more cleanups, pointed out by checkpatch

arch/x86/kernel/es7000_32.o:

   text	   data	    bss	    dec	    hex	filename
   2693	    192	     44	   2929	    b71	es7000_32.o.before
   2688	    192	     44	   2924	    b6c	es7000_32.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 17:52:50 +01:00
Ingo Molnar
352887d1c9 x86, es7000: remove dead code, clean up
Impact: cleanup

 - a number of structure definitions were stale
 - remove needless wrappers around apic definitions
 - fix details noticed by checkpatch

No code changed:

md5:
   029d8fde0aaf6e934ea63bd8b36430fd  es7000_32.o.before.asm
   029d8fde0aaf6e934ea63bd8b36430fd  es7000_32.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 17:52:49 +01:00
Ingo Molnar
d3185b37df x86, es7000: remove externs
Impact: cleanup

In the subarch times there were a number of externs between
various bits of the ES7000 code. Now that there's a single
es7000-platform support file, the externs can be removed and
the functions can be changed the statics.

Beyond the cleanup factor, this also shrinks the size of the
kernel image a bit:

arch/x86/kernel/es7000_32.o:

   text	   data	    bss	    dec	    hex	filename
   2813	    192	     44	   3049	    be9	es7000_32.o.before
   2693	    192	     44	   2929	    b71	es7000_32.o.after

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 17:52:48 +01:00
Ingo Molnar
b9e0d1aa97 x86, apic: remove apicid_cluster()
There were multiple definitions of apicid_cluster() scattered around
in APIC drivers - but the definitions are equivalent to the already
existing generic APIC_CLUSTER() method.

So remove apicid_cluster() and change all users to APIC_CLUSTER().

No code changed:

md5:
   1b8244ba8d3d6a454593ce10f09dfa58  summit_32.o.before.asm
   1b8244ba8d3d6a454593ce10f09dfa58  summit_32.o.after.asm

md5:
   a593d98a882bf534622c70d9568497ac  es7000_32.o.before.asm
   a593d98a882bf534622c70d9568497ac  es7000_32.o.after.asm

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-02-17 17:52:47 +01:00