Commit graph

9666 commits

Author SHA1 Message Date
Bharat Bhushan
dc2babfea5 KVM: PPC: Fix DEC truncation for greater than 0xffff_ffff/1000
kvmppc_emulate_dec() uses dec_nsec of type unsigned long and does below calculation:

        dec_nsec = vcpu->arch.dec;
        dec_nsec *= 1000;
This will truncate if DEC value "vcpu->arch.dec" is greater than 0xffff_ffff/1000.
For example : For tb_ticks_per_usec = 4a, we can not set decrementer more than ~58ms.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Acked-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:25 +02:00
Bharat Bhushan
f9208427f7 PPC: Fix race in mtmsr paravirt implementation
The current implementation of mtmsr and mtmsrd are racy in that it does:

  * check (int_pending == 0)
  ---> host sets int_pending = 1 <---
  * write shared page
  * done

while instead we should check for int_pending after the shared page is written.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Alexander Graf
95325e6b19 KVM: PPC: E500: Support hugetlbfs
With hugetlbfs support emerging on e500, we should also support KVM
backing its guest memory by it.

This patch adds support for hugetlbfs into the e500 shadow mmu code.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood
841741f23b KVM: PPC: e500: Don't hardcode PIR=0
The hardcoded behavior prevents proper SMP support.

user space shall specify the vcpu's PIR as the vcpu id.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood
303b7c97e3 KVM: PPC: e500: tlbsx: fix tlb0 esel
It should contain the way, not the absolute TLB0 index.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood
dc83b8bc02 KVM: PPC: e500: MMU API
This implements a shared-memory API for giving host userspace access to
the guest's TLB.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:24 +02:00
Scott Wood
0164c0f0c4 KVM: PPC: e500: clear up confusion between host and guest entries
Split out the portions of tlbe_priv that should be associated with host
entries into tlbe_ref.  Base victim selection on the number of hardware
entries, not guest entries.

For TLB1, where one guest entry can be mapped by multiple host entries,
we use the host tlbe_ref for tracking page references.  For the guest
TLB0 entries, we still track it with gtlb_priv, to avoid having to
retranslate if the entry is evicted from the host TLB but not the
guest TLB.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:23 +02:00
Scott Wood
90b92a6f51 KVM: PPC: e500: Eliminate preempt_disable in local_sid_destroy_all
The only place it makes sense to call this function already needs
to have preemption disabled.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:23 +02:00
Scott Wood
3bf3cdcc14 KVM: PPC: e500: don't translate gfn to pfn with preemption disabled
Delay allocation of the shadow pid until we're ready to disable
preemption and write the entry.

Signed-off-by: Scott Wood <scottwood@freescale.com>
Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:23 +02:00
Christian Borntraeger
b9e5dc8d45 KVM: provide synchronous registers in kvm_run
On some cpus the overhead for virtualization instructions is in the same
range as a system call. Having to call multiple ioctls to get set registers
will make certain userspace handled exits more expensive than necessary.
Lets provide a section in kvm_run that works as a shared save area
for guest registers.
We also provide two 64bit flags fields (architecture specific), that will
specify
1. which parts of these fields are valid.
2. which registers were modified by userspace

Each bit for these flag fields will define a group of registers (like
general purpose) or a single register.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:22 +02:00
Carsten Otte
5b1c1493af KVM: s390: ucontrol: export SIE control block to user
This patch exports the s390 SIE hardware control block to userspace
via the mapping of the vcpu file descriptor. In order to do so,
a new arch callback named kvm_arch_vcpu_fault  is introduced for all
architectures. It allows to map architecture specific pages.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:19 +02:00
Carsten Otte
e08b963716 KVM: s390: add parameter for KVM_CREATE_VM
This patch introduces a new config option for user controlled kernel
virtual machines. It introduces a parameter to KVM_CREATE_VM that
allows to set bits that alter the capabilities of the newly created
virtual machine.
The parameter is passed to kvm_arch_init_vm for all architectures.
The only valid modifier bit for now is KVM_VM_S390_UCONTROL.
This requires CAP_SYS_ADMIN privileges and creates a user controlled
virtual machine on s390 architectures.

Signed-off-by: Carsten Otte <cotte@de.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2012-03-05 14:52:18 +02:00
Ingo Molnar
737f24bda7 Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/builtin-record.c
	tools/perf/builtin-top.c
	tools/perf/perf.h
	tools/perf/util/top.h

Merge reason: resolve these cherry-picking conflicts.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-05 09:20:08 +01:00
Thomas Gleixner
ba74c1448f sched/rt: Document scheduler related skip-resched-check sites
Create a distinction between scheduler related preempt_enable_no_resched()
calls and the nearly one hundred other places in the kernel that do not
want to reschedule, for one reason or another.

This distinction matters for -rt, where the scheduler and the non-scheduler
preempt models (and checks) are different. For upstream it's purely
documentational.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/n/tip-gs88fvx2mdv5psnzxnv575ke@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01 10:28:04 +01:00
Thomas Gleixner
bd2f55361f sched/rt: Use schedule_preempt_disabled()
Coccinelle based conversion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-24swm5zut3h9c4a6s46x8rws@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-03-01 10:28:03 +01:00
Paul Gortmaker
50af5ead3b bug.h: add include of it to various implicit C users
With bug.h currently living right in linux/kernel.h there
are files that use BUG_ON and friends but are not including
the header explicitly.  Fix them up so we can remove the
presence in kernel.h file.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2012-02-29 17:15:08 -05:00
Yinghai Lu
210647af89 PCI: Rename pci_remove_bus_device to pci_stop_and_remove_bus_device
The old pci_remove_bus_device actually did stop and remove.

Make the name reflect that to reduce confusion.

This patch is done by sed scripts and changes back some incorrect
__pci_remove_bus_device changes.

Suggested-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2012-02-27 12:12:18 -08:00
David S. Miller
ff4783ce78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/sfc/rx.c

Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 21:55:51 -05:00
Danny Kukawka
0a167e0a5c arch/powerpc/platforms/powernv/setup.c: included asm/xics.h twice
arch/powerpc/platforms/powernv/setup.c: included 'asm/xics.h' twice,
remove the duplicate.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-27 11:33:59 +11:00
Danny Kukawka
ed7e3d1ca7 arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice
arch/powerpc/kvm/book3s_hv.c: included 'linux/sched.h' twice,
remove the duplicate.

Signed-off-by: Danny Kukawka <danny.kukawka@bisect.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-27 11:33:58 +11:00
Stephen Rothwell
3d066d77cf powerpc: remove CONFIG_PPC_ISERIES from the architecture Kconfig files
After this, we can remove the legacy iSeries code more easily.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-27 11:33:58 +11:00
Benjamin Herrenschmidt
fe83364f0b powerpc/mpic: Fix allocation of reverse-map for multi-ISU mpics
When using a multi-ISU MPIC, we can interrupts up to
isu_size * MPIC_MAX_ISU, not just isu_size, so allocate
the right size reverse map.

Without this, the code will constantly fallback to
a linear search.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-27 11:33:58 +11:00
Benjamin Herrenschmidt
f851013cb2 Merge remote-tracking branch 'origin/master' into next 2012-02-27 10:50:11 +11:00
Ben Greear
3bdc0eba0b net: Add framework to allow sending packets with customized CRC.
This is useful for testing RX handling of frames with bad
CRCs.

Requires driver support to actually put the packet on the
wire properly.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2012-02-24 01:37:35 -08:00
Ingo Molnar
c5905afb0e static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]()
So here's a boot tested patch on top of Jason's series that does
all the cleanups I talked about and turns jump labels into a
more intuitive to use facility. It should also address the
various misconceptions and confusions that surround jump labels.

Typical usage scenarios:

        #include <linux/static_key.h>

        struct static_key key = STATIC_KEY_INIT_TRUE;

        if (static_key_false(&key))
                do unlikely code
        else
                do likely code

Or:

        if (static_key_true(&key))
                do likely code
        else
                do unlikely code

The static key is modified via:

        static_key_slow_inc(&key);
        ...
        static_key_slow_dec(&key);

The 'slow' prefix makes it abundantly clear that this is an
expensive operation.

I've updated all in-kernel code to use this everywhere. Note
that I (intentionally) have not pushed through the rename
blindly through to the lowest levels: the actual jump-label
patching arch facility should be named like that, so we want to
decouple jump labels from the static-key facility a bit.

On non-jump-label enabled architectures static keys default to
likely()/unlikely() branches.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jason Baron <jbaron@redhat.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: a.p.zijlstra@chello.nl
Cc: mathieu.desnoyers@efficios.com
Cc: davem@davemloft.net
Cc: ddaney.cavm@gmail.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20120222085809.GA26397@elte.hu
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2012-02-24 10:05:59 +01:00
Bjorn Helgaas
fb127cb9de PCI: collapse pcibios_resource_to_bus
Everybody uses the generic pcibios_resource_to_bus() supplied by the core
now, so remove the ARCH_HAS_GENERIC_PCI_OFFSETS used during conversion.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-02-23 20:19:04 -07:00
Bjorn Helgaas
6c5705fec6 powerpc/PCI: get rid of device resource fixups
Tell the PCI core about host bridge address translation so it can take
care of bus-to-resource conversion for us.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-02-23 20:19:03 -07:00
Bjorn Helgaas
673c975624 powerpc/PCI: replace pci_probe_only with pci_flags
We already use pci_flags, so this just sets pci_flags directly and removes
the intermediate step of figuring out pci_probe_only, then using it to set
pci_flags.

The PCI core provides a pci_flags definition (currently __weak), so drop
the powerpc definitions in favor of that.

CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-02-23 20:18:58 -07:00
Bjorn Helgaas
3c13be017a powerpc/PCI: make pci_probe_only default to 0
pci_probe_only is set on ppc64 to prevent resource re-allocation
by the core. It's meant to be used in very specific circumstances
such as when operating under a hypervisor that may prevent such
re-allocation.

Instead of default to 1, we make it default to 0 and explicitly
set it in the few cases where we need it.

This fixes FSL PCI which wants it clear among others.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-02-23 20:18:58 -07:00
Paul Gortmaker
6d166fec12 ppc-6xx: fix build failure in flipper-pic.c and hlwd-pic.c
The commit bae1d8f199 (linux-next)

  "irq_domain/powerpc: Use common irq_domain structure instead of irq_host"

made this change:

   -static struct irq_host *flipper_irq_host;
   +static struct irq_domain *flipper_irq_host;

and this change:

   -static struct irq_host *hlwd_irq_host;
   +static struct irq_domain *hlwd_irq_host;

The intent was to change the type, and not the name, but then in a
couple of instances, it looks like the sed to change the irq_domain_ops
name inadvertently also changed the irq_host name where it was not
supposed to, causing build failures.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
2012-02-22 18:41:19 -07:00
Michael Ellerman
f2699491e0 powerpc/perf: Move perf core & PMU code into a subdirectory
The perf code has grown a lot since it started, and is big enough to
warrant its own subdirectory. For reference it's ~60% bigger than the
oprofile code. It declutters the kernel directory, makes it simpler to
grep for "just perf stuff", and allows us to shorten some filenames.

While we're at it, make it more obvious that we have two implementations
of the core perf logic. One for (roughly) Book3S CPUs, which was the
original implementation, and the other for Freescale embedded CPUs.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:04 +11:00
Mahesh Salgaonkar
12d9299241 fadump: Remove the phyp assisted dump code.
Remove the phyp assisted dump implementation which is not is use.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:03 +11:00
Mahesh Salgaonkar
67b43b9d7c fadump: Invalidate the fadump registration during machine shutdown.
If dump is active during system reboot, shutdown or halt then invalidate
the fadump registration as it does not get invalidated automatically.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:03 +11:00
Mahesh Salgaonkar
b500afff11 fadump: Invalidate registration and release reserved memory for general use.
This patch introduces an sysfs interface '/sys/kernel/fadump_release_mem' to
invalidate the last fadump registration, invalidate '/proc/vmcore', release
the reserved memory for general use and re-register for future kernel dump.
Once the dump is copied to the disk, unlike phyp dump, the userspace tool
can release all the memory reserved for dump with one single operation of
echo 1 to '/sys/kernel/fadump_release_mem'.

Release the reserved memory region excluding the size of the memory required
for future kernel dump registration. And therefore, unlike kdump, Fadump
doesn't need a 2nd reboot to get back the system to the production
configuration.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:02 +11:00
Mahesh Salgaonkar
d34c5f26cf fadump: Add PT_NOTE program header for vmcoreinfo
Introduce a PT_NOTE program header that points to physical address of
vmcoreinfo_note buffer declared in kernel/kexec.c. The vmcoreinfo
note buffer is populated during crash_fadump() at the time of system
crash.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:02 +11:00
Mahesh Salgaonkar
ebaeb5ae24 fadump: Convert firmware-assisted cpu state dump data into elf notes.
When registered for firmware assisted dump on powerpc, firmware preserves
the registers for the active CPUs during a system crash. This patch reads
the cpu register data stored in Firmware-assisted dump format (except for
crashing cpu) and converts it into elf notes and updates the PT_NOTE program
header accordingly. The exact register state for crashing cpu is saved to
fadump crash info structure in scratch area during crash_fadump() and read
during second kernel boot.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:01 +11:00
Mahesh Salgaonkar
2df173d9e8 fadump: Initialize elfcore header and add PT_LOAD program headers.
Build the crash memory range list by traversing through system memory during
the first kernel before we register for firmware-assisted dump. After the
successful dump registration, initialize the elfcore header and populate
PT_LOAD program headers with crash memory ranges. The elfcore header is
saved in the scratch area within the reserved memory. The scratch area starts
at the end of the memory reserved for saving RMR region contents. The
scratch area contains fadump crash info structure that contains magic number
for fadump validation and physical address where the eflcore header can be
found. This structure will also be used to pass some important crash info
data to the second kernel which will help second kernel to populate ELF core
header with correct data before it gets exported through /proc/vmcore. Since
the firmware preserves the entire partition memory at the time of crash the
contents of the scratch area will be preserved till second kernel boot.

Since the memory dump exported through /proc/vmcore is in ELF format similar
to kdump, it will help us to reuse the kdump infrastructure for dump capture
and filtering. Unlike phyp dump, userspace tool does not need to refer any
sysfs interface while reading /proc/vmcore.

NOTE: The current design implementation does not address a possibility of
introducing additional fields (in future) to this structure without affecting
compatibility. It's on TODO list to come up with better approach to
address this.

Reserved dump area start => +-------------------------------------+
                            |  CPU state dump data                |
                            +-------------------------------------+
                            |  HPTE region data                   |
                            +-------------------------------------+
                            |  RMR region data                    |
Scratch area start       => +-------------------------------------+
                            |  fadump crash info structure {      |
                            |     magic nummber                   |
                     +------|---- elfcorehdr_addr                 |
                     |      |  }                                  |
                     +----> +-------------------------------------+
                            |  ELF core header                    |
Reserved dump area end   => +-------------------------------------+

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:01 +11:00
Mahesh Salgaonkar
3ccc00a7e0 fadump: Register for firmware assisted dump.
On 2012-02-20 11:02:51 Mon, Paul Mackerras wrote:
> On Thu, Feb 16, 2012 at 04:44:30PM +0530, Mahesh J Salgaonkar wrote:
>
> If I have read the code correctly, we are going to get this printk on
> non-pSeries machines or on older pSeries machines, even if the user
> has not put the fadump=on option on the kernel command line.  The
> printk will be annoying since there is no actual error condition.  It
> seems to me that the condition for the printk should include
> fw_dump.fadump_enabled.  In other words you should probably add
>
> 	if (!fw_dump.fadump_enabled)
> 		return 0;
>
> at the beginning of the function.

Hi Paul,

Thanks for pointing it out. Please find the updated patch below.

The existing patches above this (4/10 through 10/10) cleanly applies
on this update.

Thanks,
-Mahesh.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:01 +11:00
Mahesh Salgaonkar
eb39c8803d fadump: Reserve the memory for firmware assisted dump.
Reserve the memory during early boot to preserve CPU state data, HPTE region
and RMA (real mode area) region data in case of kernel crash. At the time of
crash, powerpc firmware will store CPU state data, HPTE region data and move
RMA region data to the reserved memory area.

If the firmware-assisted dump fails to reserve the memory, then fallback
to existing kexec-based kdump.

Most of the code implementation to reserve memory has been
adapted from phyp assisted dump implementation written by Linas Vepstas
and Manish Ahuja

This patch also introduces a config option CONFIG_FA_DUMP for firmware
assisted dump feature on Powerpc (ppc64) architecture.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:01 +11:00
Kyle Moffett
e55d7f737d powerpc/mpic: Remove duplicate MPIC_WANTS_RESET flag
There are two separate flags controlling whether or not the MPIC is
reset during initialization, which is completely unnecessary, and only
one of them can be specified in the device tree.

Also, most platforms in-tree right now do actually want to reset the
MPIC during initialization anyways, which means lots of duplicate code
passing the MPIC_WANTS_RESET flag.

Fix all of the callers which currently do not pass the MPIC_WANTS_RESET
flag to pass the MPIC_NO_RESET flag, then remove the MPIC_WANTS_RESET
flag and make the code reset the MPIC by default.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:00 +11:00
Kyle Moffett
c1b8d45db4 powerpc/mpic: Add "last-interrupt-source" property to override hardware
The FreeScale PowerQUICC-III-compatible (mpc85xx/mpc86xx) MPICs do not
correctly report the number of hardware interrupt sources, so software
needs to override the detected value with "256".

To avoid needing to write custom board-specific code to detect that
scenario, allow it to be easily overridden in the device-tree.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:50:00 +11:00
Kyle Moffett
5019609fce powerpc/mpic: Remove MPIC_BROKEN_FRR_NIRQS and duplicate irq_count
The mpic->irq_count variable is only used as a software error-checking
limit to determine whether or not an IRQ number is valid.  In board code
which does not manually specify an IRQ count to mpic_alloc(), i.e. 0, it
is automatically detected from the number of ISUs and the ISU size.

In practice, all hardware ends up with irq_count == num_sources, so all
of the runtime checks on mpic->irq_count should just check the value of
mpic->num_sources instead.

When platform hardware does not correctly report the number of IRQs,
which only happens on the MPC85xx/MPC86xx, the MPIC_BROKEN_FRR_NIRQS
flag is used to override the detected value of num_sources with the
manual irq_count parameter.  Since there's no need to manually specify
the number of IRQs except in this case, the extra flag can be eliminated
and the test changed to "irq_count != 0".

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:49:59 +11:00
Kyle Moffett
9ca163c860 fsl/mpic: Create and document the "single-cpu-affinity" device-tree flag
The Freescale MPIC (and perhaps others in the future) is incapable of
routing non-IPI interrupts to more than once CPU at a time.  Currently
all of the Freescale boards msut pass the MPIC_SINGLE_DEST_CPU flag to
mpic_alloc(), but that information should really be present in the
device-tree.

Older board code can't rely on the device-tree having the property set,
but newer platforms won't need it manually specified in the code.

[BenH: Remove unrelated changes, folded in a different patch]

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:49:59 +11:00
Kyle Moffett
98cca250ae fsl/mpic: Document and use the "big-endian" device-tree flag
The MPIC code checks for a "big-endian" property and sets the flag
MPIC_BIG_ENDIAN if one is present, although prior to the "mpic->flags"
fixup that would never have worked anways.

Unfortunately, even now that it works properly, the Freescale mpic
device-node (the "PowerQUICC-III"-compatible one) does not specify it,
so all of the board ports need to manually pass it to mpic_alloc().

Document the flag and add it to the pq3 device tree.  Existing code will
still need to pass the MPIC_BIG_ENDIAN flag because their dtb may not
have this property, but new platforms shouldn't need to do so.

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:49:58 +11:00
Kyle Moffett
3a7a7176e8 powerpc/mpic: Fix use of "flags" variable in mpic_alloc()
The mpic_alloc() function takes a "flags" parameter and assigns it into
the mpic->flags variable fairly early on, but several later pieces of
code detect various device-tree properties and save them into the
"mpic->flags" variable (EG: "big-endian" => MPIC_BIG_ENDIAN).

Unfortunately, a number of codepaths (including several which test the
flag MPIC_BIG_ENDIAN!) test "flags" instead of "mpic->flags", and get
wrong answers as a result.

Consolidate the device-tree flag tests early in mpic_alloc() and change
all of the checks after "mpic->flags" is init'ed to use "mpic->flags".

[BenH: Fixed up use of mpic->node before it's initialized]

Signed-off-by: Kyle Moffett <Kyle.D.Moffett@boeing.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-23 10:49:58 +11:00
Benjamin Herrenschmidt
18b246fa60 powerpc: Fix various issues with return to userspace
We have a few problems when returning to userspace. This is a
quick set of fixes for 3.3, I'll look into a more comprehensive
rework for 3.4. This fixes:

 - We kept interrupts soft-disabled when schedule'ing or calling
do_signal when returning to userspace as a result of a hardware
interrupt.

 - Rename do_signal to do_notify_resume like all other archs (and
do_signal_pending back to do_signal, which it was before Roland
changed it).

 - Add the missing call to key_replace_session_keyring() to
do_notify_resume().

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
---
2012-02-22 16:48:53 +11:00
Michael Ellerman
922b9f86a0 powerpc: Fix program check handling when lockdep is enabled
In commit 54321242af ("Disable interrupts early in Program Check"), we
switched from enabling to disabling interrupts in program_check_common.

Whereas ENABLE_INTS leaves r3 untouched, if lockdep is enabled DISABLE_INTS
calls into lockdep code and will clobber r3. That means we pass a bogus
struct pt_regs* into program_check_exception() and all hell breaks loose.

So load our regs pointer into r3 after we call DISABLE_INTS.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-22 16:48:49 +11:00
Rusty Russell
07d2f1a54a powerpc: Remove references to cpu_*_map
This has been obsolescent for a while; time for the final push.

In adjacent context, replaced old cpus_* with cpumask_*.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2012-02-22 16:48:47 +11:00
Grant Likely
0f22dd395f of: Only compile OF_DYNAMIC on PowerPC pseries and iseries
Only two architectures use the OF node reference counting and reclaim bits.
There is no need to compile it for the rest of the PowerPC platforms or for
any of the other architectures.  This patch makes iseries and pseries
select CONFIG_OF_DYNAMIC, and makes it default to off for everything else.

It is still safe to turn on CONFIG_OF_DYNAMIC on all architectures, it just
isn't necessary.

v2: Also select OF_DYNAMIC for PPC_CHROMA and MPC885ADS as reported by Michael
    Meuling

Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Jimi Xenidis <jimix@pobox.com> (for PPC_CHROMA bug fix)
Cc: Rob Herring <rob.herring@calxeda.com>
2012-02-21 13:33:00 -07:00
Pavel Emelyanov
ef64a54f6e sock: Introduce the SO_PEEK_OFF sock option
This one specifies where to start MSG_PEEK-ing queue data from. When
set to negative value means that MSG_PEEK works as ususally -- peeks
from the head of the queue always.

When some bytes are peeked from queue and the peeking offset is non
negative it is moved forward so that the next peek will return next
portion of data.

When non-peeking recvmsg occurs and the peeking offset is non negative
is is moved backward so that the next peek will still peek the proper
data (i.e. the one that would have been picked if there were no non
peeking recv in between).

The offset is set using per-proto opteration to let the protocol handle
the locking issues and to check whether the peeking offset feature is
supported by the protocol the socket belongs to.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 15:03:48 -05:00