Commit graph

3546 commits

Author SHA1 Message Date
Daniel Borkmann
286aad3c40 net: bpf: be friendly to kmemcheck
Reported by Mikulas Patocka, kmemcheck currently barks out a
false positive since we don't have special kmemcheck annotation
for bitfields used in bpf_prog structure.

We currently have jited:1, len:31 and thus when accessing len
while CONFIG_KMEMCHECK enabled, kmemcheck throws a warning that
we're reading uninitialized memory.

As we don't need the whole bit universe for pages member, we
can just split it to u16 and use a bool flag for jited instead
of a bitfield.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 16:58:56 -07:00
Daniel Borkmann
738cbe72ad net: bpf: consolidate JIT binary allocator
Introduced in commit 314beb9bca ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks") and later on replicated in aa2d2c73c2
("s390/bpf,jit: address randomize and write protect jit code") for
s390 architecture, write protection for BPF JIT images got added and
a random start address of the JIT code, so that it's not on a page
boundary anymore.

Since both use a very similar allocator for the BPF binary header,
we can consolidate this code into the BPF core as it's mostly JIT
independant anyway.

This will also allow for future archs that support DEBUG_SET_MODULE_RONX
to just reuse instead of reimplementing it.

JIT tested on x86_64 and s390x with BPF test suite.

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-09 16:58:56 -07:00
Heiko Carstens
4423028203 s390/spinlock: optimize spin_unlock code
Use a memory barrier + store sequence instead of a load + compare and swap
sequence to unlock a spinlock and an rw lock.
For the spinlock case this saves us two memory reads and a not needed cpu
serialization after the compare and swap instruction stored the new value.

The kernel size (performance_defconfig) gets reduced by ~14k.

Average execution time of a tight inlined spin_unlock loop drops from
5.8ns to 0.7ns on a zEC12 machine.

An artificial stress test case where several counters are protected with
a single spinlock and which are only incremented while holding the spinlock
shows ~30% improvement on a 4 cpu machine.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:30 +02:00
Heiko Carstens
3d1e220d08 s390/ftrace: optimize mcount code
Reduce the number of executed instructions within the mcount block if
function tracing is enabled. We achieve that by using a non-standard
C function call ABI. Since the called function is also written in
assembler this is not a problem.
This also allows to replace the unconditional store at the beginning
of the mcount block with a larl instruction, which doesn't touch
memory.

In theory we could also patch the first instruction of the mcount block
to enable and disable function tracing. However this would break kprobes.
This could be fixed with implementing the "kprobes_on_ftrace" feature;
however keeping the odd jprobes working seems not to be possible without
a lot of code churn. Therefore keep the code easy and simply accept one
wasted 1-cycle "larl" instruction per function prologue.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:30 +02:00
Heiko Carstens
ea2f476990 s390/kprobes: remove unused jprobe_return_end()
Even if it has a __used annotation it is actually unused.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:29 +02:00
Heiko Carstens
5d6a016349 s390/ftrace: enforce DYNAMIC_FTRACE if FUNCTION_TRACER is selected
We have too many combinations for function tracing. Lets simply stick to
the most advanced option, so we don't have to care of other combinations.

This means we always select DYNAMIC_FTRACE if FUNCTION_TRACER is selected.

In the s390 Makefile also remove CONFIG_FTRACE_SYSCALLS since that
functionality got moved to architecture independent code in the meantime.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:29 +02:00
Heiko Carstens
10dec7dbd5 s390/ftrace: add HAVE_DYNAMIC_FTRACE_WITH_REGS support
This code is based on a patch from Vojtech Pavlik.
http://marc.info/?l=linux-s390&m=140438885114413&w=2

The actual implementation now differs significantly:
Instead of adding a second function "ftrace_regs_caller" which would be nearly
identical to the existing ftrace_caller function, the current ftrace_caller
function is now an alias to ftrace_regs_caller and always passes the needed
pt_regs structure and function_trace_op parameters unconditionally.

Besides that also use asm offsets to correctly allocate and access the new
struct pt_regs on the stack.

While at it we can make use of new instruction to get rid of some indirect
loads if compiled for new machines.

The passed struct pt_regs can be changed by the called function and it's new
contents will replace the current contents.

Note: to change the return address the embedded psw member of the pt_regs
structure must be changed. The psw member is right now incomplete, since
the mask part is missing. For all current use cases this should be sufficent.
Providing and restoring a sane mask would mean we need to add an epsw/lpswe
pair to the mcount code. Only these two instruction would cost us ~120 cycles
which currently seems not necessary.

Cc: Vojtech Pavlik <vojtech@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:28 +02:00
Heiko Carstens
2481a87b02 s390/ftrace: optimize function graph caller code
When the function graph tracer is disabled we can skip three additional
instructions. So let's just do this.

So if function tracing is enabled but function graph tracing is
runtime disabled, we get away with a single unconditional branch.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:28 +02:00
Heiko Carstens
0f1b1ff54b s390: pass march flag to assembly files as well
Currently the march flag gets only passed to C files, but not to
assembler files.
This means that we can't add new instructions like e.g. aghik to asm
files, since the assembler doesn't know of the new instructions if
the appropriate march flag isn't specified.

So also pass the march flag when compiling assembler files as well.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:28 +02:00
Martin Schwidefsky
b7eacb59cd s390/vdso: add vdso support for coarse clocks
Add CLOCK_REALTIME_COARSE and CLOCK_MONOTONIC_COARSE optimization to
the 64-bit and 31-bit vdso.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:27 +02:00
Martin Schwidefsky
070b7be633 s390/vdso: replace stck with stcke
If gettimeofday / clock_gettime are called multiple times in a row
the STCK instruction will stall until a difference in the result is
visible. This unnecessarily slows down the vdso calls, use stcke
instead of stck to get rid of the stall.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:27 +02:00
Heiko Carstens
b7d5006de1 s390: remove unused MACHINE_FLAG_RRBM
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-09 08:53:26 +02:00
Linus Torvalds
35af25616c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "A bug fix for the vdso code, the loadparm for booting from SCSI is
  added and the access permissions for the dasd module parameters are
  corrected"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/vdso: remove NULL pointer check from clock_gettime
  s390/ipl: Add missing SCSI loadparm attributes to /sys/firmware
  s390/dasd: Make module parameter visible in sysfs
2014-09-08 08:27:00 -07:00
David S. Miller
eb84d6b604 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2014-09-07 21:41:53 -07:00
Linus Torvalds
2b12164b55 A smattering of bug fixes across most architectures.
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQIcBAABAgAGBQJUCi7SAAoJEBvWZb6bTYbyuhgP+wQXwL1W4f6c219PMmz/Hpiz
 OCRCTFpz8eOC/e1VG5zCcocu7FisG43auH5WqDnyr8RcC8RZlfKcpIwvIAgGk1oP
 RusqDbhHXSo+mWCYAlVDwGeAagc1sgxjpHAKJ0uLT7Rsv7hJp88g/KE5JbOX+M6k
 UfO+bMSvys5eFCct57O5ZgLWR58k++9CC0f6U5B/uMkwYCnPz32YQL83ebpBLPvT
 vHcRVgUPXj7pg4ng8uVALvLJTewEKK8aG5o8LXtOmmOdYgccxSouy5wrX00g48pb
 lCB3UZrKZ07AfEgdK/06oz9RwqUUpu58VZSE4DVlEhEcPTx0Uy6k1PMw6utAJi5r
 WZ+Ws3IBQMfp3oqWJmdBLte0JAjK89glhqjrrseXjtux1piknyPfquPB/tGw6wxv
 rBMq4r64KJwcpL2DMYZGbHpZ7vbfDTJ7aYZvHBp2YRFnR9YE7lqGt3VJpp9WHkZT
 7SMvIVFEdF1SN4jXLZ4+3tno5hPlH+MbeteCT6ZweqVfSQzHEo7AvriKQS0wPoBm
 rOMZvO7SMSctHkBBqnTkXHnZ8r0J+v89VZr85ayQC/FHUEp6nFdYNiqqO54VkTfE
 BKuoepRfvjAK40hnWIWlPUMmzK1tzZwxB9vU+ghJ2yJWZMo4oIXjwg6fvYfa8Ar/
 r68L21dou0TNpUpyIdvN
 =vBFm
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
 "A smattering of bug fixes across most architectures"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  powerpc/kvm/cma: Fix panic introduces by signed shift operation
  KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags
  KVM: s390/mm: Fix storage key corruption during swapping
  arm/arm64: KVM: Complete WFI/WFE instructions
  ARM/ARM64: KVM: Nuke Hyp-mode tlbs before enabling MMU
  KVM: s390/mm: try a cow on read only pages for key ops
  KVM: s390: Fix user triggerable bug in dead code
2014-09-06 16:42:12 -07:00
Daniel Borkmann
60a3b2253c net: bpf: make eBPF interpreter images read-only
With eBPF getting more extended and exposure to user space is on it's way,
hardening the memory range the interpreter uses to steer its command flow
seems appropriate.  This patch moves the to be interpreted bytecode to
read-only pages.

In case we execute a corrupted BPF interpreter image for some reason e.g.
caused by an attacker which got past a verifier stage, it would not only
provide arbitrary read/write memory access but arbitrary function calls
as well. After setting up the BPF interpreter image, its contents do not
change until destruction time, thus we can setup the image on immutable
made pages in order to mitigate modifications to that code. The idea
is derived from commit 314beb9bca ("x86: bpf_jit_comp: secure bpf jit
against spraying attacks").

This is possible because bpf_prog is not part of sk_filter anymore.
After setup bpf_prog cannot be altered during its life-time. This prevents
any modifications to the entire bpf_prog structure (incl. function/JIT
image pointer).

Every eBPF program (including classic BPF that are migrated) have to call
bpf_prog_select_runtime() to select either interpreter or a JIT image
as a last setup step, and they all are being freed via bpf_prog_free(),
including non-JIT. Therefore, we can easily integrate this into the
eBPF life-time, plus since we directly allocate a bpf_prog, we have no
performance penalty.

Tested with seccomp and test_bpf testsuite in JIT/non-JIT mode and manual
inspection of kernel_page_tables.  Brad Spengler proposed the same idea
via Twitter during development of this patch.

Joint work with Hannes Frederic Sowa.

Suggested-by: Brad Spengler <spender@grsecurity.net>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Kees Cook <keescook@chromium.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-05 12:02:48 -07:00
Andy Lutomirski
a4412fc948 seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing
The secure_computing function took a syscall number parameter, but
it only paid any attention to that parameter if seccomp mode 1 was
enabled.  Rather than coming up with a kludge to get the parameter
to work in mode 2, just remove the parameter.

To avoid churn in arches that don't have seccomp filters (and may
not even support syscall_get_nr right now), this leaves the
parameter in secure_computing_strict, which is now a real function.

For ARM, this is a bit ugly due to the fact that ARM conditionally
supports seccomp filters.  Fixing that would probably only be a
couple of lines of code, but it should be coordinated with the audit
maintainers.

This will be a slight slowdown on some arches.  The right fix is to
pass in all of seccomp_data instead of trying to make just the
syscall nr part be fast.

This is a prerequisite for making two-phase seccomp work cleanly.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: linux-mips@linux-mips.org
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: linux-s390@vger.kernel.org
Cc: x86@kernel.org
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
2014-09-03 14:58:17 -07:00
Christian Borntraeger
1951497d90 KVM: s390/mm: Fix guest storage key corruption in ptep_set_access_flags
commit 0944fe3f4a ("s390/mm: implement software referenced bits")
triggered another paging/storage key corruption. There is an
unhandled invalid->valid pte change where we have to set the real
storage key from the pgste.
When doing paging a guest page might be swapcache or swap and when
faulted in it might be read-only and due to a parallel scan old.
An do_wp_page will make it writeable and young. Due to software
reference tracking this page was invalid and now becomes valid.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org # v3.12+
2014-09-02 10:30:43 +02:00
Christian Borntraeger
3e03d4c46d KVM: s390/mm: Fix storage key corruption during swapping
Since 3.12 or more precisely  commit 0944fe3f4a ("s390/mm:
implement software referenced bits") guest storage keys get
corrupted during paging. This commit added another valid->invalid
translation for page tables - namely ptep_test_and_clear_young.
We have to transfer the storage key into the pgste in that case.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: stable@vger.kernel.org # v3.12+
2014-09-02 10:30:43 +02:00
Martin Schwidefsky
5da76157a4 s390/vdso: remove NULL pointer check from clock_gettime
The explicit NULL pointer check on the timespec argument is only
required for clock_getres but not for clock_gettime.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-01 09:56:29 +02:00
Michael Holzheu
6992860167 s390/ipl: Add missing SCSI loadparm attributes to /sys/firmware
Currently the loadparm is only supported for CCW IPL. But also for SCSI
IPL it can be specified either on the HMC load panel respectively
z/VM console or via diagnose 308.

So fix this for SCSI and add the required sysfs attributes for reading the
IPL loadparm and for setting the loadparm for re-IPL.

With this patch the following two sysfs attributes are introduced:

 - /sys/firmware/ipl/loadparm (for system that have been IPLed from SCSI)
 - /sys/firmware/reipl/fcp/loadparm

Because the loadparm is now available for SCSI and CCW it is moved
now from "struct ipl_block_ccw" to the generic "struct ipl_list_hdr".

Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-09-01 09:56:29 +02:00
Vivek Goyal
b41d34b46a kexec: remove CONFIG_KEXEC dependency on crypto
New system call depends on crypto.  As it did not have a separate config
option, CONFIG_KEXEC was modified to select CRYPTO and CRYPTO_SHA256.

But now previous patch introduced a new config option for new syscall.
So CONFIG_KEXEC does not require crypto.  Remove that dependency.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Cc: Eric Biederman <ebiederm@xmission.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-29 16:28:16 -07:00
Radim Krčmář
13a34e067e KVM: remove garbage arg to *hardware_{en,dis}able
In the beggining was on_each_cpu(), which required an unused argument to
kvm_arch_ops.hardware_{en,dis}able, but this was soon forgotten.

Remove unnecessary arguments that stem from this.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:55 +02:00
Radim Krčmář
0865e636ae KVM: static inline empty kvm_arch functions
Using static inline is going to save few bytes and cycles.
For example on powerpc, the difference is 700 B after stripping.
(5 kB before)

This patch also deals with two overlooked empty functions:
kvm_arch_flush_shadow was not removed from arch/mips/kvm/mips.c
  2df72e9bc KVM: split kvm_arch_flush_shadow
and kvm_arch_sched_in never made it into arch/ia64/kvm/kvm-ia64.c.
  e790d9ef6 KVM: add kvm_arch_sched_in

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:55 +02:00
Paolo Bonzini
656473003b KVM: forward declare structs in kvm_types.h
Opaque KVM structs are useful for prototypes in asm/kvm_host.h, to avoid
"'struct foo' declared inside parameter list" warnings (and consequent
breakage due to conflicting types).

Move them from individual files to a generic place in linux/kvm_types.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 16:35:53 +02:00
Christian Borntraeger
dc77d344b4 KVM: s390/mm: fix up indentation of set_guest_storage_key
commit ab3f285f22 ("KVM: s390/mm: try a cow on read only pages for
key ops")' misaligned a code block. Let's fixup the indentation.

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-29 13:46:54 +02:00
Christoph Lameter
eb7e7d7663 s390: Replace __get_cpu_var uses
__get_cpu_var() is used for multiple purposes in the kernel source. One of
them is address calculation via the form &__get_cpu_var(x).  This calculates
the address for the instance of the percpu variable of the current processor
based on an offset.

Other use cases are for storing and retrieving data from the current
processors percpu area.  __get_cpu_var() can be used as an lvalue when
writing data or on the right side of an assignment.

__get_cpu_var() is defined as :

#define __get_cpu_var(var) (*this_cpu_ptr(&(var)))

__get_cpu_var() always only does an address determination. However, store
and retrieve operations could use a segment prefix (or global register on
other platforms) to avoid the address calculation.

this_cpu_write() and this_cpu_read() can directly take an offset into a
percpu area and use optimized assembly code to read and write per cpu
variables.

This patch converts __get_cpu_var into either an explicit address
calculation using this_cpu_ptr() or into a use of this_cpu operations that
use the offset.  Thereby address calculations are avoided and less registers
are used when code is generated.

At the end of the patch set all uses of __get_cpu_var have been removed so
the macro is removed too.

The patch set includes passes over all arches as well. Once these operations
are used throughout then specialized macros can be defined in non -x86
arches as well in order to optimize per cpu access by f.e.  using a global
register that may be set to the per cpu base.

Transformations done to __get_cpu_var()

1. Determine the address of the percpu instance of the current processor.

	DEFINE_PER_CPU(int, y);
	int *x = &__get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(&y);

2. Same as #1 but this time an array structure is involved.

	DEFINE_PER_CPU(int, y[20]);
	int *x = __get_cpu_var(y);

    Converts to

	int *x = this_cpu_ptr(y);

3. Retrieve the content of the current processors instance of a per cpu
variable.

	DEFINE_PER_CPU(int, y);
	int x = __get_cpu_var(y)

   Converts to

	int x = __this_cpu_read(y);

4. Retrieve the content of a percpu struct

	DEFINE_PER_CPU(struct mystruct, y);
	struct mystruct x = __get_cpu_var(y);

   Converts to

	memcpy(&x, this_cpu_ptr(&y), sizeof(x));

5. Assignment to a per cpu variable

	DEFINE_PER_CPU(int, y)
	__get_cpu_var(y) = x;

   Converts to

	this_cpu_write(y, x);

6. Increment/Decrement etc of a per cpu variable

	DEFINE_PER_CPU(int, y);
	__get_cpu_var(y)++

   Converts to

	this_cpu_inc(y)

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: linux390@de.ibm.com
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
2014-08-26 13:45:52 -04:00
Paolo Bonzini
a7428c3ded KVM: s390: Fixes and features for 3.18 part 1
1. The usual cleanups: get rid of duplicate code, use defines, factor
    out the sync_reg handling, additional docs for sync_regs, better
    error handling on interrupt injection
 2. We use KVM_REQ_TLB_FLUSH instead of open coding tlb flushes
 3. Additional registers for kvm_run sync regs. This is usually not
    needed in the fast path due to eventfd/irqfd, but kvm stat claims
    that we reduced the overhead of console output by ~50% on my system
 4. A rework of the gmap infrastructure. This is the 2nd step towards
    host large page support (after getting rid of the storage key
    dependency). We introduces two radix trees to store the guest-to-host
    and host-to-guest translations. This gets us rid of most of
    the page-table walks in the gmap code. Only one in __gmap_link is left,
    this one is required to link the shadow page table to the process page
    table. Finally this contains the plumbing to support gmap page tables
    with less than 5 levels.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJT/EEwAAoJEBF7vIC1phx8UOQQAKifhckkQi39bVlw75y6Is0u
 YGeBp63zJqJ4mqZXUc3CL7GOKp9beps+uHj9KqlHIPJMg6oytWf+6fgr2kygQZFh
 kMdlITphF5AezklAtocVu4LumRjslAhRkc6kWE2J21w9xUeSExzpavUc9kYmj8W8
 81BJMcG0/xRiZSQ+GynRPn6tk9+zgIMEmUmBQoHXfWElBGNUhIJi9xSfoKOrlDho
 on2PgDYSfnwfftKDaq7ttPA4ApHLxyiOpoWXnldy1SSiy1MdZpXNbKLEiuRf5g9R
 2k3sJmvBxNfb3CRJuhyKAqvDbt+u+NLEktSJcky61H1R5J23oobfVBDsD2vum8Ah
 ZJIwSM9H/Hi6FqJQCxywkyU1Vj+Wn7U2NYPrFLpi4apfSVi9uxXCFtWzV3WPSay4
 mM87ZRd8mFQRz5DTTfK5VNAraNk3m7XTWzZyo8vQ350vk2xQDIw7X6PB+7YqAmu6
 99ikYtAN/0fAUIXXx8ORbcjhPLJzEyfylJMQ2Dz+aGWS7wEdb7P+9xSAJfFzMbJQ
 DWwlJVKZuc4OP3gsEyYQlB2EI+P2iJIIA5uyIPcEOPZeUHpZ1skD/ked0LHBZkwm
 jj5eIcbjaiPbIcwaOlPyF5H68O/XPE7TnSSdJhpa6vp6uVMUZ0O9XWjJE+eeqKv2
 X5j8yraVuyTJfEZ9RGQP
 =KrOE
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-next-20140825' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: Fixes and features for 3.18 part 1

1. The usual cleanups: get rid of duplicate code, use defines, factor
   out the sync_reg handling, additional docs for sync_regs, better
   error handling on interrupt injection
2. We use KVM_REQ_TLB_FLUSH instead of open coding tlb flushes
3. Additional registers for kvm_run sync regs. This is usually not
   needed in the fast path due to eventfd/irqfd, but kvm stat claims
   that we reduced the overhead of console output by ~50% on my system
4. A rework of the gmap infrastructure. This is the 2nd step towards
   host large page support (after getting rid of the storage key
   dependency). We introduces two radix trees to store the guest-to-host
   and host-to-guest translations. This gets us rid of most of
   the page-table walks in the gmap code. Only one in __gmap_link is left,
   this one is required to link the shadow page table to the process page
   table. Finally this contains the plumbing to support gmap page tables
   with less than 5 levels.
2014-08-26 14:31:44 +02:00
Martin Schwidefsky
f079e95214 KVM: s390/mm: remove outdated gmap data structures
The radix tree rework removed all code that uses the gmap_rmap
and gmap_pgtable data structures. Remove these outdated definitions.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-26 10:09:03 +02:00
Martin Schwidefsky
c6c956b80b KVM: s390/mm: support gmap page tables with less than 5 levels
Add an addressing limit to the gmap address spaces and only allocate
the page table levels that are needed for the given limit. The limit
is fixed and can not be changed after a gmap has been created.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-26 10:09:03 +02:00
Martin Schwidefsky
527e30b41d KVM: s390/mm: use radix trees for guest to host mappings
Store the target address for the gmap segments in a radix tree
instead of using invalid segment table entries. gmap_translate
becomes a simple radix_tree_lookup, gmap_fault is split into the
address translation with gmap_translate and the part that does
the linking of the gmap shadow page table with the process page
table.
A second radix tree is used to keep the pointers to the segment
table entries for segments that are mapped in the guest address
space. On unmap of a segment the pointer is retrieved from the
radix tree and is used to carry out the segment invalidation in
the gmap shadow page table. As the radix tree can only store one
pointer, each host segment may only be mapped to exactly one
guest location.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-26 10:09:02 +02:00
Thierry Reding
90114d65fe s390: Implement dma_{alloc,free}_attrs()
The S390 architecture advertises support for HAVE_DMA_ATTRS when PCI is
enabled. Patches to unify some of the DMA API would like to rely on the
dma_alloc_attrs() and dma_free_attrs() functions to be provided when an
architecture supports DMA attributes.

Rename dma_alloc_coherent() and dma_free_coherent() to dma_alloc_attrs()
and dma_free_attrs() since they are functionally equivalent and alias
the former to the latter for compatibility.

For consistency with other architectures, also reuse the existing symbol
HAVE_DMA_ATTRS defined in arch/Kconfig instead of providing a duplicate.
Select it when PCI is enabled.

While at it, drop a redundant 'default n' from the PCI Kconfig symbol.

Signed-off-by: Thierry Reding <treding@nvidia.com>
Acked-By: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2014-08-26 07:39:12 +02:00
Paolo Bonzini
7cd4b90a73 Here are two fixes for s390 KVM code that prevent:
1. a malicious user to trigger a kernel BUG
 2. a malicious user to change the storage key of read-only pages
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABAgAGBQJT+y/oAAoJEBF7vIC1phx82FoP/0QRqR5Dw5W8nJFFCrft2n/9
 lrFop/WvVeCtVUXbeYVLoUOFN3id6BAiZeessYS/T6V0HThCx2fKlxDZrik2hQ2o
 OtVTR0aIotPWTzMYxIapBPpo4j/LFyMbDgVg/VWgt8+89DSfocY3g7Zv/gwtwCZp
 dhHU6McCZrhsCKTC7IoAR33IsOaeGbkfMrFWQ30TfDam/dxB3i4ZBRhzCLSPmqu/
 V+PNdYinXSZWvq7jFa6//x3gSwXTAZx643nHmIt94c5fXd7ZXxT8fD1dw1c6FHK3
 mVwP/VRA2DeaDE2n7mkFUI6LxghQtNKyv1uF8QE1wVmLYGwrGSoSwt7uk5/hoqFi
 XSwWDPFRPhnBQ6NAyFi4DN4FGr2kPV/EUpKge6laY6dtfbpEQY0MnHMJj/vkKBZh
 fkZZ5Y2XlfY4QnuoDBMGsn65y0izkI0YlAsB0ett2gVjhhiW50XVAUFd5RbaVKSF
 cQlgvB/iLHmN01QmYlLfFFN942wYjWAnZR0BPCySBkzpNj8SIEzWNX5my653uSPX
 WCZRMXNE4+5oydhd4L4uBwd9QzQnHnUaXPA/VeOwtDWA8wiDXmKzQM2o1vq2Dbva
 in7FvD7z/JbPXQ/XF+DcwVtkghel3uke84QvpfJ2TkK6M5bjkCIl5eVRwiS2jMvM
 CJMO7HTreugz+ZrmHZxl
 =ED9q
 -----END PGP SIGNATURE-----

Merge tag 'kvm-s390-20140825' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

Here are two fixes for s390 KVM code that prevent:
1. a malicious user to trigger a kernel BUG
2. a malicious user to change the storage key of read-only pages
2014-08-25 15:37:00 +02:00
Martin Schwidefsky
6e0a0431bf KVM: s390/mm: cleanup gmap function arguments, variable names
Make the order of arguments for the gmap calls more consistent,
if the gmap pointer is passed it is always the first argument.
In addition distinguish between guest address and user address
by naming the variables gaddr for a guest address and vmaddr for
a user address.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:58 +02:00
Martin Schwidefsky
9da4e38076 KVM: s390/mm: readd address parameter to gmap_do_ipte_notify
Revert git commit c3a23b9874c1 ("remove unnecessary parameter from
gmap_do_ipte_notify").

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:57 +02:00
Martin Schwidefsky
55dbbdd9a8 KVM: s390/mm: readd address parameter to pgste_ipte_notify
Revert git commit 1b7fd6952063 ("remove unecessary parameter from
pgste_ipte_notify")

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:57 +02:00
Jens Freimann
331cbc277e KVM: s390: don't use kvm lock in interrupt injection code
The kvm lock protects us against vcpus going away, but they only go
away when the virtual machine is shut down. We don't need this
mutex here, so let's get rid of it.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:56 +02:00
Jens Freimann
7939503147 KVM: s390: return -EFAULT if lowcore is not mapped during irq delivery
Currently we just kill the userspace process and exit the thread
immediatly without making sure that we don't hold any locks etc.

Improve this by making KVM_RUN return -EFAULT if the lowcore is not
mapped during interrupt delivery. To achieve this we need to pass
the return code of guest memory access routines used in interrupt
delivery all the way back to the KVM_RUN ioctl.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:56 +02:00
David Hildenbrand
d3d692c82e KVM: s390: implement KVM_REQ_TLB_FLUSH and make use of it
Use the KVM_REQ_TLB_FLUSH request in order to trigger tlb flushes instead
of manipulating the SIE control block whenever we need it. Also trigger it for
a control register sync directly instead of (ab)using kvm_s390_set_prefix().

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:55 +02:00
David Hildenbrand
b028ee3edd KVM: s390: synchronize more registers with kvm_run
In order to reduce the number of syscalls when dropping to user space, this
patch enables the synchronization of the following "registers" with kvm_run:
- ARCH0: CPU timer, clock comparator, TOD programmable register,
         guest breaking-event register, program parameter
- PFAULT: pfault parameters (token, select, compare)

The registers are grouped to reduce the overhead when syncing.

As this grows the number of sync registers quite a bit, let's move the code
synchronizing registers with kvm_run from kvm_arch_vcpu_ioctl_run() into
separate helper routines.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:53 +02:00
Christian Borntraeger
c3950b66b9 KVM: s390: no special machine check delivery
The load PSW handler does not have to inject pending machine checks.
This can wait until the CPU runs the generic interrupt injection code.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
2014-08-25 14:35:30 +02:00
David Hildenbrand
fbfa304963 KVM: s390: clear kvm_dirty_regs when dropping to user space
We should make sure that all kvm_dirty_regs bits are cleared before dropping
to user space. Until now, some would remain pending.

Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:30 +02:00
Jens Freimann
8a2ef71b0b KVM: s390: factor out get_ilc() function
Let's make this a reusable function.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:29 +02:00
Christian Borntraeger
ab3f285f22 KVM: s390/mm: try a cow on read only pages for key ops
The PFMF instruction handler  blindly wrote the storage key even if
the page was mapped R/O in the host. Lets try a COW before continuing
and bail out in case of errors.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
2014-08-25 14:35:28 +02:00
Jens Freimann
44c6ca3d1b KVM: s390: add defines for pfault init delivery code
Get rid of open coded values for pfault init.

Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
2014-08-25 14:35:28 +02:00
Christian Borntraeger
614a80e474 KVM: s390: Fix user triggerable bug in dead code
In the early days, we had some special handling for the
KVM_EXIT_S390_SIEIC exit, but this was gone in 2009 with commit
d7b0b5eb30 (KVM: s390: Make psw available on all exits, not
just a subset).

Now this switch statement is just a sanity check for userspace
not messing with the kvm_run structure. Unfortunately, this
allows userspace to trigger a kernel BUG. Let's just remove
this switch statement.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
2014-08-25 14:35:15 +02:00
Radim Krčmář
e790d9ef64 KVM: add kvm_arch_sched_in
Introduce preempt notifiers for architecture specific code.
Advantage over creating a new notifier in every arch is slightly simpler
code and guaranteed call order with respect to kvm_sched_in.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-08-21 18:45:21 +02:00
Heiko Carstens
7bb1cdbfe2 s390: wire up memfd_create syscall
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-08-12 13:00:08 +02:00
Martin Schwidefsky
bcfcbb6bae s390: add system information as device randomness
The virtual-machine cpu information data block and the cpu-id of
the boot cpu can be used as source of device randomness.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-08-12 13:00:07 +02:00
Michael Holzheu
852ffd0f4e s390/kdump: Clear subchannel ID to signal non-CCW/SCSI IPL
For CCW and SCSI IPL the hardware sets the subchannel ID and number
correctly at 0xb8. For kdump at 0xb8 normally there is the data of
the previously IPLed system.

In order to be clean now for kdump and kexec always set the subchannel
ID and number to zero. This tells the next OS that no CCW/SCSI IPL
has been done.

Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
2014-08-12 13:00:06 +02:00