Commit graph

114020 commits

Author SHA1 Message Date
Andy Lutomirski
7841b40871 x86/entry/compat: Implement opportunistic SYSRETL for compat syscalls
If CS, SS and IP are as expected and FLAGS is compatible with
SYSRETL, then return from fast compat syscalls (both SYSCALL and
SYSENTER) using SYSRETL.

Unlike native 64-bit opportunistic SYSRET, this is not invisible
to user code: RCX and R8-R15 end up in a different state than
shown saved in pt_regs.  To compensate, we only do this when
returning to the vDSO fast syscall return path.  This won't
interfere with syscall restart, as we won't use SYSRETL when
returning to the INT80 restart instruction.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/aa15e49db33773eb10b73d73466b6d5466d7856a.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:10 +02:00
Andy Lutomirski
a474e67c91 x86/vdso/compat: Wire up SYSENTER and SYSCSALL for compat userspace
What, you didn't realize that SYSENTER and SYSCALL were actually
the same thing? :)

Unlike the old code, this actually passes the ptrace_syscall_32
test on AMD systems.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/b74615af58d785aa02d917213ec64e2022a2c796.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:09 +02:00
Andy Lutomirski
710246df58 x86/entry: Add C code for fast system call entries
This handles both SYSENTER and SYSCALL.  The asm glue will take
care of the differences.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/6041a58a9b8ef6d2522ab4350deb1a1945eb563f.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:09 +02:00
Andy Lutomirski
ee08c6bd31 x86/entry/64/compat: Migrate the body of the syscall entry to C
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/a2f0fce68feeba798a24339b5a7ec1ec2dd9eaf7.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:09 +02:00
Andy Lutomirski
bd2d3a3ba6 x86/entry: Add do_syscall_32(), a C function to do 32-bit syscalls
System calls are really quite simple.  Add a helper to call
a 32-bit system call.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/a77ed179834c27da436fb4a7fb23c8ee77abc11c.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:08 +02:00
Andy Lutomirski
eb974c6256 x86/syscalls: Give sys_call_ptr_t a useful type
Syscalls are asmlinkage functions (on 32-bit kernels), take six
args of type unsigned long, and return long.  Note that uml
could probably be slightly cleaned up on top of this patch.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/4d3ecc4a169388d47009175408b2961961744e6f.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:08 +02:00
Andy Lutomirski
034042cc1e x86/entry/syscalls: Move syscall table declarations into asm/syscalls.h
The header was missing some compat declarations.

Also make sys_call_ptr_t have a consistent type.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/3166aaff0fb43897998fcb6ef92991533f8c5c6c.1444091585.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:08 +02:00
Andy Lutomirski
8169aff611 x86/entry/64/compat: Set up full pt_regs for all compat syscalls
This is conceptually simpler.  More importantly, it eliminates
the PTREGSCALL and execve stubs, which were not compatible with
the C ABI.  This means that C code can call through the compat
syscall table.

The execve stubs are a bit subtle.  They did two things: they
cleared some registers and they forced slow-path return.
Neither is necessary any more: elf_common_init clears the extra
registers and start_thread calls force_iret().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/f95b7f7dfaacf88a8cae85bb06226cae53769287.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:07 +02:00
Andy Lutomirski
2ec67971fa x86/entry/64/compat: Remove most of the fast system call machinery
We now have only one code path that calls through the compat
syscall table.  This will make it much more pleasant to change
the pt_regs vs register calling convention, which we need to do
to move the call into C.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/320cda5573cefdc601b955d23fbe8f36c085432d.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:07 +02:00
Andy Lutomirski
c5f638ac90 x86/entry/64/compat: Remove audit optimizations
These audit optimizations are messy and hard to maintain.  We'll
get a similar effect from opportunistic sysret when fast compat
system calls are re-implemented.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/0bcca79ac7ff835d0e5a38725298865b01347a82.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:07 +02:00
Andy Lutomirski
e62a254a1f x86/entry/64/compat: Disable SYSENTER and SYSCALL32 entries
We've disabled the vDSO helpers to call them, so turn off the
entries entirely (temporarily) in preparation for cleaning them
up.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/8d6e84bf651519289dc532dcc230adfabbd2a3eb.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:07 +02:00
Andy Lutomirski
8242c6c84a x86/vdso/32: Save extra registers in the INT80 vsyscall path
The goal is to integrate the SYSENTER and SYSCALL32 entry paths
with the INT80 path.  SYSENTER clobbers ESP and EIP.  SYSCALL32
clobbers ECX (and, invisibly, R11).  SYSRETL (long mode to
compat mode) clobbers ECX and, invisibly, R11.  SYSEXIT (which
we only need for native 32-bit) clobbers ECX and EDX.

This means that we'll need to provide ESP to the kernel in a
register (I chose ECX, since it's only needed for SYSENTER) and
we need to provide the args that normally live in ECX and EDX in
memory.

The epilogue needs to restore ECX and EDX, since user code
relies on regs being preserved.

We don't need to do anything special about EIP, since the kernel
already knows where we are.  The kernel will eventually need to
know where int $0x80 lands, so add a vdso_image entry for it.

The only user-visible effect of this code is that ptrace-induced
changes to ECX and EDX during fast syscalls will be lost.  This
is already the case for the SYSENTER path.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/b860925adbee2d2627a0671fbfe23a7fd04127f8.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:06 +02:00
Andy Lutomirski
7bcdea4d05 x86/elf/64: Clear more registers in elf_common_init()
Before we start calling execve in contexts that honor the full
pt_regs, we need to teach it to initialize all registers.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/65a38a9edee61a1158cfd230800c61dbd963dac5.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:06 +02:00
Andy Lutomirski
29c0ce9508 x86/vdso: Replace hex int80 CFI annotations with GAS directives
Maintaining the current CFI annotations written in R'lyehian is
difficult for most of us.  Translate them to something a little
closer to English.

This will remove the CFI data for kernels built with extremely
old versions of binutils.  I think this is a fair tradeoff for
the ability for mortals to edit the asm.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/ae3ff4ff5278b4bfc1e1dab368823469866d4b71.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:06 +02:00
Andy Lutomirski
f24f910884 x86/vdso: Define BUILD_VDSO while building and emit .eh_frame in asm
For the vDSO, user code wants runtime unwind info.  Make sure
that, if we use .cfi directives, we generate it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/16e29ad8855e6508197000d8c41f56adb00d7580.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:05 +02:00
Andy Lutomirski
7b956f035a x86/asm: Re-add parts of the manual CFI infrastructure
Commit:

  131484c8da ("x86/debug: Remove perpetually broken, unmaintainable dwarf annotations")

removed all the manual DWARF annotations outside the vDSO.  It also removed
the macros we used for the manual annotations.

Re-add these macros so that we can clean up the vDSO annotations.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/4c70bb98a8b773c8ccfaabf6745e569ff43e7f65.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-09 09:41:05 +02:00
Andy Lutomirski
0a6d1fa0d2 x86/vdso: Remove runtime 32-bit vDSO selection
32-bit userspace will now always see the same vDSO, which is
exactly what used to be the int80 vDSO.  Subsequent patches will
clean it up and make it support SYSENTER and SYSCALL using
alternatives.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/e7e6b3526fa442502e6125fe69486aab50813c32.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:08 +02:00
Andy Lutomirski
b611acf473 x86/entry/64/compat: After SYSENTER, move STI after the NT fixup
We eventually want to make it all the way into C code before
enabling interrupts.  We need to rework our flags handling
slightly to delay enabling interrupts.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/35d24d2a9305da3182eab7b2cdfd32902e90962c.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:08 +02:00
Andy Lutomirski
72f924783b x86/entry, locking/lockdep: Move lockdep_sys_exit() to prepare_exit_to_usermode()
Rather than worrying about exactly where LOCKDEP_SYS_EXIT should
go in the asm code, add it to prepare_exit_from_usermode() and
remove all of the asm calls that are followed by
prepare_exit_to_usermode().

LOCKDEP_SYS_EXIT now appears only in the syscall fast paths.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1736ebe948b845e68120b86b89091f3ec27f5e8e.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:07 +02:00
Andy Lutomirski
dd27f998f0 x86/entry/64/compat: Fix SYSENTER's NT flag before user memory access
Clearing NT is part of the prologue, whereas loading up arg6
makes more sense to think about as part of syscall processing.
Reorder them.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/19eb235828b2d2a52c53459e09f2974e15e65a35.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:07 +02:00
Andy Lutomirski
7e0f51cb44 x86/uaccess: Add unlikely() to __chk_range_not_ok() failure paths
This should improve code quality a bit. It also shrinks the kernel text:

 Before:
       text     data      bss       dec    filename
   21828379  5194760  1277952  28301091    vmlinux

 After:
       text     data      bss       dec    filename
   21827997  5194760  1277952  28300709    vmlinux

... by 382 bytes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/f427b8002d932e5deab9055e0074bb4e7e80ee39.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:06 +02:00
Andy Lutomirski
a76cf66e94 x86/uaccess: Tell the compiler that uaccess is unlikely to fault
GCC doesn't realize that get_user(), put_user(), and their __
variants are unlikely to fail.  Tell it.

I noticed this while playing with the C entry code.

 Before:
       text     data      bss       dec    filename
   21828763  5194760  1277952  28301475    vmlinux.baseline

 After:
      text      data      bss       dec    filename
   21828379  5194760  1277952  28301091    vmlinux.new

The generated code shrunk by 384 bytes.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/dc37bed7024319c3004d950d57151fca6aeacf97.1444091584.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:34:06 +02:00
Ingo Molnar
25a9a924c0 Merge branch 'linus' into x86/asm, to pick up fixes before applying new changes
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-07 11:24:24 +02:00
Linus Torvalds
f6702681a0 xen: bug fixes for 4.3-rc4
- Fix VM save performance regression with x86 PV guests.
 - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI).
 - Other minor fixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJWE8wQAAoJEFxbo/MsZsTRVTMH/0eqSg2M78wv4sBl234Y3FE9
 AN8KFUdlkK7VN9v0uuLMDSKIWNUuFJIvo/2rElWGRiX2Q+/pfnQg3ZSFhub9S8uL
 T4LCvmG9viRFb2oUz792ewqncSw3X98Jpto4smA820gJRjndBSWm5HUKUtPAkv1M
 l5DFMEgOeHbu+wCbKD/ZPEt5K9GsIaNviSNoWtYHirZwrd00oLmNbWp+g8lIGQiT
 3vLW0SaZzjL6akKxihb/p3WZ9eNmyz8yk0V7dItUEVUB9qoaDDLJ5qIRSHHWTWQD
 Jza/GE32VallZLuEXGG5/D86MsnyVYHC+lZtwo2IptOGm8v7WuZRv094wI1ev5c=
 =aiDw
 -----END PGP SIGNATURE-----

Merge tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip

Pull xen bug fixes from David Vrabel:

 - Fix VM save performance regression with x86 PV guests

 - Make kexec work in x86 PVHVM guests (if Xen has the soft-reset ABI)

 - Other minor fixes.

* tag 'for-linus-4.3b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  x86/xen/p2m: hint at the last populated P2M entry
  x86/xen: Do not clip xen_e820_map to xen_e820_map_entries when sanitizing map
  x86/xen: Support kexec/kdump in HVM guests by doing a soft reset
  xen/x86: Don't try to write syscall-related MSRs for PV guests
  xen: use correct type for HYPERVISOR_memory_op()
2015-10-06 15:05:02 +01:00
Linus Torvalds
3ec20e2e61 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
 "Three bug fixes and an update to the default configuration"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/defconfig: set SCSI_DH=y
  s390/vtime: correct scaled cputime of partially idle CPUs
  s390/boot/decompression: disable floating point in decompressor
  s390/numa: use correct type for node_to_cpumask_map
2015-10-06 14:59:36 +01:00
David Vrabel
98dd166ea3 x86/xen/p2m: hint at the last populated P2M entry
With commit 633d6f17cd (x86/xen: prepare
p2m list for memory hotplug) the P2M may be sized to accomdate a much
larger amount of memory than the domain currently has.

When saving a domain, the toolstack must scan all the P2M looking for
populated pages.  This results in a performance regression due to the
unnecessary scanning.

Instead of reporting (via shared_info) the maximum possible size of
the P2M, hint at the last PFN which might be populated.  This hint is
increased as new leaves are added to the P2M (in the expectation that
they will be used for populated entries).

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: <stable@vger.kernel.org> # 4.0+
2015-10-06 13:54:20 +01:00
Linus Torvalds
30c44659f4 Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
Pull strscpy string copy function implementation from Chris Metcalf.

Chris sent this during the merge window, but I waffled back and forth on
the pull request, which is why it's going in only now.

The new "strscpy()" function is definitely easier to use and more secure
than either strncpy() or strlcpy(), both of which are horrible nasty
interfaces that have serious and irredeemable problems.

strncpy() has a useless return value, and doesn't NUL-terminate an
overlong result.  To make matters worse, it pads a short result with
zeroes, which is a performance disaster if you have big buffers.

strlcpy(), by contrast, is a mis-designed "fix" for strlcpy(), lacking
the insane NUL padding, but having a differently broken return value
which returns the original length of the source string.  Which means
that it will read characters past the count from the source buffer, and
you have to trust the source to be properly terminated.  It also makes
error handling fragile, since the test for overflow is unnecessarily
subtle.

strscpy() avoids both these problems, guaranteeing the NUL termination
(but not excessive padding) if the destination size wasn't zero, and
making the overflow condition very obvious by returning -E2BIG.  It also
doesn't read past the size of the source, and can thus be used for
untrusted source data too.

So why did I waffle about this for so long?

Every time we introduce a new-and-improved interface, people start doing
these interminable series of trivial conversion patches.

And every time that happens, somebody does some silly mistake, and the
conversion patch to the improved interface actually makes things worse.
Because the patch is mindnumbing and trivial, nobody has the attention
span to look at it carefully, and it's usually done over large swatches
of source code which means that not every conversion gets tested.

So I'm pulling the strscpy() support because it *is* a better interface.
But I will refuse to pull mindless conversion patches.  Use this in
places where it makes sense, but don't do trivial patches to fix things
that aren't actually known to be broken.

* 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
  tile: use global strscpy() rather than private copy
  string: provide strscpy()
  Make asm/word-at-a-time.h available on all architectures
2015-10-04 16:31:13 +01:00
Linus Torvalds
0d8770815f Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus
Pull MIPS updates from Ralf Baechle:
 "This week's round of MIPS fixes:
   - Fix JZ4740 build
   - Fix fallback to GFP_DMA
   - FP seccomp in case of ENOSYS
   - Fix bootmem panic
   - A number of FP and CPS fixes
   - Wire up new syscalls
   - Make sure BPF assembler objects can properly be disassembled
   - Fix BPF assembler code for MIPS I"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
  MIPS: scall: Always run the seccomp syscall filters
  MIPS: Octeon: Fix kernel panic on startup from memory corruption
  MIPS: Fix R2300 FP context switch handling
  MIPS: Fix octeon FP context switch handling
  MIPS: BPF: Fix load delay slots.
  MIPS: BPF: Do all exports of symbols with FEXPORT().
  MIPS: Fix the build on jz4740 after removing the custom gpio.h
  MIPS: CPS: #ifdef on CONFIG_MIPS_MT_SMP rather than CONFIG_MIPS_MT
  MIPS: CPS: Don't include MT code in non-MT kernels.
  MIPS: CPS: Stop dangling delay slot from has_mt.
  MIPS: dma-default: Fix 32-bit fall back to GFP_DMA
  MIPS: Wire up userfaultfd and membarrier syscalls.
2015-10-04 11:41:58 +01:00
Markos Chandras
d218af7849 MIPS: scall: Always run the seccomp syscall filters
The MIPS syscall handler code used to return -ENOSYS on invalid
syscalls. Whilst this is expected, it caused problems for seccomp
filters because the said filters never had the change to run since
the code returned -ENOSYS before triggering them. This caused
problems on the chromium testsuite for filters looking for invalid
syscalls. This has now changed and the seccomp filters are always
run even if the syscall is invalid. We return -ENOSYS once we
return from the seccomp filters. Moreover, similar codepaths have
been merged in the process which simplifies somewhat the overall
syscall code.

Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/11236/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-10-04 12:10:56 +02:00
Linus Torvalds
2cf30826bb Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
 "Fixes all around the map: W+X kernel mapping fix, WCHAN fixes, two
  build failure fixes for corner case configs, x32 header fix and a
  speling fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds
  x86/mm: Set NX on gap between __ex_table and rodata
  x86/kexec: Fix kexec crash in syscall kexec_file_load()
  x86/process: Unify 32bit and 64bit implementations of get_wchan()
  x86/process: Add proper bound checks in 64bit get_wchan()
  x86, efi, kasan: Fix build failure on !KASAN && KMEMCHECK=y kernels
  x86/hyperv: Fix the build in the !CONFIG_KEXEC_CORE case
  x86/cpufeatures: Correct spelling of the HWP_NOTIFY flag
2015-10-03 10:53:05 -04:00
Linus Torvalds
a758379b03 Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
 "Two EFI fixes: one for x86, one for ARM, fixing a boot crash bug that
  can trigger under newer EFI firmware"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm64/efi: Fix boot crash by not padding between EFI_MEMORY_RUNTIME regions
  x86/efi: Fix boot crash by mapping EFI memmap entries bottom-up at runtime, instead of top-down
2015-10-03 10:46:41 -04:00
Linus Torvalds
5634347dee - Fix for transparent huge page change_protection() logic which was
inadvertently changing a huge pmd page into a pmd table entry.
 - Function graph tracer panic fix caused by the return_to_handler code
   corrupting the multi-regs function return value (composite types).
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWDr76AAoJEGvWsS0AyF7xdKsP/1oE1dM/xXhQbYcJxXV3MgnT
 05pXmxxJUz7o2meVcbsz4c4UbhdHaQX2//jsgwxmoTNZo4EVz15c8GLWCPh5IRsw
 FQ/bVbDNmbOMZd4RSKShfIkW4bjelT5Mn/WuxUQoIX0qx316hmfFXMLCK2Gg7iOc
 hLkERWrbwHUynu0/lzE9EphOcLIGMmuT6n4qXtdhiLoFFMg8iuKDoxetj14oR3GC
 LQ5JHpvnS6ECLl50RbVvWLCSymnfhzveGvW/d58rFHFRY5PnjV2LATfLCkaKiz8h
 szxJLFuZZzP0lmhOZ9LUaRnNwTUFx5sg0FMEJaLimnTWZ2KmvxBgMuZz+vutjjlz
 DHsQQWVVW771Yzv4vWkv/4oAd/IMcoZFLaAjVYxcjzEFC/kB/i1zRSe8BMxdTs1u
 xqIi3Iv6c7Kv7VdANfTuR9zvFDPRSLoK1UEqQ0Sdvg9NuP8rPrn2ZaMyL1fIwxaL
 AO9JTAWqCYhgWXfeCAQYI1aDEdeE1ndK7a6eX6RDu1nRupQAHfTvV+DwfLRTF6g2
 T3IwfcDuquZHNaKBR6CIgF0xSzyfk7Wsbf3QPqtIGjGsyoHfrcf/9y0b3yNxXNq9
 GEepvrYQfdoP2xhwOyDK+8kNt0HxMiCrrPD0dni95No8DDct1TJ3kPnBdWyfAWLi
 sNNSuGbqMTRpONnuC9kK
 =AJCF
 -----END PGP SIGNATURE-----

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

 - Fix for transparent huge page change_protection() logic which was
   inadvertently changing a huge pmd page into a pmd table entry.

 - Function graph tracer panic fix caused by the return_to_handler code
   corrupting the multi-regs function return value (composite types).

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: ftrace: fix function_graph tracer panic
  arm64: Fix THP protection change logic
2015-10-02 14:54:16 -04:00
Linus Torvalds
b55a97e759 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
 "Summary:
   - Fix for accidental modification of arguments of syscall functions
   - Wire up new syscalls
   - Update defconfigs"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
  m68k/defconfig: Update defconfigs for v4.3-rc1
  m68k: Define asmlinkage_protect
  m68k: Wire up membarrier
  m68k: Wire up userfaultfd
  m68k: Wire up direct socket calls
2015-10-02 14:51:46 -04:00
Matt Bennett
66803dd919 MIPS: Octeon: Fix kernel panic on startup from memory corruption
During development it was found that a number of builds would panic
during the kernel init process, more specifically in 'delayed_fput()'.
The panic showed the kernel trying to access a memory address of
'0xb7fdc00' while traversing the 'delayed_fput_list' structure.
Comparing this memory address to the value of the pointer used on
builds that did not panic confirmed that the pointer on crashing
builds must have been corrupted at some stage earlier in the init
process.

By traversing the list earlier and earlier in the code it was found
that 'plat_mem_setup()' was responsible for corrupting the list.
Specifically the line:

    memory = cvmx_bootmem_phy_alloc(mem_alloc_size,
			__pa_symbol(&__init_end), -1,
			0x100000,
			CVMX_BOOTMEM_FLAG_NO_LOCKING);

Which would eventually call:

    cvmx_bootmem_phy_set_size(new_ent_addr,
		cvmx_bootmem_phy_get_size
		(ent_addr) -
		(desired_min_addr -
			ent_addr));

Where 'new_ent_addr'=0x4800000 (the address of 'delayed_fput_list')
and the second argument (size)=0xb7fdc00 (the address causing the
kernel panic). The job of this part of 'plat_mem_setup()' is to
allocate chunks of memory for the kernel to use. At the start of
each chunk of memory the size of the chunk is written, hence the
value 0xb7fdc00 is written onto memory at 0x4800000, therefore the
kernel panics when it goes back to access 'delayed_fput_list' later
on in the initialisation process.

On builds that were not crashing it was found that the compiler had
placed 'delayed_fput_list' at 0x4800008, meaning it wasn't corrupted
(but something else in memory was overwritten).

As can be seen in the first function call above the code begins to
allocate chunks of memory beginning from the symbol '__init_end'.
The MIPS linker script (vmlinux.lds.S) however defines the .bss
section to begin after '__init_end'. Therefore memory within the
.bss section is allocated to the kernel to use (System.map shows
'delayed_fput_list' and other kernel structures to be in .bss).

To stop the kernel panic (and the .bss section being corrupted)
memory should begin being allocated from the symbol '_end'.

Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz>
Acked-by: David Daney <david.daney@cavium.com>
Cc: linux-mips@linux-mips.org
Cc: aleksey.makarov@auriga.com
Patchwork: https://patchwork.linux-mips.org/patch/11251/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-10-02 19:19:55 +02:00
Paul Burton
085c2f25d3 MIPS: Fix R2300 FP context switch handling
Commit 1a3d59579b ("MIPS: Tidy up FPU context switching") removed FP
context saving from the asm-written resume function in favour of reusing
existing code to perform the same task. However it only removed the FP
context saving code from the r4k_switch.S implementation of resume.
Remove it from the r2300_switch.S implementation too in order to prevent
attempting to save the FP context twice, which would likely lead to an
exception from the second save because the FPU had already been disabled
by the first save.

This patch has only been build tested, using rbtx49xx_defconfig.

Fixes: 1a3d59579b ("MIPS: Tidy up FPU context switching")
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Cc: linux-kernel@vger.kernel.org
Cc: Manuel Lauss <manuel.lauss@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/11167/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-10-02 19:16:46 +02:00
Paul Burton
0fa24340f7 MIPS: Fix octeon FP context switch handling
Commit 1a3d59579b ("MIPS: Tidy up FPU context switching") removed FP
context saving from the asm-written resume function in favour of reusing
existing code to perform the same task. However it only removed the FP
context saving code from the r4k_switch.S implementation of resume.
Octeon uses its own implementation in octeon_switch.S, so remove FP
context saving there too in order to prevent attempting to save context
twice. That formerly led to an exception from the second save as follows
because the FPU had already been disabled by the first save:

    do_cpu invoked from kernel context![#1]:
    CPU: 0 PID: 2 Comm: kthreadd Not tainted 4.3.0-rc2-dirty #2
    task: 800000041f84a008 ti: 800000041f864000 task.ti: 800000041f864000
    $ 0   : 0000000000000000 0000000010008ce1 0000000000100000 ffffffffbfffffff
    $ 4   : 800000041f84a008 800000041f84ac08 800000041f84c000 0000000000000004
    $ 8   : 0000000000000001 0000000000000000 0000000000000000 0000000000000001
    $12   : 0000000010008ce3 0000000000119c60 0000000000000036 800000041f864000
    $16   : 800000041f84ac08 800000000792ce80 800000041f84a008 ffffffff81758b00
    $20   : 0000000000000000 ffffffff8175ae50 0000000000000000 ffffffff8176c740
    $24   : 0000000000000006 ffffffff81170300
    $28   : 800000041f864000 800000041f867d90 0000000000000000 ffffffff815f3fa0
    Hi    : 0000000000fa8257
    Lo    : ffffffffe15cfc00
    epc   : ffffffff8112821c resume+0x9c/0x200
    ra    : ffffffff815f3fa0 __schedule+0x3f0/0x7d8
    Status: 10008ce2        KX SX UX KERNEL EXL
    Cause : 1080002c (ExcCode 0b)
    PrId  : 000d0601 (Cavium Octeon+)
    Modules linked in:
    Process kthreadd (pid: 2, threadinfo=800000041f864000, task=800000041f84a008, tls=0000000000000000)
    Stack : ffffffff81604218 ffffffff815f7e08 800000041f84a008 ffffffff811681b0
              800000041f84a008 ffffffff817e9878 0000000000000000 ffffffff81770000
              ffffffff81768340 ffffffff81161398 0000000000000001 0000000000000000
              0000000000000000 ffffffff815f4424 0000000000000000 ffffffff81161d68
              ffffffff81161be8 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 ffffffff8111e16c
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              0000000000000000 0000000000000000 0000000000000000 0000000000000000
              ...
    Call Trace:
    [<ffffffff8112821c>] resume+0x9c/0x200
    [<ffffffff815f3fa0>] __schedule+0x3f0/0x7d8
    [<ffffffff815f4424>] schedule+0x34/0x98
    [<ffffffff81161d68>] kthreadd+0x180/0x198
    [<ffffffff8111e16c>] ret_from_kernel_thread+0x14/0x1c

Tested using cavium_octeon_defconfig on an EdgeRouter Lite.

Fixes: 1a3d59579b ("MIPS: Tidy up FPU context switching")
Reported-by: Aaro Koskinen <aaro.koskinen@nokia.com>
Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Cc: linux-mips@linux-mips.org
Cc: Aleksey Makarov <aleksey.makarov@auriga.com>
Cc: linux-kernel@vger.kernel.org
Cc: Chandrakala Chavva <cchavva@caviumnetworks.com>
Cc: David Daney <david.daney@cavium.com>
Cc: Leonid Rosenboim <lrosenboim@caviumnetworks.com>
Patchwork: https://patchwork.linux-mips.org/patch/11166/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-10-02 19:16:06 +02:00
Li Bin
ee556d00cf arm64: ftrace: fix function_graph tracer panic
When function graph tracer is enabled, the following operation
will trigger panic:

mount -t debugfs nodev /sys/kernel
echo next_tgid > /sys/kernel/tracing/set_ftrace_filter
echo function_graph > /sys/kernel/tracing/current_tracer
ls /proc/

------------[ cut here ]------------
[  198.501417] Unable to handle kernel paging request at virtual address cb88537fdc8ba316
[  198.506126] pgd = ffffffc008f79000
[  198.509363] [cb88537fdc8ba316] *pgd=00000000488c6003, *pud=00000000488c6003, *pmd=0000000000000000
[  198.517726] Internal error: Oops: 94000005 [#1] SMP
[  198.518798] Modules linked in:
[  198.520582] CPU: 1 PID: 1388 Comm: ls Tainted: G
[  198.521800] Hardware name: linux,dummy-virt (DT)
[  198.522852] task: ffffffc0fa9e8000 ti: ffffffc0f9ab0000 task.ti: ffffffc0f9ab0000
[  198.524306] PC is at next_tgid+0x30/0x100
[  198.525205] LR is at return_to_handler+0x0/0x20
[  198.526090] pc : [<ffffffc0002a1070>] lr : [<ffffffc0000907c0>] pstate: 60000145
[  198.527392] sp : ffffffc0f9ab3d40
[  198.528084] x29: ffffffc0f9ab3d40 x28: ffffffc0f9ab0000
[  198.529406] x27: ffffffc000d6a000 x26: ffffffc000b786e8
[  198.530659] x25: ffffffc0002a1900 x24: ffffffc0faf16c00
[  198.531942] x23: ffffffc0f9ab3ea0 x22: 0000000000000002
[  198.533202] x21: ffffffc000d85050 x20: 0000000000000002
[  198.534446] x19: 0000000000000002 x18: 0000000000000000
[  198.535719] x17: 000000000049fa08 x16: ffffffc000242efc
[  198.537030] x15: 0000007fa472b54c x14: ffffffffff000000
[  198.538347] x13: ffffffc0fada84a0 x12: 0000000000000001
[  198.539634] x11: ffffffc0f9ab3d70 x10: ffffffc0f9ab3d70
[  198.540915] x9 : ffffffc0000907c0 x8 : ffffffc0f9ab3d40
[  198.542215] x7 : 0000002e330f08f0 x6 : 0000000000000015
[  198.543508] x5 : 0000000000000f08 x4 : ffffffc0f9835ec0
[  198.544792] x3 : cb88537fdc8ba316 x2 : cb88537fdc8ba306
[  198.546108] x1 : 0000000000000002 x0 : ffffffc000d85050
[  198.547432]
[  198.547920] Process ls (pid: 1388, stack limit = 0xffffffc0f9ab0020)
[  198.549170] Stack: (0xffffffc0f9ab3d40 to 0xffffffc0f9ab4000)
[  198.582568] Call trace:
[  198.583313] [<ffffffc0002a1070>] next_tgid+0x30/0x100
[  198.584359] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.585503] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.586574] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.587660] [<ffffffc0000907bc>] ftrace_graph_caller+0x6c/0x70
[  198.588896] Code: aa0003f5 2a0103f4 b4000102 91004043 (885f7c60)
[  198.591092] ---[ end trace 6a346f8f20949ac8 ]---

This is because when using function graph tracer, if the traced
function return value is in multi regs ([x0-x7]), return_to_handler
may corrupt them. So in return_to_handler, the parameter regs should
be protected properly.

Cc: <stable@vger.kernel.org> # 3.18+
Signed-off-by: Li Bin <huawei.libin@huawei.com>
Acked-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-02 11:12:56 +01:00
Ralf Baechle
0c5d187828 MIPS: BPF: Fix load delay slots.
The entire bpf_jit_asm.S is written in noreorder mode because "we know
better" according to a comment.  This also prevented the assembler from
throwing in the required NOPs for MIPS I processors which have no
load-use interlock, thus the load's consumer might end up using the
old value of the register from prior to the load.

Fixed by putting the assembler in reorder mode for just the affected
load instructions.  This is not enough for gas to actually try to be
clever by looking at the next instruction and inserting a nop only
when needed but as the comment said "we know better", so getting gas
to unconditionally emit a NOP is just right in this case and prevents
adding further ifdefery.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-10-02 09:48:57 +02:00
Ben Hutchings
f4b4aae182 x86/headers/uapi: Fix __BITS_PER_LONG value for x32 builds
On x32, gcc predefines __x86_64__ but long is only 32-bit.  Use
__ILP32__ to distinguish x32.

Fixes this compiler error in perf:

	tools/include/asm-generic/bitops/__ffs.h: In function '__ffs':
	tools/include/asm-generic/bitops/__ffs.h:19:8: error: right shift count >= width of type [-Werror=shift-count-overflow]
	  word >>= 32;
	       ^

This isn't sufficient to build perf for x32, though.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443660043.2730.15.camel@decadent.org.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-02 09:43:21 +02:00
Stephen Smalley
ab76f7b4ab x86/mm: Set NX on gap between __ex_table and rodata
Unused space between the end of __ex_table and the start of
rodata can be left W+x in the kernel page tables.  Extend the
setting of the NX bit to cover this gap by starting from
text_end rather than rodata_start.

  Before:
  ---[ High Kernel Mapping ]---
  0xffffffff80000000-0xffffffff81000000          16M                               pmd
  0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
  0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
  0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB x  pte
  0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
  0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
  0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
  0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
  0xffffffff82200000-0xffffffffa0000000         478M                               pmd

  After:
  ---[ High Kernel Mapping ]---
  0xffffffff80000000-0xffffffff81000000          16M                               pmd
  0xffffffff81000000-0xffffffff81600000           6M     ro         PSE     GLB x  pmd
  0xffffffff81600000-0xffffffff81754000        1360K     ro                 GLB x  pte
  0xffffffff81754000-0xffffffff81800000         688K     RW                 GLB NX pte
  0xffffffff81800000-0xffffffff81a00000           2M     ro         PSE     GLB NX pmd
  0xffffffff81a00000-0xffffffff81b3b000        1260K     ro                 GLB NX pte
  0xffffffff81b3b000-0xffffffff82000000        4884K     RW                 GLB NX pte
  0xffffffff82000000-0xffffffff82200000           2M     RW         PSE     GLB NX pmd
  0xffffffff82200000-0xffffffffa0000000         478M                               pmd

Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: <stable@vger.kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Link: http://lkml.kernel.org/r/1443704662-3138-1-git-send-email-sds@tycho.nsa.gov
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-02 09:21:06 +02:00
Lee, Chun-Yi
e3c41e37b0 x86/kexec: Fix kexec crash in syscall kexec_file_load()
The original bug is a page fault crash that sometimes happens
on big machines when preparing ELF headers:

    BUG: unable to handle kernel paging request at ffffc90613fc9000
    IP: [<ffffffff8103d645>] prepare_elf64_ram_headers_callback+0x165/0x260

The bug is caused by us under-counting the number of memory ranges
and subsequently not allocating enough ELF header space for them.
The bug is typically masked on smaller systems, because the ELF header
allocation is rounded up to the next page.

This patch modifies the code in fill_up_crash_elf_data() by using
walk_system_ram_res() instead of walk_system_ram_range() to correctly
count the max number of crash memory ranges. That's because the
walk_system_ram_range() filters out small memory regions that
reside in the same page, but walk_system_ram_res() does not.

Here's how I found the bug:

After tracing prepare_elf64_headers() and prepare_elf64_ram_headers_callback(),
the code uses walk_system_ram_res() to fill-in crash memory regions information
to the program header, so it counts those small memory regions that
reside in a page area.

But, when the kernel was using walk_system_ram_range() in
fill_up_crash_elf_data() to count the number of crash memory regions,
it filters out small regions.

I printed those small memory regions, for example:

  kexec: Get nr_ram ranges. vaddr=0xffff880077592258 paddr=0x77592258, sz=0xdc0

Based on the code in walk_system_ram_range(), this memory region
will be filtered out:

  pfn = (0x77592258 + 0x1000 - 1) >> 12 = 0x77593
  end_pfn = (0x77592258 + 0xfc0 -1 + 1) >> 12 = 0x77593
  end_pfn - pfn = 0x77593 - 0x77593 = 0  <=== if (end_pfn > pfn) is FALSE

So, the max_nr_ranges that's counted by the kernel doesn't include
small memory regions - causing us to under-allocate the required space.
That causes the page fault crash that happens in a later code path
when preparing ELF headers.

This bug is not easy to reproduce on small machines that have few
CPUs, because the allocated page aligned ELF buffer has more free
space to cover those small memory regions' PT_LOAD headers.

Signed-off-by: Lee, Chun-Yi <jlee@suse.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Jiang Liu <jiang.liu@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: kexec@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/1443531537-29436-1-git-send-email-jlee@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-10-02 09:13:06 +02:00
Linus Torvalds
bde17b90dd Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
 "12 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  dmapool: fix overflow condition in pool_find_page()
  thermal: avoid division by zero in power allocator
  memcg: remove pcp_counter_lock
  kprobes: use _do_fork() in samples to make them work again
  drivers/input/joystick/Kconfig: zhenhua.c needs BITREVERSE
  memcg: make mem_cgroup_read_stat() unsigned
  memcg: fix dirty page migration
  dax: fix NULL pointer in __dax_pmd_fault()
  mm: hugetlbfs: skip shared VMAs when unmapping private pages to satisfy a fault
  mm/slab: fix unexpected index mapping result of kmalloc_size(INDEX_NODE+1)
  userfaultfd: remove kernel header include from uapi header
  arch/x86/include/asm/efi.h: fix build failure
2015-10-01 22:20:11 -04:00
Andrey Ryabinin
a523841ee4 arch/x86/include/asm/efi.h: fix build failure
With KMEMCHECK=y, KASAN=n:

  arch/x86/platform/efi/efi.c:673:3: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
  arch/x86/platform/efi/efi_64.c:139:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]
  arch/x86/include/asm/desc.h:121:2: error: implicit declaration of function `memcpy' [-Werror=implicit-function-declaration]

Don't #undef memcpy if KASAN=n.

Fixes: 769a8089c1 ("x86, efi, kasan: #undef memset/memcpy/memmove per arch")
Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Reported-by: Ingo Molnar <mingo@kernel.org>
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2015-10-01 21:42:35 -04:00
Linus Torvalds
ccf70ddcbe (Relatively) a lot of reverts, mostly.
Bugs have trickled in for a new feature in 4.2 (MTRR support in guests)
 so I'm reverting it all; let's not make this -rc period busier for KVM
 than it's been so far.  This covers the four reverts from me.
 
 The fifth patch is being reverted because Radim found a bug in the
 implementation of stable scheduler clock, *but* also managed to implement
 the feature entirely without hypervisor support.  So instead of fixing
 the hypervisor side we can remove it completely; 4.4 will get the new
 implementation.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJWDXc/AAoJEL/70l94x66D8GoH/0WXeSYHn8+Ql5oZ5vI0QcCG
 6MiKVixhHTOpkug2QE4DGClYoFSUPuDEB/w6D7YciNn0quDHFZbI3XEMXYtLobHN
 0J9cMv9Vpy5pBVMG/LJOw9pFAJRdhSx/cHU2DW9vUiRG9dO9zuxFzBtUciWLOPAX
 tSQfDumeUV30BsTP5ldi9kaIUJBM9oBD4JhES0JHx6ePBvy+9vCRmHotugzrrGx6
 N+AbCmwUwxnK29PF9i7KMfex6T8l1uQG3fwWVazHoswsqbFEQyF6NpaSTYoZkjM9
 6gaXEE1FQ7tRhuio4bBDos0lLu6iGesveP71p/HpULleq2sbH2ER8TpzR5iSnQA=
 =zAJS
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
 "(Relatively) a lot of reverts, mostly.

  Bugs have trickled in for a new feature in 4.2 (MTRR support in
  guests) so I'm reverting it all; let's not make this -rc period busier
  for KVM than it's been so far.  This covers the four reverts from me.

  The fifth patch is being reverted because Radim found a bug in the
  implementation of stable scheduler clock, *but* also managed to
  implement the feature entirely without hypervisor support.  So instead
  of fixing the hypervisor side we can remove it completely; 4.4 will
  get the new implementation"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS
  Update KVM homepage Url
  Revert "KVM: SVM: use NPT page attributes"
  Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask"
  Revert "KVM: SVM: Sync g_pat with guest-written PAT value"
  Revert "KVM: x86: apply guest MTRR virtualization on host reserved pages"
  Revert "KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR"
2015-10-01 16:43:25 -04:00
Steve Capper
1a541b4e3c arm64: Fix THP protection change logic
6910fa1 ("arm64: enable PTE type bit in the mask for pte_modify") fixes
a problem whereby a large block of PROT_NONE mapped memory is
incorrectly mapped as block descriptors when mprotect is called.

Unfortunately, a subtle bug was introduced by this fix to the THP logic.

If one mmaps a large block of memory, then faults it such that it is
collapsed into THPs; resulting calls to mprotect on this area of memory
will lead to incorrect table descriptors being written instead of block
descriptors. This is because pmd_modify calls pte_modify which is now
allowed to modify the type of the page table entry.

This patch reverts commit 6910fa16db, and
fixes the problem it was trying to address by adjusting PAGE_NONE to
represent a table entry. Thus no change in pte type is required when
moving from PROT_NONE to a different protection.

Fixes: 6910fa16db ("arm64: enable PTE type bit in the mask for pte_modify")
Cc: <stable@vger.kernel.org> # 4.0+
Cc: Feng Kan <fkan@apm.com>
Reported-by: Ganapatrao Kulkarni <Ganapatrao.Kulkarni@caviumnetworks.com>
Tested-by: Ganapatrao Kulkarni <gkulkarni@caviumnetworks.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Steve Capper <steve.capper@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2015-10-01 18:02:21 +01:00
Ralf Baechle
1e16a8f116 MIPS: BPF: Do all exports of symbols with FEXPORT().
FEXPORT also marks the symbol as code using .type symbol, @function.
Without objdump -d will output only a hexdump for code following the
affected symbols.

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2015-10-01 15:45:44 +02:00
Dirk Müller
d2922422c4 Use WARN_ON_ONCE for missing X86_FEATURE_NRIPS
The cpu feature flags are not ever going to change, so warning
everytime can cause a lot of kernel log spam
(in our case more than 10GB/hour).

The warning seems to only occur when nested virtualization is
enabled, so it's probably triggered by a KVM bug.  This is a
sensible and safe change anyway, and the KVM bug fix might not
be suitable for stable releases anyway.

Cc: stable@vger.kernel.org
Signed-off-by: Dirk Mueller <dmueller@suse.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 14:59:37 +02:00
Paolo Bonzini
fc07e76ac7 Revert "KVM: SVM: use NPT page attributes"
This reverts commit 3c2e7f7de3.
Initializing the mapping from MTRR to PAT values was reported to
fail nondeterministically, and it also caused extremely slow boot
(due to caching getting disabled---bug 103321) with assigned devices.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Sebastian Schuette <dracon@ewetel.net>
Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 13:30:44 +02:00
Paolo Bonzini
bcf166a994 Revert "KVM: svm: handle KVM_X86_QUIRK_CD_NW_CLEARED in svm_get_mt_mask"
This reverts commit 5492830370.
It builds on the commit that is being reverted next.

Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 13:30:43 +02:00
Paolo Bonzini
625422f60c Revert "KVM: SVM: Sync g_pat with guest-written PAT value"
This reverts commit e098223b78,
which has a dependency on other commits being reverted.

Cc: stable@vger.kernel.org # 4.2+
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-10-01 13:30:43 +02:00