commit 80b3ffce0196ea50068885d085ff981e4b8396f4 upstream.
Fix a bug in commit d614fd58a283 ("mips/ptrace: Preserve previous
registers for short regset write") and consistently consume all data
supplied to `fpr_set_msa' with the ptrace(2) PTRACE_SETREGSET request,
such that a zero data buffer counter is returned where insufficient data
has been given to fill a whole number of FP general registers.
In reality this is not going to happen, as the caller is supposed to
only supply data covering a whole number of registers and it is verified
in `ptrace_regset' and again asserted in `fpr_set', however structuring
code such that the presence of trailing partial FP general register data
causes `fpr_set_msa' to return with a non-zero data buffer counter makes
it appear that this trailing data will be used if there are subsequent
writes made to FP registers, which is going to be the case with the FCSR
once the missing write to that register has been fixed.
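The counter handling described above can be modelled in user space roughly
as follows. This is only a sketch under stated assumptions: `struct fp_regs',
`NUM_FPRS' and `fpr_write_sketch' are hypothetical names, registers are
assumed to be 64 bits wide, and the real helper works on a user buffer
through the regset API rather than with memcpy().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_FPRS 32	/* assumed number of FP general registers */

/* Hypothetical stand-in for the FP general register file. */
struct fp_regs {
	uint64_t fpr[NUM_FPRS];
};

/*
 * Copy whole 64-bit registers from buf, leaving registers not covered
 * by the buffer untouched.  If the data runs out before all registers
 * are written, any trailing partial register is discarded and *count
 * is set to zero so that a later write cannot pick it up.
 */
static void fpr_write_sketch(struct fp_regs *regs, const unsigned char *buf,
			     size_t *count)
{
	size_t left = *count;
	size_t i;

	for (i = 0; i < NUM_FPRS && left >= sizeof(uint64_t); i++) {
		memcpy(&regs->fpr[i], buf, sizeof(uint64_t));
		buf += sizeof(uint64_t);
		left -= sizeof(uint64_t);
	}
	if (i < NUM_FPRS)
		left = 0;	/* short write: consume the partial tail */
	*count = left;
}
```

With a 20-byte buffer the sketch writes two registers, preserves the rest,
and reports a zero counter, so nothing is left over for a subsequent write.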
Fixes: d614fd58a283 ("mips/ptrace: Preserve previous registers for short regset write")
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17927/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dc24d0edf33c3e15099688b6bbdf7bdc24bf6e91 upstream.
Complement commit d614fd58a283 ("mips/ptrace: Preserve previous
registers for short regset write") and ensure that no partial register
write attempt is made with PTRACE_SETREGSET, as we do not preinitialize
any temporaries used to hold incoming register data and consequently
random data could be written.
It is the responsibility of the caller, such as `ptrace_regset', to
arrange for writes to span whole registers only, so here we only assert
that it has indeed happened.
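A user-space sketch of such an assertion, assuming a 64-bit register type;
`fpreg_t' and `whole_regs' are hypothetical names and the kernel would use
its own assertion facility rather than assert(3):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t fpreg_t;	/* hypothetical stand-in for the register type */

/* Return the number of whole registers covered by count bytes. */
static size_t whole_regs(size_t count)
{
	/* The caller must have arranged for the write to span whole registers. */
	assert(count % sizeof(fpreg_t) == 0);
	return count / sizeof(fpreg_t);
}
```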
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: 72b22bbad1 ("MIPS: Don't assume 64-bit FP registers for FP regset")
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17926/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a03fe72572c12e98f4173f8a535f32468e48b6ec upstream.
In preparation for fixing an FCSR access regression introduced by commit
72b22bbad1 ("MIPS: Don't assume 64-bit FP registers for FP regset"),
factor out NT_PRFPREG regset access helpers for the non-MSA and the MSA
variants respectively, to avoid having to deal with excessive indentation
in the actual fix.
No functional change, however use `target->thread.fpu.fpr[0]' rather
than `target->thread.fpu.fpr[i]' for FGR holding type size determination
as there's no `i' variable to refer to anymore, and for the factored out
`i' variable declaration use `unsigned int' rather than `unsigned' as
its type, following the common style.
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: 72b22bbad1 ("MIPS: Don't assume 64-bit FP registers for FP regset")
Cc: James Hogan <james.hogan@mips.com>
Cc: Paul Burton <Paul.Burton@mips.com>
Cc: Alex Smith <alex@alex-smith.me.uk>
Cc: Dave Martin <Dave.Martin@arm.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17925/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b67336eee3fcb8ecedc6c13e2bf88aacfa3151e2 upstream.
Fix an API loophole introduced with commit 9791554b45 ("MIPS,prctl:
add PR_[GS]ET_FP_MODE prctl options for MIPS"), where the caller of
prctl(2) is incorrectly allowed to make a change to CP0.Status.FR or
CP0.Config5.FRE register bits even if CONFIG_MIPS_O32_FP64_SUPPORT has
not been enabled, even though an executable requesting that mode
via ELF file annotation would not be allowed to run in the
first place, or for n32 and n64 ABI tasks which do not have non-default
modes defined at all. Add suitable checks to `mips_set_process_fp_mode'
and bail out if an invalid mode change has been requested for the ABI in
effect, even if the FPU hardware or emulation would otherwise allow it.
Always succeed however without taking any further action if the mode
requested is the same as one already in effect, regardless of whether
any mode change, should it be requested, would actually be allowed for
the task concerned.
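The order of the checks described above might look like this in a
user-space model. All names are hypothetical, the `o32_fp64_support' flag
stands in for CONFIG_MIPS_O32_FP64_SUPPORT, and -EOPNOTSUPP is an assumed
error code; the sketch only covers the validation, not the mode switch.

```c
#include <errno.h>
#include <stdbool.h>

enum abi { ABI_O32, ABI_N32, ABI_N64 };

/*
 * Return 0 if the requested FP mode may be applied, or a negative
 * errno value if the request is invalid for the ABI in effect.
 */
static int check_fp_mode_change(unsigned int current_mode,
				unsigned int requested_mode,
				enum abi abi, bool o32_fp64_support)
{
	/* Nothing to change: always succeed. */
	if (requested_mode == current_mode)
		return 0;
	/* A mode change needs 64-bit FP support for o32 to be configured. */
	if (!o32_fp64_support)
		return -EOPNOTSUPP;
	/* n32 and n64 tasks have no non-default modes. */
	if (abi != ABI_O32)
		return -EOPNOTSUPP;
	return 0;
}
```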
Signed-off-by: Maciej W. Rozycki <macro@mips.com>
Fixes: 9791554b45 ("MIPS,prctl: add PR_[GS]ET_FP_MODE prctl options for MIPS")
Reviewed-by: Paul Burton <paul.burton@mips.com>
Cc: James Hogan <james.hogan@mips.com>
Cc: linux-mips@linux-mips.org
Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17800/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bec40c26041de61162f7be9d2ce548c756ce0f65 upstream.
With the SRP protocol all RDMA operations are initiated by the target.
Since no RDMA operations are initiated by the initiator, do not grant
the initiator permission to submit RDMA reads or writes to the target.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d5b42e6607661b198d8b26a0c30969605b1bf5c7 upstream.
The "set_bittiming" callback treats a positive return value as error!
For that reason "can_changelink()" will quit silently after setting
the bittiming values without processing ctrlmode, restart-ms, etc.
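A minimal sketch of the failure mode, assuming the caller aborts on any
non-zero result and the callback would otherwise forward a transfer
helper's positive byte count. All names here are hypothetical and this is
not the driver's actual code.

```c
/* Hypothetical transfer helper: bytes sent (> 0) or a negative errno. */
static int fake_transfer(int fail)
{
	return fail ? -5 : 8;
}

/* Callback that maps any positive byte count to 0 (success). */
static int set_bittiming_sketch(int fail)
{
	int rc = fake_transfer(fail);

	return (rc > 0) ? 0 : rc;
}

/* Caller in the style described above: any non-zero result aborts. */
static int changelink_sketch(int fail, int *ctrlmode_applied)
{
	int err = set_bittiming_sketch(fail);

	if (err)
		return err;
	*ctrlmode_applied = 1;	/* later settings are only reached on 0 */
	return 0;
}
```

Without the normalisation in set_bittiming_sketch(), the positive byte count
would stop changelink_sketch() before the remaining settings are applied.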
Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e39d200fa5bf5b94a0948db0dae44c1b73b84a56 upstream.
Reported by syzkaller:
BUG: KASAN: stack-out-of-bounds in write_mmio+0x11e/0x270 [kvm]
Read of size 8 at addr ffff8803259df7f8 by task syz-executor/32298
CPU: 6 PID: 32298 Comm: syz-executor Tainted: G OE 4.15.0-rc2+ #18
Hardware name: LENOVO ThinkCentre M8500t-N000/SHARKBAY, BIOS FBKTC1AUS 02/16/2016
Call Trace:
dump_stack+0xab/0xe1
print_address_description+0x6b/0x290
kasan_report+0x28a/0x370
write_mmio+0x11e/0x270 [kvm]
emulator_read_write_onepage+0x311/0x600 [kvm]
emulator_read_write+0xef/0x240 [kvm]
emulator_fix_hypercall+0x105/0x150 [kvm]
em_hypercall+0x2b/0x80 [kvm]
x86_emulate_insn+0x2b1/0x1640 [kvm]
x86_emulate_instruction+0x39a/0xb90 [kvm]
handle_exception+0x1b4/0x4d0 [kvm_intel]
vcpu_enter_guest+0x15a0/0x2640 [kvm]
kvm_arch_vcpu_ioctl_run+0x549/0x7d0 [kvm]
kvm_vcpu_ioctl+0x479/0x880 [kvm]
do_vfs_ioctl+0x142/0x9a0
SyS_ioctl+0x74/0x80
entry_SYSCALL_64_fastpath+0x23/0x9a
The vmmcall patching path writes the 3-byte opcode 0F 01 C1 (vmcall)
to guest memory; however, the write_mmio tracepoint always prints 8 bytes
through *(u64 *)val since kvm splits the mmio access into 8-byte chunks.
This leaks 5 bytes from the kernel stack (CVE-2017-17741). This patch
fixes it by accessing only the bytes which we operate on.
Before patch:
syz-executor-5567 [007] .... 51370.561696: kvm_mmio: mmio write len 3 gpa 0x10 val 0x1ffff10077c1010f
After patch:
syz-executor-13416 [002] .... 51302.299573: kvm_mmio: mmio write len 3 gpa 0x10 val 0xc1010f
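The idea of reading only the bytes that were written can be sketched in
user space as below. `mmio_trace_val' is a hypothetical name; the sketch
assumes the traced value is a zero-initialised 64-bit field into which at
most len bytes are copied.

```c
#include <stdint.h>
#include <string.h>

/*
 * Build the value to trace from at most len bytes of val, instead of
 * dereferencing val as a full 64-bit quantity.
 */
static uint64_t mmio_trace_val(const void *val, size_t len)
{
	uint64_t out = 0;

	if (len > sizeof(out))
		len = sizeof(out);
	if (val)
		memcpy(&out, val, len);
	return out;
}
```

On a little-endian machine, passing the three opcode bytes 0F 01 C1 with
len 3 gives 0xc1010f, which matches the "After patch" trace line above.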
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Darren Kenny <darren.kenny@oracle.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fbc7c07ec23c040179384a1f16b62b6030eb6bdd upstream.
When system is under memory pressure it is observed that dm bufio
shrinker often reclaims only one buffer per scan. This change fixes
the following two issues in dm bufio shrinker that cause this behavior:
1. ((nr_to_scan - freed) <= retain_target) condition is used to
terminate slab scan process. This assumes that nr_to_scan is equal
to the LRU size, which might not be correct because do_shrink_slab()
in vmscan.c calculates nr_to_scan using multiple inputs.
As a result when nr_to_scan is less than retain_target (64) the scan
will terminate after the first iteration, effectively reclaiming one
buffer per scan and making scans very inefficient. This hurts vmscan
performance especially because mutex is acquired/released every time
dm_bufio_shrink_scan() is called.
New implementation uses ((LRU size - freed) <= retain_target)
condition for scan termination. LRU size can be safely determined
inside __scan() because this function is called after dm_bufio_lock().
2. do_shrink_slab() uses value returned by dm_bufio_shrink_count() to
determine number of freeable objects in the slab. However dm_bufio
always retains retain_target buffers in its LRU and will terminate
a scan when this mark is reached. Therefore returning the entire LRU size
from dm_bufio_shrink_count() is misleading because that does not
represent the number of freeable objects that slab will reclaim during
a scan. Returning (LRU size - retain_target) better represents the
number of freeable objects in the slab. This way do_shrink_slab()
returns 0 when (LRU size < retain_target) and vmscan will not try to
scan this shrinker avoiding scans that will not reclaim any memory.
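The two conditions above can be modelled as follows. This is a simplified
sketch, not the dm-bufio code: the function names are hypothetical and
every buffer on the LRU is assumed to be evictable.

```c
/* Freeable objects: everything above the retained floor. */
static unsigned long shrink_count_sketch(unsigned long lru_size,
					 unsigned long retain_target)
{
	return lru_size > retain_target ? lru_size - retain_target : 0;
}

/*
 * Free up to nr_to_scan buffers, stopping once only retain_target
 * buffers would remain on the LRU.  Returns the number freed.
 */
static unsigned long shrink_scan_sketch(unsigned long lru_size,
					unsigned long nr_to_scan,
					unsigned long retain_target)
{
	unsigned long freed = 0;

	while (nr_to_scan--) {
		if (lru_size - freed <= retain_target)
			break;
		freed++;
	}
	return freed;
}
```

With an LRU of 100 buffers, retain_target of 64 and nr_to_scan of 32, the
sketch frees all 32 requested buffers, whereas a termination test based on
(nr_to_scan - freed) would have stopped after the first one.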
Test: tested using Android device running
<AOSP>/system/extras/alloc-stress that generates memory pressure
and causes intensive shrinker scans
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This fixes the following much-reported build issue:
arch/x86/entry/vdso/vma.c: In function ‘map_vdso’:
arch/x86/entry/vdso/vma.c:175:9: error:
implicit declaration of function ‘pvclock_pvti_cpu0_va’
on some arches and configurations.
Thanks to Guenter for being persistent enough to get it fixed :)
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This needs to happen early in kaiser_pagetable_walk(), before the
hierarchy is established so that _PAGE_USER permission can be really
set.
A proper fix would be to teach kaiser_pagetable_walk() to update those
permissions but the vsyscall page is the only exception here so ...
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3205c36cf7d96024626f92d65f560035df1abcb2 upstream.
While most of the locations where a kernel taint bit is set are accompanied
with a warning message, there are two which set their bits silently. If
the tainting module gets unloaded later on, it is almost impossible to tell
what was the reason for setting the flag.
Signed-off-by: Libor Pechacek <lpechacek@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0224418516b4d8a6c2160574bac18447c354ef0 upstream.
Currently, percpu symbols from .data..percpu ELF section of a module are
not copied over and stored in final symtab array of struct module.
Consequently such symbol cannot be returned via kallsyms API (for
example kallsyms_lookup_name). This can be especially confusing when the
percpu symbol is exported. Only its __ksymtab et al. are present in its
symtab.
The culprit is in layout_and_allocate() function where SHF_ALLOC flag is
dropped for .data..percpu section. There is in fact no need to copy the
section to final struct module, because kernel module loader allocates
extra percpu section by itself. Unfortunately only symbols from
SHF_ALLOC sections are copied due to a check in is_core_symbol().
The patch changes is_core_symbol() function to copy over also percpu
symbols (their st_shndx points to .data..percpu ELF section). We do it
only if CONFIG_KALLSYMS_ALL is set to be consistent with the rest of the
function (ELF section is SHF_ALLOC but !SHF_EXECINSTR). Finally
elf_type() returns type 'a' for a percpu symbol because its address is
absolute.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a78f70e8d65e88b9f631d073f68cb26dcd746298 upstream.
The reference files use spaces to separate tokens, however, we must
preserve spaces inside string literals. Currently the only case in the
tree is struct edac_raw_error_desc in <linux/edac.h>:
$ KBUILD_SYMTYPES=1 make -s drivers/edac/amd64_edac.symtypes
$ mv drivers/edac/amd64_edac.{symtypes,symref}
$ KBUILD_SYMTYPES=1 make -s drivers/edac/amd64_edac.symtypes
drivers/edac/amd64_edac.c:527: warning: amd64_get_dram_hole_info: modversion changed because of changes in struct edac_raw_error_desc
Signed-off-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1e5476815fd7f98b888e01a0f9522b63085f96c9 upstream.
The recent changes for PTI touch cpu_tlbstate from various tlb_flush
inlines. cpu_tlbstate is exported as GPL symbol, so this causes a
regression when building out of tree drivers for certain graphics cards.
Aside of that the export was wrong since it was introduced as it should
have been EXPORT_PER_CPU_SYMBOL_GPL().
Use the correct PER_CPU export and drop the _GPL to restore the previous
state which allows users to utilize the cards they paid for.
As always I'm really thrilled to make this kind of change to support the
#friends (or however the hot hashtag of today is spelled) from that closet
sauce graphics corp.
Fixes: 1e02ce4ccc ("x86: Store a per-cpu shadow copy of CR4")
Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches")
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Backlund <tmb@mageia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 88776c0e70be0290f8357019d844aae15edaa967 upstream.
Qemu for PARISC reported on a 32bit SMP parisc kernel strange failures
about "Not-handled unaligned insn 0x0e8011d6 and 0x0c2011c9."
Those opcodes evaluate to the ldcw() assembly instruction which requires
(on 32bit) an alignment of 16 bytes to ensure atomicity.
As it turns out, qemu is correct and in our assembly code in entry.S and
pacache.S we don't pay attention to the required alignment.
This patch fixes the problem by aligning the lock offset in assembly
code in the same manner as we do in our C-code.
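For illustration, rounding a lock address up to the 16-byte boundary
mentioned above can be written like this in C. `LDCW_ALIGNMENT' and
`ldcw_align' are hypothetical names for this sketch, not the identifiers
used in the parisc sources.

```c
#include <stdint.h>

#define LDCW_ALIGNMENT 16u	/* alignment stated above for 32-bit */

/* Round addr up to the next multiple of LDCW_ALIGNMENT. */
static uintptr_t ldcw_align(uintptr_t addr)
{
	return (addr + LDCW_ALIGNMENT - 1) & ~(uintptr_t)(LDCW_ALIGNMENT - 1);
}
```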
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f4e9b7af0cd58dd039a0fb2cd67d57cea4889abf upstream.
The size for the Microcode Patch Block (MPB) for an AMD family 17h
processor is 3200 bytes. Add a #define for fam17h so that it does
not default to 2048 bytes and fail a microcode load/update.
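A sketch of the size selection, using only the two values given above.
The macro and function names are hypothetical, and sizes for other
families are left out of this model.

```c
#define DEFAULT_MPB_MAX_SIZE	2048u	/* default named in the text above */
#define F17H_MPB_MAX_SIZE	3200u	/* family 17h size from the text above */

/* Largest patch block accepted for a given CPU family. */
static unsigned int max_patch_size(unsigned int family)
{
	switch (family) {
	case 0x17:
		return F17H_MPB_MAX_SIZE;
	default:
		return DEFAULT_MPB_MAX_SIZE;
	}
}

/* Non-zero if a patch of patch_size bytes may be loaded. */
static int patch_fits(unsigned int family, unsigned int patch_size)
{
	return patch_size <= max_patch_size(family);
}
```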
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Borislav Petkov <bp@alien8.de>
Link: https://lkml.kernel.org/r/20171130224640.15391.40247.stgit@tlendack-t1.amdoffice.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Alice Ferrazzi <alicef@gentoo.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79435ac78d160e4c245544d457850a56f805ac0d upstream.
This used to set up the LP_COUNT register automatically, but that has now
been removed.
There was an earlier fix, 3c7c7a2fc8811, which fixed the instance in delay.h
but somehow missed this one, as the gcc change had not yet made its way into
production toolchains and gcc was not as pedantic as it is now!
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 426915796ccaf9c2bd9bb06dc5702225957bc2e5 upstream.
complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group; today this is wrong in many ways.
If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.
After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.
This should hopefully fix the problem reported by Dmitry. This
test-case
	static int init(void *arg)
	{
		for (;;)
			pause();
	}
	int main(void)
	{
		char stack[16 * 1024];
		for (;;) {
			int pid = clone(init, stack + sizeof(stack)/2,
					CLONE_NEWPID | SIGCHLD, NULL);
			assert(pid > 0);
			assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
			assert(waitpid(-1, NULL, WSTOPPED) == pid);
			assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
			assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
			assert(pid == wait(NULL));
		}
	}
triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop(). do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.
And this should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.
Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kyle Huey <me@kylehuey.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac25385089f673560867eb5179228a44ade0cfc1 upstream.
Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
signals even if force == T. This simplifies the next change and this
matches the same check in get_signal() which will drop these signals
anyway.
Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 628c1bcba204052d19b686b5bac149a644cdb72e upstream.
The comment in sig_ignored() says "Tracers may want to know about even
ignored signals" but SIGKILL can not be reported to debugger and it is
just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.
Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().
SIGSTOP coming from within the namespace is not really right either, but at
least debugger can intercept it, and we can't drop it here because this
will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add
another ->ptrace check later, we will see.
Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 upstream.
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.
This patch:
- Make groups_sort globally visible.
- Move the call to groups_sort to the modifiers of group_info
- Remove the call to groups_sort from set_groups
Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 98801506552593c9b8ac11021b0cdad12cab4f6b upstream.
Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached. It mustn't return false as that indicates
the page cannot yet be freed.
The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem ->releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.
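The corrected default can be pictured with the following user-space
sketch. The function and parameter names are hypothetical; the booleans
stand in for the cookie validity check, the per-page cached flag and the
answer the cache itself would give.

```c
#include <stdbool.h>

/*
 * Return true if the page may be released.  Only consult the cache
 * when the cookie is valid and the page is actually cached; in every
 * other case there is nothing to wait for.
 */
static bool maybe_release_page_sketch(bool cookie_valid, bool page_cached,
				      bool cache_allows_release)
{
	if (cookie_valid && page_cached)
		return cache_allows_release;
	return true;	/* no cache involvement: never block the release */
}
```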
This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress. This might be an issue for
9P, however.
Note that the pages aren't entirely stuck. Removing a file or unmounting
will clear things because that uses ->invalidatepage() instead.
Fixes: 201a15428b ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e2bf801ecd4e62222a46d1ba9e57e710171d29c1 upstream.
Include the OF-based modalias in the uevent sent when registering devices
on the sunxi RSB bus, so that user space has a chance to autoload the
kernel module for the device.
Fixes a regression caused by commit 3f241bfa60bd ("arm64: allwinner: a64:
pine64: Use dcdc1 regulator for mmc0"). When the axp20x-rsb driver for
the AXP803 PMIC is built as a module, it is not loaded and the system
ends up with a dysfunctional MMC controller.
Fixes: d787dcdb9c ("bus: sunxi-rsb: Add driver for Allwinner Reduced Serial Bus")
Acked-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d76c68109f37cb85b243a1cf0f40313afd2bae68 upstream.
pcrypt is using the old way of freeing instances, where the ->free()
method specified in the 'struct crypto_template' is passed a pointer to
the 'struct crypto_instance'. But the crypto_instance is being
kfree()'d directly, which is incorrect because the memory was actually
allocated as an aead_instance, which contains the crypto_instance at a
nonzero offset. Thus, the wrong pointer was being kfree()'d.
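The pointer mismatch described above can be reproduced in user space with
a pair of hypothetical structures. This sketch only illustrates why the
embedded pointer must not be passed to the allocator's free routine; it
does not use the crypto API.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical inner object, embedded in the outer one below. */
struct inner_instance {
	int refcnt;
};

/* Hypothetical outer object; this is what was actually allocated. */
struct outer_instance {
	void (*free)(struct outer_instance *inst);
	struct inner_instance base;	/* lives at a nonzero offset */
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Recover the allocated object from a pointer to its embedded member. */
static struct outer_instance *outer_from_inner(struct inner_instance *inner)
{
	return container_of(inner, struct outer_instance, base);
}

/* Free routine attached to the outer object itself. */
static void outer_free(struct outer_instance *inst)
{
	free(inst);
}
```

Freeing &outer->base directly would hand the allocator an address it never
returned; going through the outer object's own free hook avoids that.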
Fix it by switching to the new way to free aead_instance's where the
->free() method is specified in the aead_instance itself.
Reported-by: syzbot <syzkaller@googlegroups.com>
Fixes: 0496f56065 ("crypto: pcrypt - Add support for new AEAD interface")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 203f45003a3d03eea8fa28d74cfc74c354416fdb upstream.
queue_cache_init is first called for the Control Word Queue
(n2_crypto_probe). At that time, queue_cache[0] is NULL and a new
kmem_cache will be allocated. If the subsequent n2_register_algs call
fails, the kmem_cache will be released in queue_cache_destroy, but
queue_cache[0] is not set back to NULL.
So when the Modular Arithmetic Unit gets probed next (n2_mau_probe),
queue_cache_init will not allocate a kmem_cache again, but leave it
as its bogus value, causing a BUG() to trigger when queue_cache[0] is
eventually passed to kmem_cache_zalloc:
n2_crypto: Found N2CP at /virtual-devices@100/n2cp@7
n2_crypto: Registered NCS HVAPI version 2.0
called queue_cache_init
n2_crypto: md5 alg registration failed
n2cp f028687c: /virtual-devices@100/n2cp@7: Unable to register algorithms.
called queue_cache_destroy
n2cp: probe of f028687c failed with error -22
n2_crypto: Found NCP at /virtual-devices@100/ncp@6
n2_crypto: Registered NCS HVAPI version 2.0
called queue_cache_init
kernel BUG at mm/slab.c:2993!
Call Trace:
[0000000000604488] kmem_cache_alloc+0x1a8/0x1e0
(inlined) kmem_cache_zalloc
(inlined) new_queue
(inlined) spu_queue_setup
(inlined) handle_exec_unit
[0000000010c61eb4] spu_mdesc_scan+0x1f4/0x460 [n2_crypto]
[0000000010c62b80] n2_mau_probe+0x100/0x220 [n2_crypto]
[000000000084b174] platform_drv_probe+0x34/0xc0
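A user-space model of the init/destroy pairing, with malloc() standing in
for the kmem_cache allocation. The array and function names are
hypothetical; the point is only that destroy must reset the pointer so
that a later init allocates again.

```c
#include <stdlib.h>

/* Hypothetical model of the per-queue-type cache pointers. */
static void *queue_cache_model[2];

/* Allocate the cache for idx only if it does not exist yet. */
static int cache_init_sketch(int idx)
{
	if (!queue_cache_model[idx])
		queue_cache_model[idx] = malloc(64);
	return queue_cache_model[idx] ? 0 : -1;
}

/* Release the cache and forget the stale pointer. */
static void cache_destroy_sketch(int idx)
{
	free(queue_cache_model[idx]);
	queue_cache_model[idx] = NULL;	/* lets the next init allocate again */
}
```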
Signed-off-by: Jan Engelhardt <jengelh@inai.de>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d9570158b6260f449e317a5f9ed030c2504a615 upstream.
As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check
is very wrong, we need time_is_after_jiffies() to make sys_acct() work.
Ignoring the overflows, the code should "goto out" if needcheck >
jiffies, while currently it checks "needcheck < jiffies" and thus in the
likely case check_free_space() does nothing until jiffies overflow.
In particular this means that sys_acct() is simply broken, acct_on()
sets acct->needcheck = jiffies and expects that check_free_space()
should set acct->active = 1 after the free-space check, but this won't
happen if jiffies increments in between.
This was broken by commit 32dc730860 ("get rid of timer in
kern/acct.c") in 2011, then another (correct) commit 795a2f22a8
("acct() should honour the limits from the very beginning") made the
problem more visible.
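The difference between the two comparisons can be seen in a small model
with the current time passed in explicitly. The helper names are
hypothetical; the wrap-safe comparison follows the usual signed-difference
idiom and assumes two's complement conversion.

```c
/* Wrap-safe "a is later than b" for free-running tick counters. */
static int tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Non-zero if t lies in the future relative to now. */
static int is_after_now(unsigned long t, unsigned long now)
{
	return tick_after(t, now);
}

/* Non-zero if t lies in the past relative to now. */
static int is_before_now(unsigned long t, unsigned long now)
{
	return tick_after(now, t);
}

/* Corrected behaviour: skip the check only while needcheck is in the future. */
static int should_check_fixed(unsigned long needcheck, unsigned long now)
{
	return !is_after_now(needcheck, now);
}

/* Broken behaviour: skips the check once needcheck is in the past. */
static int should_check_broken(unsigned long needcheck, unsigned long now)
{
	return !is_before_now(needcheck, now);
}
```

If needcheck is set to the current tick and one tick then elapses, the
fixed variant still runs the check while the broken variant skips it.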
Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com
Fixes: 32dc730860 ("get rid of timer in kern/acct.c")
Reported-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
Suggested-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 69e0210fd01ff157d332102219aaf5c26ca8069b upstream.
Currently we clear kasan_zero_page before __flush_tlb_all(). This
works with the current implementation of native_flush_tlb[_global]()
because it doesn't cause any writes to kasan shadow memory.
But any subtle change made in native_flush_tlb*() could break this.
Also, the current code doesn't seem to work for paravirt guests (lguest).
Only after the TLB flush can we be sure that kasan_zero_page is not
used as early shadow anymore (instrumented code will not write to it).
So it should be cleared only after the TLB flush.
Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1452516679-32040-2-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jamie Iles <jamie.iles@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b078f5de7fc0851af4102493c7b5bb07e49c4cb upstream.
The pvclock vdso code was too abstracted to understand easily
and excessively paranoid. Simplify it for a huge speedup.
This opens the door for additional simplifications, as the vdso
no longer accesses the pvti for any vcpu other than vcpu 0.
Before, vclock_gettime using kvm-clock took about 45ns on my
machine. With this change, it takes 29ns, which is almost as
fast as the pure TSC implementation.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/6b51dcc41f1b101f963945c5ec7093d72bdac429.1449702533.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jamie Iles <jamie.iles@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Kaiser cannot be used on paravirtualized MMUs (namely reading and writing CR3).
This does not work with KAISER as the CR3 switch from and to user space PGD
would require mapping the whole XEN_PV machinery into both.
More importantly, enabling KAISER on Xen PV doesn't make too much sense, as PV
guests use distinct %cr3 values for kernel and user already.
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that the required bits have been addressed, reenable
PARAVIRT.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Let kaiser_flush_tlb_on_return_to_user() do the X86_FEATURE_PCID
check, instead of each caller doing it inline first: nobody needs
to optimize for the noPCID case, it's clearer this way, and better
suits later changes. Replace those no-op X86_CR3_PCID_KERN_FLUSH lines
by a BUILD_BUG_ON() in load_new_mm_cr3(), in case something changes.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I found asm/tlbflush.h too twisty, and think it safer not to avoid
__native_flush_tlb_global_irq_disabled() in the kaiser_enabled case,
but instead let it handle kaiser_enabled along with cr3: it can just
use __native_flush_tlb() for that, no harm in re-disabling preemption.
(This is not the same change as Kirill and Dave have suggested for
upstream, flipping PGE in cr4: that's neat, but needs a cpu_has_pge
check; cr3 is enough for kaiser, and thought to be cheaper than cr4.)
Also delete the X86_FEATURE_INVPCID invpcid_flush_all_nonglobals()
preference from __native_flush_tlb(): unlike the invpcid_flush_all()
preference in __native_flush_tlb_global(), it's not seen in upstream
4.14, and was recently reported to be surprisingly slow.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
I have not observed a might_sleep() warning from setup_fixmap_gdt()'s
use of kaiser_add_mapping() in our tree (why not?), but like upstream
we have not provided a way for that to pass is_atomic true down to
kaiser_pagetable_walk(), and at startup it's far from a likely source
of trouble: so just delete the walk's is_atomic arg and might_sleep().
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Now that we're playing the ALTERNATIVE game, use that more efficient
method: instead of user-mapping an extra page, and reading an extra
cacheline each time for x86_cr3_pcid_noflush.
Neel has found that __stringify(bts $X86_CR3_PCID_NOFLUSH_BIT, %rax)
is a working substitute for the "bts $63, %rax" in these ALTERNATIVEs;
but the one line with $63 in looks clearer, so let's stick with that.
Worried about what happens with an ALTERNATIVE between the jump and
jump label in another ALTERNATIVE? I was, but have checked the
combinations in SWITCH_KERNEL_CR3_NO_STACK at entry_SYSCALL_64,
and it does a good job.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
AMD (and possibly other vendors) are not affected by the leak
KAISER is protecting against.
Keep the "nopti" for traditional reasons and add pti=<on|off|auto>
like upstream.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Concentrate it in arch/x86/mm/kaiser.c and use the upstream string "nopti".
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Added "nokaiser" boot option: an early param like "noinvpcid".
Most places now check int kaiser_enabled (#defined 0 when not
CONFIG_KAISER) instead of #ifdef CONFIG_KAISER; but entry_64.S
and entry_64_compat.S are using the ALTERNATIVE technique, which
patches in the preferred instructions at runtime. That technique
is tied to x86 cpu features, so X86_FEATURE_KAISER is fabricated.
Prior to "nokaiser", Kaiser #defined _PAGE_GLOBAL 0: revert that,
but be careful with both _PAGE_GLOBAL and CR4.PGE: setting them when
nokaiser like when !CONFIG_KAISER, but not setting either when kaiser -
neither matters on its own, but it's hard to be sure that _PAGE_GLOBAL
won't get set in some obscure corner, or something adds PGE into CR4.
By omitting _PAGE_GLOBAL from __supported_pte_mask when kaiser_enabled,
all page table setup which uses pte_pfn() masks it out of the ptes.
It's slightly shameful that the same declaration versus definition of
kaiser_enabled appears in not one, not two, but in three header files
(asm/kaiser.h, asm/pgtable.h, asm/tlbflush.h). I felt safer that way,
than with #including any of those in any of the others; and did not
feel it worth an asm/kaiser_enabled.h - kernel/cpu/common.c includes
them all, so we shall hear about it if they get out of synch.
Cleanups while in the area: removed the silly #ifdef CONFIG_KAISER
from kaiser.c; removed the unused native_get_normal_pgd(); removed
the spurious reg clutter from SWITCH_*_CR3 macro stubs; corrected some
comments. But more interestingly, set CR4.PSE in secondary_startup_64:
the manual is clear that it does not matter whether it's 0 or 1 when
4-level-pts are enabled, but I was distracted to find cr4 different on
BSP and auxiliaries - BSP alone was adding PSE, in probe_page_size_mask().
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
An error from kaiser_add_mapping() here is not at all likely, but
Eric Biggers rightly points out that __free_ldt_struct() relies on
new_ldt->size being initialized: move that up.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>