Commit graph

569856 commits

Author SHA1 Message Date
Joel Fernandes
139ac8ac89 FROMLIST: tracing: Prepare to add preempt and irq trace events
In preparation of adding irqsoff and preemptsoff enable and disable trace
events, move required functions and code to make it easier to add these events
in a later patch. This patch is just code movement and no functional change.

Change-Id: I587d411da5efbc4959bcccd7a05c7a66c231e1e0
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: kernel-team@android.com
Link: https://patchwork.kernel.org/patch/9988159/
Signed-off-by: Joel Fernandes <joelaf@google.com>
2017-10-06 16:16:11 -07:00
Greg Kroah-Hartman
a958344209 This is the 4.4.90 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlnV4kwACgkQONu9yGCS
 aT560A//f5HQvKsfbDuzGTeCggXr+m9cd8L6GfKvQbmAKwPG9qfRiBM2EO4zqDkq
 Q6A6yjy0YVvsnQ3rJCA44yg/MC9JojlVgmBowtgD9uSh0Z8Q8qDdzAi35xdOF+BD
 O+2opoKSCKuc1ckpb7AoY/803XllTbNWbYd2eDzBvjxXd5i+qCF4GnBTTvkMIshm
 Fis8va+fgNHjuBlHgmV+sCR3CRWGv6PqtkDcG79nv69JkwQ2tx3JbMwtDOrgnR5X
 nIlvgNtZwYKtorxin6qaDWfmhLBHiI4Xhr9L1gAKLsi9S5m3nQ4Ozzsndqtlxppa
 bOHpsdCzVRkBz1UB2QQfZOJzE3tQvCBaAxUGeMAcO/F5wcgeyHl9Wo2bblqidJSc
 MdtN044pSE1yFTYtd14CdUKl+Jx/R9lFYM/o7IzxTrrRHfBuylTSA8fx6OIdPxJA
 Wmd+4HwVJxXmCBNaWnH4LRhd6rp7FB3wzUSt3Euxfq5GRa5+522u6VEriGRuBEeW
 SOrcU++U9mIuR2Zk6A6vBVwB1g78vEvGlyQehFzJWghcLtRqqzEPYz281CklcOe/
 G3p8+v8wSZo/hHEyeJLRwX74Nlna/ZjoAdxS+ngW9BuNXFeZErTN32HgJNtBF5La
 4MIKiOBWCytmxhrS+PIykQAmal6HlDEvjue6xGzqKxeBq5l250I=
 =NPlq
 -----END PGP SIGNATURE-----

Merge 4.4.90 into android-4.4

Changes in 4.4.90
	cifs: release auth_key.response for reconnect.
	mac80211: flush hw_roc_start work before cancelling the ROC
	KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
	tracing: Fix trace_pipe behavior for instance traces
	tracing: Erase irqsoff trace with empty write
	md/raid5: fix a race condition in stripe batch
	md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list
	scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
	crypto: talitos - Don't provide setkey for non hmac hashing algs.
	crypto: talitos - fix sha224
	KEYS: fix writing past end of user-supplied buffer in keyring_read()
	KEYS: prevent creating a different user's keyrings
	KEYS: prevent KEYCTL_READ on negative key
	powerpc/pseries: Fix parent_dn reference leak in add_dt_node()
	Fix SMB3.1.1 guest authentication to Samba
	SMB: Validate negotiate (to protect against downgrade) even if signing off
	SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags
	vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets
	nl80211: check for the required netlink attributes presence
	bsg-lib: don't free job in bsg_prepare_job
	seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()
	arm64: Make sure SPsel is always set
	arm64: fault: Route pte translation faults via do_translation_fault
	KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
	kvm: nVMX: Don't allow L2 to access the hardware CR8
	PCI: Fix race condition with driver_override
	btrfs: fix NULL pointer dereference from free_reloc_roots()
	btrfs: propagate error to btrfs_cmp_data_prepare caller
	btrfs: prevent to set invalid default subvolid
	x86/fpu: Don't let userspace set bogus xcomp_bv
	gfs2: Fix debugfs glocks dump
	timer/sysclt: Restrict timer migration sysctl values to 0 and 1
	KVM: VMX: do not change SN bit in vmx_update_pi_irte()
	KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
	cxl: Fix driver use count
	dmaengine: mmp-pdma: add number of requestors
	ARM: pxa: add the number of DMA requestor lines
	ARM: pxa: fix the number of DMA requestor lines
	KVM: VMX: use cmpxchg64
	video: fbdev: aty: do not leak uninitialized padding in clk to userspace
	swiotlb-xen: implement xen_swiotlb_dma_mmap callback
	fix xen_swiotlb_dma_mmap prototype
	Linux 4.4.90

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-10-05 10:03:13 +02:00
Martijn Coenen
3cc621033b ANDROID: binder: fix transaction leak.
If a call to put_user() fails, we failed to
properly free a transaction and send a failed
reply (if necessary).

Bug: 63117588
Test: binderLibTest

Change-Id: Ia98db8cd82ce354a4cdc8811c969988d585c7e31
Signed-off-by: Martijn Coenen <maco@android.com>
2017-10-05 09:58:05 +02:00
Martijn Coenen
e5e42eca05 ANDROID: binder: Add tracing for binder priority inheritance.
Bug: 34461621
Change-Id: I5ebb1c0c49fd42a89ee250a1d70221f767c82c7c
Signed-off-by: Martijn Coenen <maco@google.com>
2017-10-05 09:58:05 +02:00
Greg Kroah-Hartman
37c2d0d3e8 Linux 4.4.90 2017-10-05 09:41:59 +02:00
Arnd Bergmann
228969b476 fix xen_swiotlb_dma_mmap prototype
xen_swiotlb_dma_mmap was backported from v4.10, but older
kernels before commit 00085f1efa38 ("dma-mapping: use unsigned long
for dma_attrs") use a different signature:

arm/xen/mm.c:202:10: error: initialization from incompatible pointer type [-Werror=incompatible-pointer-types]
  .mmap = xen_swiotlb_dma_mmap,
          ^~~~~~~~~~~~~~~~~~~~
arm/xen/mm.c:202:10: note: (near initialization for 'xen_swiotlb_dma_ops.mmap')

This adapts the patch to the old calling conventions.

Fixes: "swiotlb-xen: implement xen_swiotlb_dma_mmap callback"
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:48 +02:00
Stefano Stabellini
079c03f4a9 swiotlb-xen: implement xen_swiotlb_dma_mmap callback
commit 7e91c7df29b5e196de3dc6f086c8937973bd0b88 upstream.

This function creates userspace mapping for the DMA-coherent memory.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Oleksandr Dmytryshyn <oleksandr.dmytryshyn@globallogic.com>
Signed-off-by: Andrii Anisov <andrii_anisov@epam.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:48 +02:00
Vladis Dronov
27323cb81e video: fbdev: aty: do not leak uninitialized padding in clk to userspace
commit 8e75f7a7a00461ef6d91797a60b606367f6e344d upstream.

'clk' is copied to a userland with padding byte(s) after 'vclk_post_div'
field unitialized, leaking data from the stack. Fix this ensuring all of
'clk' is initialized to zero.

References: https://github.com/torvalds/linux/pull/441
Reported-by: sohu0106 <sohu0106@126.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:48 +02:00
Paolo Bonzini
150cd84bb6 KVM: VMX: use cmpxchg64
commit c0a1666bcb2a33e84187a15eabdcd54056be9a97 upstream.

This fixes a compilation failure on 32-bit systems.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:48 +02:00
Robert Jarzmik
90df2daa1d ARM: pxa: fix the number of DMA requestor lines
commit 4c35430ad18f5a034302cb90e559ede5a27f93b9 upstream.

The number of requestor lines was clamped to 0 for all pxa architectures
in the requestor declaration. Fix this by using the value.

Fixes: 72b195cb7162 ("ARM: pxa: add the number of DMA requestor lines")
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:48 +02:00
Robert Jarzmik
c575be9a39 ARM: pxa: add the number of DMA requestor lines
commit 72b195cb716284217e8b270af420bc7e5cf04b3c upstream.

Declare the number of DMA requestor lines per platform :
 - for pxa25x: 40 requestor lines
 - for pxa27x: 75 requestor lines
 - for pxa3xx: 100 requestor lines

This information will be used to activate the DMA flow control or not.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:48 +02:00
Robert Jarzmik
a85f176c85 dmaengine: mmp-pdma: add number of requestors
commit c283e41ef32442f41e7180f9bb1c5aedf9255bfe upstream.

The DMA chip has a fixed number of requestor lines used for flow
control. This number is platform dependent. The pxa_dma dma driver will
use this value to activate or not the flow control.

There won't be any impact on mmp_pdma driver.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:47 +02:00
Frederic Barrat
6124ed1a71 cxl: Fix driver use count
commit 197267d0356004a31c4d6b6336598f5dff3301e1 upstream.

cxl keeps a driver use count, which is used with the hash memory model
on p8 to know when to upgrade local TLBIs to global and to trigger
callbacks to manage the MMU for PSL8.

If a process opens a context and closes without attaching or fails the
attachment, the driver use count is never decremented. As a
consequence, TLB invalidations remain global, even if there are no
active cxl contexts.

We should increment the driver use count when the process is attaching
to the cxl adapter, and not on open. It's not needed before the
adapter starts using the context and the use count is decremented on
the detach path, so it makes more sense.

It affects only the user api. The kernel api is already doing The
Right Thing.

Signed-off-by: Frederic Barrat <fbarrat@linux.vnet.ibm.com>
Fixes: 7bb5d91a4d ("cxl: Rework context lifetimes")
Acked-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[ajd: backport to stable v4.4 tree]
Signed-off-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:47 +02:00
Haozhong Zhang
9037837e0c KVM: VMX: remove WARN_ON_ONCE in kvm_vcpu_trigger_posted_interrupt
commit 5753743fa5108b8f98bd61e40dc63f641b26c768 upstream.

WARN_ON_ONCE(pi_test_sn(&vmx->pi_desc)) in kvm_vcpu_trigger_posted_interrupt()
intends to detect the violation of invariant that VT-d PI notification
event is not suppressed when vcpu is in the guest mode. Because the
two checks for the target vcpu mode and the target suppress field
cannot be performed atomically, the target vcpu mode may change in
between. If that does happen, WARN_ON_ONCE() here may raise false
alarms.

As the previous patch fixed the real invariant breaker, remove this
WARN_ON_ONCE() to avoid false alarms, and document the allowed cases
instead.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 28b835d60f ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:47 +02:00
Haozhong Zhang
fc39e561e3 KVM: VMX: do not change SN bit in vmx_update_pi_irte()
commit dc91f2eb1a4021eb6705c15e474942f84ab9b211 upstream.

In kvm_vcpu_trigger_posted_interrupt() and pi_pre_block(), KVM
assumes that PI notification events should not be suppressed when the
target vCPU is not blocked.

vmx_update_pi_irte() sets the SN field before changing an interrupt
from posting to remapping, but it does not check the vCPU mode.
Therefore, the change of SN field may break above the assumption.
Besides, I don't see reasons to suppress notification events here, so
remove the changes of SN field to avoid race condition.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Reported-by: "Ramamurthy, Venkatesh" <venkatesh.ramamurthy@intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 28b835d60f ("KVM: Update Posted-Interrupts Descriptor when vCPU is preempted")
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:47 +02:00
Myungho Jung
5e9b526fcc timer/sysclt: Restrict timer migration sysctl values to 0 and 1
commit b94bf594cf8ed67cdd0439e70fa939783471597a upstream.

timer_migration sysctl acts as a boolean switch, so the allowed values
should be restricted to 0 and 1.

Add the necessary extra fields to the sysctl table entry to enforce that.

[ tglx: Rewrote changelog ]

Signed-off-by: Myungho Jung <mhjungk@gmail.com>
Link: http://lkml.kernel.org/r/1492640690-3550-1-git-send-email-mhjungk@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kazuhiro Hayashi <kazuhiro3.hayashi@toshiba.co.jp>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:47 +02:00
Andreas Gruenbacher
ddf25aea67 gfs2: Fix debugfs glocks dump
commit 10201655b085df8e000822e496e5d4016a167a36 upstream.

The switch to rhashtables (commit 88ffbf3e03) broke the debugfs glock
dump (/sys/kernel/debug/gfs2/<device>/glocks) for dumps bigger than a
single buffer: the right function for restarting an rhashtable iteration
from the beginning of the hash table is rhashtable_walk_enter;
rhashtable_walk_stop + rhashtable_walk_start will just resume from the
current position.

The upstream commit doesn't directly apply to 4.4.y because 4.4.y
doesn't have rhashtable_walk_enter and the following mainline commits:

  92ecd73a887c4a2b94daf5fc35179d75d1c4ef95  
    gfs2: Deduplicate gfs2_{glocks,glstats}_open
  cc37a62785a584f4875788689f3fd1fa6e4eb291  
    gfs2: Replace rhashtable_walk_init with rhashtable_walk_enter

Other than rhashtable_walk_enter, rhashtable_walk_init can fail.  To
handle the failure case in gfs2_glock_seq_stop, we check if
rhashtable_walk_init has initialized iter->walker; if it has not, we
must not call rhashtable_walk_stop or rhashtable_walk_exit.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:47 +02:00
Eric Biggers
d25fea066a x86/fpu: Don't let userspace set bogus xcomp_bv
commit 814fb7bb7db5433757d76f4c4502c96fc53b0b5e upstream.

[Please apply to 4.4-stable.  Note: the backport includes the
fpstate_init() call in xstateregs_set(), since fix is useless without
it.  It was added by commit 91c3dba7dbc1 ("x86/fpu/xstate: Fix PTRACE
frames for XSAVES"), but it doesn't make sense to backport that whole
commit.]

On x86, userspace can use the ptrace() or rt_sigreturn() system calls to
set a task's extended state (xstate) or "FPU" registers.  ptrace() can
set them for another task using the PTRACE_SETREGSET request with
NT_X86_XSTATE, while rt_sigreturn() can set them for the current task.
In either case, registers can be set to any value, but the kernel
assumes that the XSAVE area itself remains valid in the sense that the
CPU can restore it.

However, in the case where the kernel is using the uncompacted xstate
format (which it does whenever the XSAVES instruction is unavailable),
it was possible for userspace to set the xcomp_bv field in the
xstate_header to an arbitrary value.  However, all bits in that field
are reserved in the uncompacted case, so when switching to a task with
nonzero xcomp_bv, the XRSTOR instruction failed with a #GP fault.  This
caused the WARN_ON_FPU(err) in copy_kernel_to_xregs() to be hit.  In
addition, since the error is otherwise ignored, the FPU registers from
the task previously executing on the CPU were leaked.

Fix the bug by checking that the user-supplied value of xcomp_bv is 0 in
the uncompacted case, and returning an error otherwise.

The reason for validating xcomp_bv rather than simply overwriting it
with 0 is that we want userspace to see an error if it (incorrectly)
provides an XSAVE area in compacted format rather than in uncompacted
format.

Note that as before, in case of error we clear the task's FPU state.
This is perhaps non-ideal, especially for PTRACE_SETREGSET; it might be
better to return an error before changing anything.  But it seems the
"clear on error" behavior is fine for now, and it's a little tricky to
do otherwise because it would mean we couldn't simply copy the full
userspace state into kernel memory in one __copy_from_user().

This bug was found by syzkaller, which hit the above-mentioned
WARN_ON_FPU():

    WARNING: CPU: 1 PID: 0 at ./arch/x86/include/asm/fpu/internal.h:373 __switch_to+0x5b5/0x5d0
    CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.13.0 
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
    task: ffff9ba2bc8e42c0 task.stack: ffffa78cc036c000
    RIP: 0010:__switch_to+0x5b5/0x5d0
    RSP: 0000:ffffa78cc08bbb88 EFLAGS: 00010082
    RAX: 00000000fffffffe RBX: ffff9ba2b8bf2180 RCX: 00000000c0000100
    RDX: 00000000ffffffff RSI: 000000005cb10700 RDI: ffff9ba2b8bf36c0
    RBP: ffffa78cc08bbbd0 R08: 00000000929fdf46 R09: 0000000000000001
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ba2bc8e42c0
    R13: 0000000000000000 R14: ffff9ba2b8bf3680 R15: ffff9ba2bf5d7b40
    FS:  00007f7e5cb10700(0000) GS:ffff9ba2bf400000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000004005cc CR3: 0000000079fd5000 CR4: 00000000001406e0
    Call Trace:
    Code: 84 00 00 00 00 00 e9 11 fd ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 e7 fa ff ff 0f ff 66 0f 1f 84 00 00 00 00 00 e9 c2 fa ff ff <0f> ff 66 0f 1f 84 00 00 00 00 00 e9 d4 fc ff ff 66 66 2e 0f 1f

Here is a C reproducer.  The expected behavior is that the program spin
forever with no output.  However, on a buggy kernel running on a
processor with the "xsave" feature but without the "xsaves" feature
(e.g. Sandy Bridge through Broadwell for Intel), within a second or two
the program reports that the xmm registers were corrupted, i.e. were not
restored correctly.  With CONFIG_X86_DEBUG_FPU=y it also hits the above
kernel warning.

    #define _GNU_SOURCE
    #include <stdbool.h>
    #include <inttypes.h>
    #include <linux/elf.h>
    #include <stdio.h>
    #include <sys/ptrace.h>
    #include <sys/uio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        int pid = fork();
        uint64_t xstate[512];
        struct iovec iov = { .iov_base = xstate, .iov_len = sizeof(xstate) };

        if (pid == 0) {
            bool tracee = true;
            for (int i = 0; i < sysconf(_SC_NPROCESSORS_ONLN) && tracee; i++)
                tracee = (fork() != 0);
            uint32_t xmm0[4] = { [0 ... 3] = tracee ? 0x00000000 : 0xDEADBEEF };
            asm volatile("   movdqu %0, %%xmm0\n"
                         "   mov %0, %%rbx\n"
                         "1: movdqu %%xmm0, %0\n"
                         "   mov %0, %%rax\n"
                         "   cmp %%rax, %%rbx\n"
                         "   je 1b\n"
                         : "+m" (xmm0) : : "rax", "rbx", "xmm0");
            printf("BUG: xmm registers corrupted!  tracee=%d, xmm0=%08X%08X%08X%08X\n",
                   tracee, xmm0[0], xmm0[1], xmm0[2], xmm0[3]);
        } else {
            usleep(100000);
            ptrace(PTRACE_ATTACH, pid, 0, 0);
            wait(NULL);
            ptrace(PTRACE_GETREGSET, pid, NT_X86_XSTATE, &iov);
            xstate[65] = -1;
            ptrace(PTRACE_SETREGSET, pid, NT_X86_XSTATE, &iov);
            ptrace(PTRACE_CONT, pid, 0, 0);
            wait(NULL);
        }
        return 1;
    }

Note: the program only tests for the bug using the ptrace() system call.
The bug can also be reproduced using the rt_sigreturn() system call, but
only when called from a 32-bit program, since for 64-bit programs the
kernel restores the FPU state from the signal frame by doing XRSTOR
directly from userspace memory (with proper error checking).

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Halcrow <mhalcrow@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wanpeng Li <wanpeng.li@hotmail.com>
Cc: Yu-cheng Yu <yu-cheng.yu@intel.com>
Cc: kernel-hardening@lists.openwall.com
Fixes: 0b29643a58 ("x86/xsaves: Change compacted format xsave area header")
Link: http://lkml.kernel.org/r/20170922174156.16780-2-ebiggers3@gmail.com
Link: http://lkml.kernel.org/r/20170923130016.21448-25-mingo@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:47 +02:00
satoru takeuchi
4c16afac18 btrfs: prevent to set invalid default subvolid
commit 6d6d282932d1a609e60dc4467677e0e863682f57 upstream.

`btrfs sub set-default` succeeds to set an ID which isn't corresponding to any
fs/file tree. If such the bad ID is set to a filesystem, we can't mount this
filesystem without specifying `subvol` or `subvolid` mount options.

Fixes: 6ef5ed0d38 ("Btrfs: add ioctl and incompat flag to set the default mount subvol")
Signed-off-by: Satoru Takeuchi <satoru.takeuchi@gmail.com>
Reviewed-by: Qu Wenruo <quwenruo.btrfs@gmx.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:47 +02:00
Naohiro Aota
0efde43517 btrfs: propagate error to btrfs_cmp_data_prepare caller
commit 78ad4ce014d025f41b8dde3a81876832ead643cf upstream.

btrfs_cmp_data_prepare() (almost) always returns 0 i.e. ignoring errors
from gather_extent_pages(). While the pages are freed by
btrfs_cmp_data_free(), cmp->num_pages still has > 0. Then,
btrfs_extent_same() try to access the already freed pages causing faults
(or violates PageLocked assertion).

This patch just return the error as is so that the caller stop the process.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Fixes: f441460202 ("btrfs: fix deadlock with extent-same and readpage")
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:47 +02:00
Naohiro Aota
9a7d93dd2c btrfs: fix NULL pointer dereference from free_reloc_roots()
commit bb166d7207432d3c7d10c45dc052f12ba3a2121d upstream.

__del_reloc_root should be called before freeing up reloc_root->node.
If not, calling __del_reloc_root() dereference reloc_root->node, causing
the system BUG.

Fixes: 6bdf131fac23 ("Btrfs: don't leak reloc root nodes on error")
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:46 +02:00
Nicolai Stange
b08dc7d4cf PCI: Fix race condition with driver_override
commit 9561475db680f7144d2223a409dd3d7e322aca03 upstream.

The driver_override implementation is susceptible to a race condition when
different threads are reading vs. storing a different driver override.  Add
locking to avoid the race condition.

This is in close analogy to commit 6265539776a0 ("driver core: platform:
fix race condition with driver_override") from Adrian Salido.

Fixes: 782a985d7a ("PCI: Introduce new device binding path using pci_dev.driver_override")
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:46 +02:00
Jim Mattson
21a638c5ef kvm: nVMX: Don't allow L2 to access the hardware CR8
commit 51aa68e7d57e3217192d88ce90fd5b8ef29ec94f upstream.

If L1 does not specify the "use TPR shadow" VM-execution control in
vmcs12, then L0 must specify the "CR8-load exiting" and "CR8-store
exiting" VM-execution controls in vmcs02. Failure to do so will give
the L2 VM unrestricted read/write access to the hardware CR8.

This fixes CVE-2017-12154.

Signed-off-by: Jim Mattson <jmattson@google.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:46 +02:00
Jan H. Schönherr
7520be6a45 KVM: VMX: Do not BUG() on out-of-bounds guest IRQ
commit 3a8b0677fc6180a467e26cc32ce6b0c09a32f9bb upstream.

The value of the guest_irq argument to vmx_update_pi_irte() is
ultimately coming from a KVM_IRQFD API call. Do not BUG() in
vmx_update_pi_irte() if the value is out-of bounds. (Especially,
since KVM as a whole seems to hang after that.)

Instead, print a message only once if we find that we don't have a
route for a certain IRQ (which can be out-of-bounds or within the
array).

This fixes CVE-2017-1000252.

Fixes: efc644048e ("KVM: x86: Update IRTE for posted-interrupts")
Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:46 +02:00
Will Deacon
e726c30c75 arm64: fault: Route pte translation faults via do_translation_fault
commit 760bfb47c36a07741a089bf6a28e854ffbee7dc9 upstream.

We currently route pte translation faults via do_page_fault, which elides
the address check against TASK_SIZE before invoking the mm fault handling
code. However, this can cause issues with the path walking code in
conjunction with our word-at-a-time implementation because
load_unaligned_zeropad can end up faulting in kernel space if it reads
across a page boundary and runs into a page fault (e.g. by attempting to
read from a guard region).

In the case of such a fault, load_unaligned_zeropad has registered a
fixup to shift the valid data and pad with zeroes, however the abort is
reported as a level 3 translation fault and we dispatch it straight to
do_page_fault, despite it being a kernel address. This results in calling
a sleeping function from atomic context:

  BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313
  in_atomic(): 0, irqs_disabled(): 0, pid: 10290
  Internal error: Oops - BUG: 0 [] PREEMPT SMP
  [...]
  [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144
  [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c
  [<ffffff8e016977f0>] do_page_fault+0x140/0x330
  [<ffffff8e01681328>] do_mem_abort+0x54/0xb0
  Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0)
  [...]
  [<ffffff8e016844fc>] el1_da+0x18/0x78
  [<ffffff8e017f399c>] path_parentat+0x44/0x88
  [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8
  [<ffffff8e017f5044>] filename_create+0x4c/0x128
  [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8
  [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28
  Code: 36380080 d5384100 f9400800 9402566d (d4210000)
  ---[ end trace 2d01889f2bca9b9f ]---

Fix this by dispatching all translation faults to do_translation_faults,
which avoids invoking the page fault logic for faults on kernel addresses.

Reported-by: Ankit Jain <ankijain@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:46 +02:00
Marc Zyngier
638e7874f6 arm64: Make sure SPsel is always set
commit 5371513fb338fb9989c569dc071326d369d6ade8 upstream.

When the kernel is entered at EL2 on an ARMv8.0 system, we construct
the EL1 pstate and make sure this uses the the EL1 stack pointer
(we perform an exception return to EL1h).

But if the kernel is either entered at EL1 or stays at EL2 (because
we're on a VHE-capable system), we fail to set SPsel, and use whatever
stack selection the higher exception level has choosen for us.

Let's not take any chance, and make sure that SPsel is set to one
before we decide the mode we're going to run in.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:46 +02:00
Oleg Nesterov
9237605e0b seccomp: fix the usage of get/put_seccomp_filter() in seccomp_get_filter()
commit 66a733ea6b611aecf0119514d2dddab5f9d6c01e upstream.

As Chris explains, get_seccomp_filter() and put_seccomp_filter() can end
up using different filters. Once we drop ->siglock it is possible for
task->seccomp.filter to have been replaced by SECCOMP_FILTER_FLAG_TSYNC.

Fixes: f8e529ed94 ("seccomp, ptrace: add support for dumping seccomp filters")
Reported-by: Chris Salls <chrissalls5@gmail.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[tycho: add __get_seccomp_filter vs. open coding refcount_inc()]
Signed-off-by: Tycho Andersen <tycho@docker.com>
[kees: tweak commit log]
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:46 +02:00
Christoph Hellwig
668cee82cd bsg-lib: don't free job in bsg_prepare_job
commit f507b54dccfd8000c517d740bc45f20c74532d18 upstream.

The job structure is allocated as part of the request, so we should not
free it in the error path of bsg_prepare_job.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:46 +02:00
Vladis Dronov
9d74367d1a nl80211: check for the required netlink attributes presence
commit e785fa0a164aa11001cba931367c7f94ffaff888 upstream.

nl80211_set_rekey_data() does not check if the required attributes
NL80211_REKEY_DATA_{REPLAY_CTR,KEK,KCK} are present when processing
NL80211_CMD_SET_REKEY_OFFLOAD request. This request can be issued by
users with CAP_NET_ADMIN privilege and may result in NULL dereference
and a system crash. Add a check for the required attributes presence.
This patch is based on the patch by bo Zhang.

This fixes CVE-2017-12153.

References: https://bugzilla.redhat.com/show_bug.cgi?id=1491046
Fixes: e5497d766a ("cfg80211/nl80211: support GTK rekey offload")
Reported-by: bo Zhang <zhangbo5891001@gmail.com>
Signed-off-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:46 +02:00
Andreas Gruenbacher
3393445ef4 vfs: Return -ENXIO for negative SEEK_HOLE / SEEK_DATA offsets
commit fc46820b27a2d9a46f7e90c9ceb4a64a1bc5fab8 upstream.

In generic_file_llseek_size, return -ENXIO for negative offsets as well
as offsets beyond EOF.  This affects filesystems which don't implement
SEEK_HOLE / SEEK_DATA internally, possibly because they don't support
holes.

Fixes xfstest generic/448.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:45 +02:00
Steve French
3bb7084cc0 SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags
commit 1013e760d10e614dc10b5624ce9fc41563ba2e65 upstream.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:45 +02:00
Steve French
02ef29f9cb SMB: Validate negotiate (to protect against downgrade) even if signing off
commit 0603c96f3af50e2f9299fa410c224ab1d465e0f9 upstream.

As long as signing is supported (ie not a guest user connection) and
connection is SMB3 or SMB3.02, then validate negotiate (protect
against man in the middle downgrade attacks).  We had been doing this
only when signing was required, not when signing was just enabled,
but this more closely matches recommended SMB3 behavior and is
better security.  Suggested by Metze.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Jeremy Allison <jra@samba.org>
Acked-by: Stefan Metzmacher <metze@samba.org>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:45 +02:00
Steve French
c096b31f9d Fix SMB3.1.1 guest authentication to Samba
commit 23586b66d84ba3184b8820277f3fc42761640f87 upstream.

Samba rejects SMB3.1.1 dialect (vers=3.1.1) negotiate requests from
the kernel client due to the two byte pad at the end of the negotiate
contexts.

Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:45 +02:00
Tyrel Datwyler
fe37a445ea powerpc/pseries: Fix parent_dn reference leak in add_dt_node()
commit b537ca6fede69a281dc524983e5e633d79a10a08 upstream.

A reference to the parent device node is held by add_dt_node() for the
node to be added. If the call to dlpar_configure_connector() fails
add_dt_node() returns ENOENT and that reference is not freed.

Add a call to of_node_put(parent_dn) prior to bailing out after a
failed dlpar_configure_connector() call.

Fixes: 8d5ff32076 ("powerpc/pseries: Make dlpar_configure_connector parent node aware")
Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:45 +02:00
Eric Biggers
638b385050 KEYS: prevent KEYCTL_READ on negative key
commit 37863c43b2c6464f252862bf2e9768264e961678 upstream.

Because keyctl_read_key() looks up the key with no permissions
requested, it may find a negatively instantiated key.  If the key is
also possessed, we went ahead and called ->read() on the key.  But the
key payload will actually contain the ->reject_error rather than the
normal payload.  Thus, the kernel oopses trying to read the
user_key_payload from memory address (int)-ENOKEY = 0x00000000ffffff82.

Fortunately the payload data is stored inline, so it shouldn't be
possible to abuse this as an arbitrary memory read primitive...

Reproducer:
    keyctl new_session
    keyctl request2 user desc '' @s
    keyctl read $(keyctl show | awk '/user: desc/ {print $1}')

It causes a crash like the following:
     BUG: unable to handle kernel paging request at 00000000ffffff92
     IP: user_read+0x33/0xa0
     PGD 36a54067 P4D 36a54067 PUD 0
     Oops: 0000 [] SMP
     CPU: 0 PID: 211 Comm: keyctl Not tainted 4.14.0-rc1 
     Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
     task: ffff90aa3b74c3c0 task.stack: ffff9878c0478000
     RIP: 0010:user_read+0x33/0xa0
     RSP: 0018:ffff9878c047bee8 EFLAGS: 00010246
     RAX: 0000000000000001 RBX: ffff90aa3d7da340 RCX: 0000000000000017
     RDX: 0000000000000000 RSI: 00000000ffffff82 RDI: ffff90aa3d7da340
     RBP: ffff9878c047bf00 R08: 00000024f95da94f R09: 0000000000000000
     R10: 0000000000000001 R11: 0000000000000000 R12: 0000000000000000
     R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
     FS:  00007f58ece69740(0000) GS:ffff90aa3e200000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 00000000ffffff92 CR3: 0000000036adc001 CR4: 00000000003606f0
     Call Trace:
      keyctl_read_key+0xac/0xe0
      SyS_keyctl+0x99/0x120
      entry_SYSCALL_64_fastpath+0x1f/0xbe
     RIP: 0033:0x7f58ec787bb9
     RSP: 002b:00007ffc8d401678 EFLAGS: 00000206 ORIG_RAX: 00000000000000fa
     RAX: ffffffffffffffda RBX: 00007ffc8d402800 RCX: 00007f58ec787bb9
     RDX: 0000000000000000 RSI: 00000000174a63ac RDI: 000000000000000b
     RBP: 0000000000000004 R08: 00007ffc8d402809 R09: 0000000000000020
     R10: 0000000000000000 R11: 0000000000000206 R12: 00007ffc8d402800
     R13: 00007ffc8d4016e0 R14: 0000000000000000 R15: 0000000000000000
     Code: e5 41 55 49 89 f5 41 54 49 89 d4 53 48 89 fb e8 a4 b4 ad ff 85 c0 74 09 80 3d b9 4c 96 00 00 74 43 48 8b b3 20 01 00 00 4d 85 ed <0f> b7 5e 10 74 29 4d 85 e4 74 24 4c 39 e3 4c 89 e2 4c 89 ef 48
     RIP: user_read+0x33/0xa0 RSP: ffff9878c047bee8
     CR2: 00000000ffffff92

Fixes: 61ea0c0ba9 ("KEYS: Skip key state checks when checking for possession")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:45 +02:00
Eric Biggers
539255aea8 KEYS: prevent creating a different user's keyrings
commit 237bbd29f7a049d310d907f4b2716a7feef9abf3 upstream.

It was possible for an unprivileged user to create the user and user
session keyrings for another user.  For example:

    sudo -u '' sh -c 'keyctl add keyring _uid.4000 "" @u
                           keyctl add keyring _uid_ses.4000 "" @u
                           sleep 15' &
    sleep 1
    sudo -u '' keyctl describe @u
    sudo -u '' keyctl describe @us

This is problematic because these "fake" keyrings won't have the right
permissions.  In particular, the user who created them first will own
them and will have full access to them via the possessor permissions,
which can be used to compromise the security of a user's keys:

    -4: alswrv-----v------------  3000     0 keyring: _uid.4000
    -5: alswrv-----v------------  3000     0 keyring: _uid_ses.4000

Fix it by marking user and user session keyrings with a flag
KEY_FLAG_UID_KEYRING.  Then, when searching for a user or user session
keyring by name, skip all keyrings that don't have the flag set.

Fixes: 69664cf16a ("keys: don't generate user and user session keyrings unless they're accessed")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:45 +02:00
Eric Biggers
af24e9d8ba KEYS: fix writing past end of user-supplied buffer in keyring_read()
commit e645016abc803dafc75e4b8f6e4118f088900ffb upstream.

Userspace can call keyctl_read() on a keyring to get the list of IDs of
keys in the keyring.  But if the user-supplied buffer is too small, the
kernel would write the full list anyway --- which will corrupt whatever
userspace memory happened to be past the end of the buffer.  Fix it by
only filling the space that is available.

Fixes: b2a4df200d ("KEYS: Expand the capacity of a keyring")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:45 +02:00
LEROY Christophe
362711d59b crypto: talitos - fix sha224
commit afd62fa26343be6445479e75de9f07092a061459 upstream.

Kernel crypto tests report the following error at startup

[    2.752626] alg: hash: Test 4 failed for sha224-talitos
[    2.757907] 00000000: 30 e2 86 e2 e7 8a dd 0d d7 eb 9f d5 83 fe f1 b0
00000010: 2d 5a 6c a5 f9 55 ea fd 0e 72 05 22

This patch fixes it

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:45 +02:00
LEROY Christophe
231c4f646b crypto: talitos - Don't provide setkey for non hmac hashing algs.
commit 56136631573baa537a15e0012055ffe8cfec1a33 upstream.

Today, md5sum fails with error -ENOKEY because a setkey
function is set for non hmac hashing algs, see strace output below:

mmap(NULL, 378880, PROT_READ, MAP_SHARED, 6, 0) = 0x77f50000
accept(3, 0, NULL)                      = 7
vmsplice(5, [{"bin/\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 378880}], 1, SPLICE_F_MORE|SPLICE_F_GIFT) = 262144
splice(4, NULL, 7, NULL, 262144, SPLICE_F_MORE) = -1 ENOKEY (Required key not available)
write(2, "Generation of hash for file kcap"..., 50) = 50
munmap(0x77f50000, 378880)              = 0

This patch ensures that setkey() function is set only
for hmac hashing.

Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:45 +02:00
Xin Long
9d2534917c scsi: scsi_transport_iscsi: fix the issue that iscsi_if_rx doesn't parse nlmsg properly
commit c88f0e6b06f4092995688211a631bb436125d77b upstream.

ChunYu found a kernel crash by syzkaller:

[  651.617875] kasan: CONFIG_KASAN_INLINE enabled
[  651.618217] kasan: GPF could be caused by NULL-ptr deref or user memory access
[  651.618731] general protection fault: 0000 [] SMP KASAN
[  651.621543] CPU: 1 PID: 9539 Comm: scsi Not tainted 4.11.0.cov 
[  651.621938] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[  651.622309] task: ffff880117780000 task.stack: ffff8800a3188000
[  651.622762] RIP: 0010:skb_release_data+0x26c/0x590
[...]
[  651.627260] Call Trace:
[  651.629156]  skb_release_all+0x4f/0x60
[  651.629450]  consume_skb+0x1a5/0x600
[  651.630705]  netlink_unicast+0x505/0x720
[  651.632345]  netlink_sendmsg+0xab2/0xe70
[  651.633704]  sock_sendmsg+0xcf/0x110
[  651.633942]  ___sys_sendmsg+0x833/0x980
[  651.637117]  __sys_sendmsg+0xf3/0x240
[  651.638820]  SyS_sendmsg+0x32/0x50
[  651.639048]  entry_SYSCALL_64_fastpath+0x1f/0xc2

It's caused by skb_shared_info at the end of sk_buff was overwritten by
ISCSI_KEVENT_IF_ERROR when parsing nlmsg info from skb in iscsi_if_rx.

During the loop if skb->len == nlh->nlmsg_len and both are sizeof(*nlh),
ev = nlmsg_data(nlh) will acutally get skb_shinfo(SKB) instead and set a
new value to skb_shinfo(SKB)->nr_frags by ev->type.

This patch is to fix it by checking nlh->nlmsg_len properly there to
avoid over accessing sk_buff.

Reported-by: ChunYu Wang <chunwang@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Chris Leech <cleech@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:44 +02:00
Dennis Yang
29854a77f7 md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list
commit 184a09eb9a2fe425e49c9538f1604b05ed33cfef upstream.

In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST
set, it indicates that this stripe_head is already in the raid5_plug_cb
list and release_stripe() would be called instead to drop a reference
count. Otherwise, the STRIPE_ON_UNPLUG_LIST bit would be set for this
stripe_head and it will get queued into the raid5_plug_cb list.

Since break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST,
A stripe could be re-added to plug list while it is still on that list
in the following situation. If stripe_head A is added to another
stripe_head B's batch list, in this case A will have its
batch_head != NULL and be added into the plug list. After that,
stripe_head B gets handled and called break_stripe_batch_list() to
reset all the batched stripe_head(including A which is still on
the plug list)'s state and reset their batch_head to NULL.
Before the plug list gets processed, if there is another write request
comes in and get stripe_head A, A will have its batch_head == NULL
(cleared by calling break_stripe_batch_list() on B) and be added to
plug list once again.

Signed-off-by: Dennis Yang <dennisyang@qnap.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:44 +02:00
Shaohua Li
d03d156786 md/raid5: fix a race condition in stripe batch
commit 3664847d95e60a9a943858b7800f8484669740fc upstream.

We have a race condition in below scenario, say have 3 continuous stripes, sh1,
sh2 and sh3, sh1 is the stripe_head of sh2 and sh3:

CPU1				CPU2				CPU3
handle_stripe(sh3)
				stripe_add_to_batch_list(sh3)
				-> lock(sh2, sh3)
				-> lock batch_lock(sh1)
				-> add sh3 to batch_list of sh1
				-> unlock batch_lock(sh1)
								clear_batch_ready(sh1)
								-> lock(sh1) and batch_lock(sh1)
								-> clear STRIPE_BATCH_READY for all stripes in batch_list
								-> unlock(sh1) and batch_lock(sh1)
->clear_batch_ready(sh3)
-->test_and_clear_bit(STRIPE_BATCH_READY, sh3)
--->return 0 as sh->batch == NULL
				-> sh3->batch_head = sh1
				-> unlock (sh2, sh3)

In CPU1, handle_stripe will continue handle sh3 even it's in batch stripe list
of sh1. By moving sh3->batch_head assignment in to batch_lock, we make it
impossible to clear STRIPE_BATCH_READY before batch_head is set.

Thanks Stephane for helping debug this tricky issue.

Reported-and-tested-by: Stephane Thiell <sthiell@stanford.edu>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:44 +02:00
Bo Yan
68a4a52899 tracing: Erase irqsoff trace with empty write
commit 8dd33bcb7050dd6f8c1432732f930932c9d3a33e upstream.

One convenient way to erase trace is "echo > trace". However, this
is currently broken if the current tracer is irqsoff tracer. This
is because irqsoff tracer use max_buffer as the default trace
buffer.

Set the max_buffer as the one to be cleared when it's the trace
buffer currently in use.

Link: http://lkml.kernel.org/r/1505754215-29411-1-git-send-email-byan@nvidia.com

Cc: <mingo@redhat.com>
Fixes: 4acd4d00f ("tracing: give easy way to clear trace buffer")
Signed-off-by: Bo Yan <byan@nvidia.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:44 +02:00
Tahsin Erdogan
9c5afa726a tracing: Fix trace_pipe behavior for instance traces
commit 75df6e688ccd517e339a7c422ef7ad73045b18a2 upstream.

When reading data from trace_pipe, tracing_wait_pipe() performs a
check to see if tracing has been turned off after some data was read.
Currently, this check always looks at global trace state, but it
should be checking the trace instance where trace_pipe is located at.

Because of this bug, cat instances/i1/trace_pipe in the following
script will immediately exit instead of waiting for data:

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
mkdir -p instances/i1
echo 1 > instances/i1/tracing_on
echo 1 > instances/i1/events/sched/sched_process_exec/enable
cat instances/i1/trace_pipe

Link: http://lkml.kernel.org/r/20170917102348.1615-1-tahsin@google.com

Fixes: 10246fa35d ("tracing: give easy way to clear trace buffer")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:44 +02:00
Paul Mackerras
f75c0042f1 KVM: PPC: Book3S: Fix race and leak in kvm_vm_ioctl_create_spapr_tce()
commit 47c5310a8dbe7c2cb9f0083daa43ceed76c257fa upstream, with part
of commit edd03602d97236e8fea13cd76886c576186aa307 folded in.

Nixiaoming pointed out that there is a memory leak in
kvm_vm_ioctl_create_spapr_tce() if the call to anon_inode_getfd()
fails; the memory allocated for the kvmppc_spapr_tce_table struct
is not freed, and nor are the pages allocated for the iommu
tables.

David Hildenbrand pointed out that there is a race in that the
function checks early on that there is not already an entry in the
stt->iommu_tables list with the same LIOBN, but an entry with the
same LIOBN could get added between then and when the new entry is
added to the list.

This fixes both problems.  To simplify things, we now call
anon_inode_getfd() before placing the new entry in the list.  The
check for an existing entry is done while holding the kvm->lock
mutex, immediately before adding the new entry to the list.

[paulus@ozlabs.org - folded in that part of edd03602d972 ("KVM:
 PPC: Book3S HV: Protect updates to spapr_tce_tables list", 2017-08-28)
 which restructured the code that 47c5310a8dbe modified, to avoid
 a build failure caused by the absence of put_unused_fd().
 Also removed the locked memory accounting, since it doesn't exist
 in this version, and adjusted the commit message.]

Fixes: 54738c0971 ("KVM: PPC: Accelerate H_PUT_TCE by implementing it in real mode")
Reported-by: Nixiaoming <nixiaoming@huawei.com>
Reported-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:44 +02:00
Avraham Stern
7d8fbf3db1 mac80211: flush hw_roc_start work before cancelling the ROC
commit 6e46d8ce894374fc135c96a8d1057c6af1fef237 upstream.

When HW ROC is supported it is possible that after the HW notified
that the ROC has started, the ROC was cancelled and another ROC was
added while the hw_roc_start worker is waiting on the mutex (since
cancelling the ROC and adding another one also holds the same mutex).
As a result, the hw_roc_start worker will continue to run after the
new ROC is added but before it is actually started by the HW.
This may result in notifying userspace that the ROC has started before
it actually does, or in case of management tx ROC, in an attempt to
tx while not on the right channel.

In addition, when the driver will notify mac80211 that the second ROC
has started, mac80211 will warn that this ROC has already been
notified.

Fix this by flushing the hw_roc_start work before cancelling an ROC.

Signed-off-by: Avraham Stern <avraham.stern@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:44 +02:00
Shu Wang
fcc949a488 cifs: release auth_key.response for reconnect.
commit f5c4ba816315d3b813af16f5571f86c8d4e897bd upstream.

There is a race that cause cifs reconnect in cifs_mount,
- cifs_mount
  - cifs_get_tcp_session
    - [ start thread cifs_demultiplex_thread
      - cifs_read_from_socket: -ECONNABORTED
        - DELAY_WORK smb2_reconnect_server ]
  - cifs_setup_session
  - [ smb2_reconnect_server ]

auth_key.response was allocated in cifs_setup_session, and
will release when the session destoried. So when session re-
connect, auth_key.response should be check and released.

Tested with my system:
CIFS VFS: Free previous auth_key.response = ffff8800320bbf80

A simple auth_key.response allocation call trace:
- cifs_setup_session
- SMB2_sess_setup
- SMB2_sess_auth_rawntlmssp_authenticate
- build_ntlmssp_auth_blob
- setup_ntlmv2_rsp

Signed-off-by: Shu Wang <shuwang@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-10-05 09:41:44 +02:00
Jaegeuk Kim
13f002354d f2fs: catch up to v4.14-rc1
This is cherry-picked from upstrea-f2fs-stable-linux-4.4.y.

Changes include:

commit c7fd9e2b4a ("f2fs: hurry up to issue discard after io interruption")
commit 603dde3965 ("f2fs: fix to show correct discard_granularity in sysfs")
...
commit 565f0225f9 ("f2fs: factor out discard command info into discard_cmd_control")
commit c4cc29d19e ("f2fs: remove batched discard in f2fs_trim_fs")

Change-Id: Icd8a85ac0c19a8aa25cd2591a12b4e9b85bdf1c5
Signed-off-by: Jaegeuk Kim <jaegeuk@google.com>
2017-10-03 17:19:14 -07:00
Juri Lelli
b0fa18e1ca UPSTREAM: cpufreq: schedutil: use now as reference when aggregating shared policy requests
Currently, sugov_next_freq_shared() uses last_freq_update_time as a
reference to decide when to start considering CPU contributions as
stale.

However, since last_freq_update_time is set by the last CPU that issued
a frequency transition, this might cause problems in certain cases. In
practice, the detection of stale utilization values fails whenever the
CPU with such values was the last to update the policy. For example (and
please note again that the SCHED_CPUFREQ_RT flag is not the problem
here, but only the detection of after how much time that flag has to be
considered stale), suppose a policy with 2 CPUs:

               CPU0                |               CPU1
                                   |
                                   |     RT task scheduled
                                   |     SCHED_CPUFREQ_RT is set
                                   |     CPU1->last_update = now
                                   |     freq transition to max
                                   |     last_freq_update_time = now
                                   |

                        more than TICK_NSEC nsecs

                                   |
     a small CFS wakes up          |
     CPU0->last_update = now1      |
     delta_ns(CPU0) < TICK_NSEC*   |
     CPU0's util is considered     |
     delta_ns(CPU1) =              |
      last_freq_update_time -      |
      CPU1->last_update = 0        |
      < TICK_NSEC                  |
     CPU1 is still considered      |
     CPU1->SCHED_CPUFREQ_RT is set |
     we stay at max (until CPU1    |
     exits from idle)              |

* delta_ns is actually negative as now1 > last_freq_update_time

While last_freq_update_time is a sensible reference for rate limiting,
it doesn't seem to be useful for working around stale CPU states.

Fix the problem by always considering now (time) as the reference for
deciding when CPUs have stale contributions.

Signed-off-by: Juri Lelli <juri.lelli@arm.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
(cherry picked from commit d86ab9cff8b936aadde444d0e263a8db5ff0349b)
2017-10-03 21:18:45 +00:00
Steve Muckle
69fbcb521a ANDROID: add script to fetch android kernel config fragments
The Android kernel config fragments now live in a separate repository.
To prevent others from having to search for this location, add a script
to fetch and unpack the fragments.

Update .gitignore to include these fragments.

Change-Id: If2d4a59b86e4573b0a9b3190025dfe4191870b46
Signed-off-by: Steve Muckle <smuckle@google.com>
2017-10-03 10:59:04 -07:00