commit 198de51dbc3454d95b015ca0a055b673f85f01bb upstream.
Commit 64d513ac31 ("scsi: use host wide tags by default") causes
the SCSI core to queue more commands then we can handle on devices with
multiple LUNs, limit the queue depth at the scsi-host level instead of
per slave to fix this.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1315013
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3ba3458fb9c050718b95275a3310b74415e767e2 ]
When sending a UDPv6 message longer than MTU, account for the length
of fragmentable IPv6 extension headers in skb->network_header offset.
Same as we do in alloc_new_skb path in __ip6_append_data().
This ensures that later on __ip6_make_skb() will make space in
headroom for fragmentable extension headers:
/* move skb->data to ip header from ext header */
if (skb->data < skb_network_header(skb))
__skb_pull(skb, skb_network_offset(skb));
Prevents a splat due to skb_under_panic:
skbuff: skb_under_panic: text:ffffffff8143397b len:2126 put:14 \
head:ffff880005bacf50 data:ffff880005bacf4a tail:0x48 end:0xc0 dev:lo
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:104!
invalid opcode: 0000 [#1] KASAN
CPU: 0 PID: 160 Comm: reproducer Not tainted 4.6.0-rc2 #65
[...]
Call Trace:
[<ffffffff813eb7b9>] skb_push+0x79/0x80
[<ffffffff8143397b>] eth_header+0x2b/0x100
[<ffffffff8141e0d0>] neigh_resolve_output+0x210/0x310
[<ffffffff814eab77>] ip6_finish_output2+0x4a7/0x7c0
[<ffffffff814efe3a>] ip6_output+0x16a/0x280
[<ffffffff815440c1>] ip6_local_out+0xb1/0xf0
[<ffffffff814f1115>] ip6_send_skb+0x45/0xd0
[<ffffffff81518836>] udp_v6_send_skb+0x246/0x5d0
[<ffffffff8151985e>] udpv6_sendmsg+0xa6e/0x1090
[...]
Reported-by: Ji Jianwen <jiji@redhat.com>
Signed-off-by: Jakub Sitnicki <jkbs@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b6ee376cb0b7fb4e7e07d6cd248bd40436fb9ba6 ]
When creating an ip6tnl tunnel with ip tunnel, rtnl_link_ops is not set
before ip6_tnl_create2 is called. When register_netdevice is called, there
is no linkinfo attribute in the NEWLINK message because of that.
Setting rtnl_link_ops before calling register_netdevice fixes that.
Fixes: 0b11245722 ("ip6tnl: add support of link creation via rtnl")
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit be447f305494e019dfc37ea4cdf3b0e4200b4eba ]
pskb_may_pull() can change skb->data, so we have to load ptr/optr at the
right place.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5745b8232e942abd5e16e85fa9b27cc21324acf0 ]
pskb_may_pull() can change skb->data, so we have to load ptr/optr at the
right place.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 016adb7260f481168c03e09f785184d6d5278894 ]
After commit f84bb1eac0 ("net: fix IFF_NO_QUEUE for drivers using
alloc_netdev"), default qdisc was changed to noqueue because
tuntap does not set tx_queue_len during .setup(). This patch restores
default qdisc by setting tx_queue_len in tun_setup().
Fixes: f84bb1eac0 ("net: fix IFF_NO_QUEUE for drivers using alloc_netdev")
Cc: Phil Sutter <phil@nwl.cc>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5a5abb1fa3b05dd6aa821525832644c1e7d2905f ]
Sasha Levin reported a suspicious rcu_dereference_protected() warning
found while fuzzing with trinity that is similar to this one:
[ 52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
[ 52.765688] other info that might help us debug this:
[ 52.765695] rcu_scheduler_active = 1, debug_locks = 1
[ 52.765701] 1 lock held by a.out/1525:
[ 52.765704] #0: (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
[ 52.765721] stack backtrace:
[ 52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
[...]
[ 52.765768] Call Trace:
[ 52.765775] [<ffffffff813e488d>] dump_stack+0x85/0xc8
[ 52.765784] [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
[ 52.765792] [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
[ 52.765801] [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
[ 52.765810] [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
[ 52.765818] [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
[ 52.765827] [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
[ 52.765834] [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
[ 52.765843] [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
[ 52.765850] [<ffffffff81261519>] SyS_ioctl+0x79/0x90
[ 52.765858] [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
[ 52.765866] [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25
Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.
Since the fix in f91ff5b9ff ("net: sk_{detach|attach}_filter() rcu
fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
filter with rcu_dereference_protected(), checking whether socket lock
is held in control path.
Since its introduction in 9940516259 ("tun: socket filter support"),
tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
sock_owned_by_user(sk) doesn't apply in this specific case and therefore
triggers the false positive.
Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
that is used by tap filters and pass in lockdep_rtnl_is_held() for the
rcu_dereference_protected() checks instead.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c57c7a95da842807b475b823ed2e5435c42cb3b0 ]
Size of the attribute IFLA_PHYS_PORT_NAME was missing.
Fixes: db24a9044e ("net: add support for phys_port_name")
CC: David Ahern <dsahern@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5e263f712691615fb802f06c98d7638c378f5d11 ]
When NET_SWITCHDEV=n, switchdev_port_attr_set will return -EOPNOTSUPP,
we should ignore this error code and continue to set the ageing time.
Fixes: c62987bbd8 ("bridge: push bridge setting ageing_time down to switchdev")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2d4212261fdf13e29728ddb5ea9d60c342cc92b5 ]
IPv6 counters updates use a different macro than IPv4.
Fixes: 36cbb2452c ("udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 071d36bf21bcc837be00cea55bcef8d129e7f609 ]
A crash is observed when a decrypted packet is processed in receive
path. get_rps_cpus() tries to dereference the skb->dev fields but it
appears that the device is freed from the poison pattern.
[<ffffffc000af58ec>] get_rps_cpu+0x94/0x2f0
[<ffffffc000af5f94>] netif_rx_internal+0x140/0x1cc
[<ffffffc000af6094>] netif_rx+0x74/0x94
[<ffffffc000bc0b6c>] xfrm_input+0x754/0x7d0
[<ffffffc000bc0bf8>] xfrm_input_resume+0x10/0x1c
[<ffffffc000ba6eb8>] esp_input_done+0x20/0x30
[<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
[<ffffffc0000b7324>] worker_thread+0x2f8/0x418
[<ffffffc0000bb40c>] kthread+0xe0/0xec
-013|get_rps_cpu(
| dev = 0xFFFFFFC08B688000,
| skb = 0xFFFFFFC0C76AAC00 -> (
| dev = 0xFFFFFFC08B688000 -> (
| name =
"......................................................
| name_hlist = (next = 0xAAAAAAAAAAAAAAAA, pprev =
0xAAAAAAAAAAA
Following are the sequence of events observed -
- Encrypted packet in receive path from netdevice is queued
- Encrypted packet queued for decryption (asynchronous)
- Netdevice brought down and freed
- Packet is decrypted and returned through callback in esp_input_done
- Packet is queued again for process in network stack using netif_rx
Since the device appears to have been freed, the dereference of
skb->dev in get_rps_cpus() leads to an unhandled page fault
exception.
Fix this by holding on to device reference when queueing packets
asynchronously and releasing the reference on call back return.
v2: Make the change generic to xfrm as mentioned by Steffen and
update the title to xfrm
Suggested-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Jerome Stanislaus <jeromes@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4cfc86f3dae6ca38ed49cdd78f458a03d4d87992 ]
Field fl4.flowi4_flags is not initialized in fib_compute_spec_dst()
before calling fib_lookup(), which means fib_table_lookup() is
using non-deterministic data at this line:
if (!(flp->flowi4_flags & FLOWI_FLAG_SKIP_NH_OIF)) {
Fix by initializing the entire fl4 structure, which will prevent
similar issues as fields are added in the future by ensuring that
all fields are initialized to zero unless explicitly initialized
to another value.
Fixes: 58189ca7b2 ("net: Fix vti use case with oif in dst lookups")
Suggested-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ad0ea1989cc4d5905941d0a9e62c63ad6d859cef ]
Currently, ingress ipv4 broadcast datagrams are dropped since,
in udp_v4_early_demux(), ip_check_mc_rcu() is invoked even on
bcast packets.
This patch addresses the issue, invoking ip_check_mc_rcu()
only for mcast packets.
Fixes: 6e54030932 ("ipv4/udp: Verify multicast group is ours in upd_v4_early_demux()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fe30937b65354c7fec244caebbdaae68e28ca797 ]
bond_get_stats() can be called from rtnetlink (with RTNL held)
or from /proc/net/dev seq handler (with RCU held)
The logic added in commit 5f0c5f73e5 ("bonding: make global bonding
stats more reliable") kind of assumed only one cpu could run there.
If multiple threads are reading /proc/net/dev, stats can be really
messed up after a while.
A second problem is that some fields are 32bit, so we need to properly
handle the wrap around problem.
Given that RTNL is not always held, we need to use
bond_for_each_slave_rcu().
Fixes: 5f0c5f73e5 ("bonding: make global bonding stats more reliable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit eee577232203842b4dcadb7ab477a298479633ed ]
When un-mapping skb->data in __bcmgenet_tx_reclaim(),
we must use the length that was used in original dma_map_single(),
instead of skb->len that might be bigger (includes the frags)
We simply can store skb_len into tx_cb_ptr->dma_len and use it
at unmap time.
Fixes: 1c1008c793 ("net: bcmgenet: add main driver file")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2c9a266afefe137bff06bbe0fc48b4d3b3cb348c ]
When running small packets [length < 256 bytes] traffic, packets were
being dropped due to invalid data in those packets which were
delivered by the driver upto the stack. Using pci_dma_sync_single_for_cpu
ensures copying latest and updated data into skb from the receive buffer.
Signed-off-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: Manish Chopra <manish.chopra@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e316ea62e3203d524ff0239a40c56d3a39ad1b5c ]
Now SYN_RECV request sockets are installed in ehash table, an ICMP
handler can find a request socket while another cpu handles an incoming
packet transforming this SYN_RECV request socket into an ESTABLISHED
socket.
We need to remove the now obsolete WARN_ON(req->sk), since req->sk
is set when a new child is created and added into listener accept queue.
If this race happens, the ICMP will do nothing special.
Fixes: 079096f103 ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Ben Lazarus <blazarus@google.com>
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e8e56ffd9d2973398b60ece1f1bebb8d67b4d032 ]
Locking ppp_mutex must be done before dereferencing file->private_data,
otherwise it could be modified before ppp_unattached_ioctl() takes the
lock. This could lead ppp_unattached_ioctl() to override ->private_data,
thus leaking reference to the ppp_file previously pointed to.
v2: lock all ppp_ioctl() instead of just checking private_data in
ppp_unattached_ioctl(), to avoid ambiguous behaviour.
Fixes: f3ff8a4d80 ("ppp: push BKL down into the driver")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 83d6f1f15f8cce844b0a131cbc63e444620e48b5 ]
Code that was added back in 2.6.38 has an obvious overflow
when accessing a static array, and at the time it was added
only a code comment was put in front of it as a reminder
to have it reviewed properly.
This has not happened, but gcc-6 now points to the specific
overflow:
drivers/net/wireless/ath/ath9k/eeprom.c: In function 'ath9k_hw_get_gain_boundaries_pdadcs':
drivers/net/wireless/ath/ath9k/eeprom.c:483:44: error: array subscript is above array bounds [-Werror=array-bounds]
maxPwrT4[i] = data_9287[idxL].pwrPdg[i][4];
~~~~~~~~~~~~~~~~~~~~~~~~~^~~
It turns out that the correct array length exists in the local
'intercepts' variable of this function, so we can just use that
instead of hardcoding '4', so this patch changes all three
instances to use that variable. The other two instances were
already correct, but it's more consistent this way.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 940cd2c12e ("ath9k_hw: merge the ar9287 version of ath9k_hw_get_gain_boundaries_pdadcs")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e725a66c0202b5f36c2f9d59d26a65c53bbf21f7 ]
gcc-6 finds an out of bounds access in the fst_add_one function
when calculating the end of the mmio area:
drivers/net/wan/farsync.c: In function 'fst_add_one':
drivers/net/wan/farsync.c:418:53: error: index 2 denotes an offset greater than size of 'u8[2][8192] {aka unsigned char[2][8192]}' [-Werror=array-bounds]
#define BUF_OFFSET(X) (BFM_BASE + offsetof(struct buf_window, X))
^
include/linux/compiler-gcc.h:158:21: note: in definition of macro '__compiler_offsetof'
__builtin_offsetof(a, b)
^
drivers/net/wan/farsync.c:418:37: note: in expansion of macro 'offsetof'
#define BUF_OFFSET(X) (BFM_BASE + offsetof(struct buf_window, X))
^~~~~~~~
drivers/net/wan/farsync.c:2519:36: note: in expansion of macro 'BUF_OFFSET'
+ BUF_OFFSET ( txBuffer[i][NUM_TX_BUFFER][0]);
^~~~~~~~~~
The warning is correct, but not critical because this appears
to be a write-only variable that is set by each WAN driver but
never accessed afterwards.
I'm taking the minimal fix here, using the correct pointer by
pointing 'mem_end' to the last byte inside of the register area
as all other WAN drivers do, rather than the first byte outside of
it. An alternative would be to just remove the mem_end member
entirely.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit baefd7015cdb304ce6c94f9679d0486c71954766 ]
The implementation of QP paravirtualization back in linux-3.7 included
some code that looks very dubious, and gcc-6 has grown smart enough
to warn about it:
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'verify_qp_parameters':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3154:5: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
if (optpar & MLX4_QP_OPTPAR_ALT_ADDR_PATH) {
^~
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:3144:4: note: ...this 'if' clause, but it is not
if (slave != mlx4_master_func_num(dev))
>From looking at the context, I'm reasonably sure that the indentation
is correct but that it should have contained curly braces from the
start, as the update_gid() function in the same patch correctly does.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 54679e1482 ("mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop")
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 34b88a68f26a75e4fded796f1a49c40f82234b7d ]
The syzkaller fuzzer hit the following use-after-free:
Call Trace:
[<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295
[<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261
[< inline >] SYSC_recvmmsg net/socket.c:2281
[<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270
[<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a
arch/x86/entry/entry_64.S:185
And, as Dmitry rightly assessed, that is because we can drop the
reference and then touch it when the underlying recvmsg calls return
some packets and then hit an error, which will make recvmmsg to set
sock->sk->sk_err, oops, fix it.
Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Fixes: a2e2725541 ("net: Introduce recvmmsg socket syscall")
http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fbd40ea0180a2d328c5adc61414dc8bab9335ce2 ]
When an inetdev is destroyed, every address assigned to the interface
is removed. And in this scenerio we do two pointless things which can
be very expensive if the number of assigned interfaces is large:
1) Address promotion. We are deleting all addresses, so there is no
point in doing this.
2) A full nf conntrack table purge for every address. We only need to
do this once, as is already caught by the existing
masq_dev_notifier so masq_inet_event() can skip this.
Reported-by: Solar Designer <solar@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Tested-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4c656c13b254d598e83e586b7b4d36a2043dad85 ]
This fixes a regression in the bridge ageing time caused by:
commit c62987bbd8 ("bridge: push bridge setting ageing_time down to switchdev")
There are users of Linux bridge which use the feature that if ageing time
is set to 0 it causes entries to never expire. See:
https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge
For a pure software bridge, it is unnecessary for the code to have
arbitrary restrictions on what values are allowable.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 88de1cd457e5cb664d6d437e2ea4750d089165f5 ]
In rocker, ageing time is a per-port attribute, so the next time the FDB
cleanup timer fires should be set according to the lowest ageing time.
This will later allow us to delete the BR_MIN_AGEING_TIME macro, which was
added to guarantee minimum ageing time in the bridge layer, thereby breaking
existing behavior.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 869f63a4d28144c03c8f4a4c0d1e8f31f8c11a10 ]
Commit c62987bbd8 ("bridge: push bridge setting ageing_time down to
switchdev") added a check for minimum and maximum ageing time, but this
breaks existing behaviour where one can set ageing time to 0 for a
non-learning bridge.
Push this check down to the driver and allow the check in the bridge
layer to be removed. Currently ageing time 0 is refused by the driver,
but we can later add support for this functionality.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8e2ad4113ce4671686740f808ff2795395c39eef ]
The stack expects link layer headers in the skb linear section.
Macvtap can create skbs with llheader in frags in edge cases:
when (IFF_VNET_HDR is off or vnet_hdr.hdr_len < ETH_HLEN) and
prepad + len > PAGE_SIZE and vnet_hdr.flags has no or bad csum.
Add checks to ensure linear is always at least ETH_HLEN.
At this point, len is already ensured to be >= ETH_HLEN.
For backwards compatiblity, rounds up short vnet_hdr.hdr_len.
This differs from tap and packet, which return an error.
Fixes b9fb9ee07e ("macvtap: add GSO/csum offload support")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 819bfe764dceec2f6b4551768453f374b4c60443 ]
o While the driver is in the middle of a MB completion processing
and it receives a spurious MB interrupt, it is mistaken as a good MB
completion interrupt leading to premature completion of the next MB
request. Fix the driver to guard against this by checking the current
state of MB processing and ignore the spurious interrupt.
Also added a stats counter to record this condition.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5bf93251cee1fb66141d1d2eaff86e04a9397bdf ]
o atomic_t usage is incorrect as we are not implementing
any atomicity.
Signed-off-by: Rajesh Borundia <rajesh.borundia@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d0ba913488dc8c55d1880f5ed34f096dc45fb05d ]
Iff dma_map_single() fails, 'rxdesc' should point to the last filled RX
descriptor, so that it can be marked as the last one, however the driver
would have already advanced it by that time. In order to fix that, only
fill an RX descriptor once all the data for it is ready.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c1b7fca65070bfadca94dd53a4e6b71cd4f69715 ]
In a low memory situation, if netdev_alloc_skb() fails on a first RX ring
loop iteration in sh_eth_ring_format(), 'rxdesc' is still NULL. Avoid
kernel oops by adding the 'rxdesc' check after the loop.
Reported-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cdc4e47da8f4c32eeb6b2061a8a834f4362a12b7 ]
Lots of places in the kernel use memcpy(buf, comm, TASK_COMM_LEN); but
the result is typically passed to print("%s", buf) and extra bytes
after zero don't cause any harm.
In bpf the result of bpf_get_current_comm() is used as the part of
map key and was causing spurious hash map mismatches.
Use strlcpy() to guarantee zero-terminated string.
bpf verifier checks that output buffer is zero-initialized,
so even for short task names the output buffer don't have junk bytes.
Note it's not a security concern, since kprobe+bpf is root only.
Fixes: ffeedafbf0 ("bpf: introduce current->pid, tgid, uid, gid, comm accessors")
Reported-by: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9ed988cd591500c040b2a6257bc68543e08ceeef ]
Replace link layer header validation check ll_header_truncate with
more generic dev_validate_header.
Validation based on hard_header_len incorrectly drops valid packets
in variable length protocols, such as AX25. dev_validate_header
calls header_ops.validate for such protocols to ensure correctness
below hard_header_len.
See also http://comments.gmane.org/gmane.linux.network/401064
Fixes 9c7077622d ("packet: make packet_snd fail on len smaller than l2 header")
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ea47781c26510e5d97f80f9aceafe9065bd5e3aa ]
As variable length protocol, AX25 fails link layer header validation
tests based on a minimum length. header_ops.validate allows protocols
to validate headers that are shorter than hard_header_len. Implement
this callback for AX25.
See also http://comments.gmane.org/gmane.linux.network/401064
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2793a23aacbd754dbbb5cb75093deb7e4103bace ]
Netdevice parameter hard_header_len is variously interpreted both as
an upper and lower bound on link layer header length. The field is
used as upper bound when reserving room at allocation, as lower bound
when validating user input in PF_PACKET.
Clarify the definition to be maximum header length. For validation
of untrusted headers, add an optional validate member to header_ops.
Allow bypassing of validation by passing CAP_SYS_RAWIO, for instance
for deliberate testing of corrupt input. In this case, pad trailing
bytes, as some device drivers expect completely initialized headers.
See also http://comments.gmane.org/gmane.linux.network/401064
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6faac63a6986f29ef39827f460edd3a5ba64ad5c ]
Add missing rtnl_unlock() in the error path of ppp_create_interface().
Fixes: 58a89ecaca ("ppp: fix lockdep splat in ppp_dev_uninit()")
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a9d99ce28ed359d68cf6f3c1a69038aefedf6d6a ]
If final packet (ACK) of 3WHS is lost, it appears we do not properly
account the following incoming segment into tcpi_segs_in
While we are at it, starts segs_in with one, to count the SYN packet.
We do not yet count number of SYN we received for a request sock, we
might add this someday.
packetdrill script showing proper behavior after fix :
// Tests tcpi_segs_in when 3rd packet (ACK) of 3WHS is lost
0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
+0 bind(3, ..., ...) = 0
+0 listen(3, 1) = 0
+0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop>
+0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK>
+.020 < P. 1:1001(1000) ack 1 win 32792
+0 accept(3, ..., ...) = 4
+.000 %{ assert tcpi_segs_in == 2, 'tcpi_segs_in=%d' % tcpi_segs_in }%
Fixes: 2efd055c53 ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 59dca1d8a6725a121dae6c452de0b2611d5865dc ]
IPv4 interprets a negative return value from a protocol handler as a
request to redispatch to a new protocol. In contrast, IPv6 interprets a
negative value as an error, and interprets a positive value as a request
for redispatch.
UDP for IPv6 was unaware of this difference. Change __udp6_lib_rcv() to
return a positive value for redispatch. Note that the socket's
encap_rcv hook still needs to return a negative value to request
dispatch, and in the case of IPv6 packets, adjust IP6CB(skb)->nhoff to
identify the byte containing the next protocol.
Signed-off-by: Bill Sommerfeld <wsommerfeld@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1666984c8625b3db19a9abc298931d35ab7bc64b ]
In case bind() works, but a later error forces bailing
in probe() in error cases work and a timer may be scheduled.
They must be killed. This fixes an error case related to
the double free reported in
http://www.spinics.net/lists/netdev/msg367669.html
and needs to go on top of Linus' fix to cdc-ncm.
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 48906f62c96cc2cd35753e59310cb70eb08cc6a5 ]
Some devices will silently fail setup unless they are reset first.
This is necessary even if the data interface is already in
altsetting 0, which it will be when the device is probed for the
first time. Briefly toggling the altsetting forces a function
reset regardless of the initial state.
This fixes a setup problem observed on a number of Huawei devices,
appearing to operate in NTB-32 mode even if we explicitly set them
to NTB-16 mode.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 4024fcf70556311521e7b6cf79fa50e16f31013a ]
When signalling to metadata consumers that the metadata_dst entry
carries additional GBP extension data for vxlan (TUNNEL_VXLAN_OPT),
the dst's vxlan_metadata information is populated, but options_len
is left to zero. F.e. in ovs, ovs_flow_key_extract() checks for
options_len before extracting the data through ip_tunnel_info_opts_get().
Geneve uses ip_tunnel_info_opts_set() helper in receive path, which
sets options_len internally, vxlan however uses ip_tunnel_info_opts(),
so when filling vxlan_metadata, we do need to update options_len.
Fixes: 4c22279848 ("ip-tunnel: Use API to access tunnel metadata options.")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5d150a985520bbe3cb2aa1ceef24a7e32f20c15f ]
When ipv6_find_hdr is used to find a fragment header
(caller specifies target NEXTHDR_FRAGMENT) we erronously return
-ENOENT for all fragments with nonzero offset.
Before commit 9195bb8e38, when target was specified, we did not
enter the exthdr walk loop as nexthdr == target so this used to work.
Now we do (so we can skip empty route headers). When we then stumble upon
a frag with nonzero frag_off we must return -ENOENT ("header not found")
only if the caller did not specifically request NEXTHDR_FRAGMENT.
This allows nfables exthdr expression to match ipv6 fragments, e.g. via
nft add rule ip6 filter input frag frag-off gt 0
Fixes: 9195bb8e38 ("ipv6: improve ipv6_find_hdr() to skip empty routing headers")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bf13c94ccb33c3182efc92ce4989506a0f541243 ]
The MC74xx and EM74xx modules use different IDs by default, according
to the Lenovo EM7455 driver for Windows.
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f214fc402967e1bc94ad7f39faa03db5813d6849 ]
reverts commit 94153e36e7 ("tipc: use existing sk_write_queue for
outgoing packet chain")
In Commit 94153e36e7, we assume that we fill & empty the socket's
sk_write_queue within the same lock_sock() session.
This is not true if the link is congested. During congestion, the
socket lock is released while we wait for the congestion to cease.
This implementation causes a nullptr exception, if the user space
program has several threads accessing the same socket descriptor.
Consider two threads of the same program performing the following:
Thread1 Thread2
-------------------- ----------------------
Enter tipc_sendmsg() Enter tipc_sendmsg()
lock_sock() lock_sock()
Enter tipc_link_xmit(), ret=ELINKCONG spin on socket lock..
sk_wait_event() :
release_sock() grab socket lock
: Enter tipc_link_xmit(), ret=0
: release_sock()
Wakeup after congestion
lock_sock()
skb = skb_peek(pktchain);
!! TIPC_SKB_CB(skb)->wakeup_pending = tsk->link_cong;
In this case, the second thread transmits the buffers belonging to
both thread1 and thread2 successfully. When the first thread wakeup
after the congestion it assumes that the pktchain is intact and
operates on the skb's in it, which leads to the following exception:
[2102.439969] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d0
[2102.440074] IP: [<ffffffffa005f330>] __tipc_link_xmit+0x2b0/0x4d0 [tipc]
[2102.440074] PGD 3fa3f067 PUD 3fa6b067 PMD 0
[2102.440074] Oops: 0000 [#1] SMP
[2102.440074] CPU: 2 PID: 244 Comm: sender Not tainted 3.12.28 #1
[2102.440074] RIP: 0010:[<ffffffffa005f330>] [<ffffffffa005f330>] __tipc_link_xmit+0x2b0/0x4d0 [tipc]
[...]
[2102.440074] Call Trace:
[2102.440074] [<ffffffff8163f0b9>] ? schedule+0x29/0x70
[2102.440074] [<ffffffffa006a756>] ? tipc_node_unlock+0x46/0x170 [tipc]
[2102.440074] [<ffffffffa005f761>] tipc_link_xmit+0x51/0xf0 [tipc]
[2102.440074] [<ffffffffa006d8ae>] tipc_send_stream+0x11e/0x4f0 [tipc]
[2102.440074] [<ffffffff8106b150>] ? __wake_up_sync+0x20/0x20
[2102.440074] [<ffffffffa006dc9c>] tipc_send_packet+0x1c/0x20 [tipc]
[2102.440074] [<ffffffff81502478>] sock_sendmsg+0xa8/0xd0
[2102.440074] [<ffffffff81507895>] ? release_sock+0x145/0x170
[2102.440074] [<ffffffff815030d8>] ___sys_sendmsg+0x3d8/0x3e0
[2102.440074] [<ffffffff816426ae>] ? _raw_spin_unlock+0xe/0x10
[2102.440074] [<ffffffff81115c2a>] ? handle_mm_fault+0x6ca/0x9d0
[2102.440074] [<ffffffff8107dd65>] ? set_next_entity+0x85/0xa0
[2102.440074] [<ffffffff816426de>] ? _raw_spin_unlock_irq+0xe/0x20
[2102.440074] [<ffffffff8107463c>] ? finish_task_switch+0x5c/0xc0
[2102.440074] [<ffffffff8163ea8c>] ? __schedule+0x34c/0x950
[2102.440074] [<ffffffff81504e12>] __sys_sendmsg+0x42/0x80
[2102.440074] [<ffffffff81504e62>] SyS_sendmsg+0x12/0x20
[2102.440074] [<ffffffff8164aed2>] system_call_fastpath+0x16/0x1b
In this commit, we maintain the skb list always in the stack.
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1837b2e2bcd23137766555a63867e649c0b637f0 ]
The current reserved_tailroom calculation fails to take hlen and tlen into
account.
skb:
[__hlen__|__data____________|__tlen___|__extra__]
^ ^
head skb_end_offset
In this representation, hlen + data + tlen is the size passed to alloc_skb.
"extra" is the extra space made available in __alloc_skb because of
rounding up by kmalloc. We can reorder the representation like so:
[__hlen__|__data____________|__extra__|__tlen___]
^ ^
head skb_end_offset
The maximum space available for ip headers and payload without
fragmentation is min(mtu, data + extra). Therefore,
reserved_tailroom
= data + extra + tlen - min(mtu, data + extra)
= skb_end_offset - hlen - min(mtu, skb_end_offset - hlen - tlen)
= skb_tailroom - min(mtu, skb_tailroom - tlen) ; after skb_reserve(hlen)
Compare the second line to the current expression:
reserved_tailroom = skb_end_offset - min(mtu, skb_end_offset)
and we can see that hlen and tlen are not taken into account.
The min() in the third line can be expanded into:
if mtu < skb_tailroom - tlen:
reserved_tailroom = skb_tailroom - mtu
else:
reserved_tailroom = tlen
Depending on hlen, tlen, mtu and the number of multicast address records,
the current code may output skbs that have less tailroom than
dev->needed_tailroom or it may output more skbs than needed because not all
space available is used.
Fixes: 4c672e4b ("ipv6: mld: fix add_grhead skb_over_panic for devs with large MTUs")
Signed-off-by: Benjamin Poirier <bpoirier@suse.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 40b4f0fd74e46c017814618d67ec9127ff20f157 ]
As the member .cmp_addr of sctp_af_inet6, sctp_v6_cmp_addr should also check
the port of addresses, just like sctp_v4_cmp_addr, cause it's invoked by
sctp_cmp_addr_exact().
Now sctp_v6_cmp_addr just check the port when two addresses have different
family, and lack the port check for two ipv6 addresses. that will make
sctp_hash_cmp() cannot work well.
so fix it by adding ports comparison in sctp_v6_cmp_addr().
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>