android_kernel_oneplus_msm8998/net
Jon Maxwell 4ab956b561 dccp/tcp: fix routing redirect race
[ Upstream commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 ]

As Eric Dumazet pointed out this also needs to be fixed in IPv6.
v2: Contains the IPv6 tcp/Ipv6 dccp patches as well.

We have seen a few incidents lately where a dst_enty has been freed
with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that
dst_entry. If the conditions/timings are right a crash then ensues when the
freed dst_entry is referenced later on. A Common crashing back trace is:

 #8 [] page_fault at ffffffff8163e648
    [exception RIP: __tcp_ack_snd_check+74]
.
.
 #9 [] tcp_rcv_established at ffffffff81580b64
#10 [] tcp_v4_do_rcv at ffffffff8158b54a
#11 [] tcp_v4_rcv at ffffffff8158cd02
#12 [] ip_local_deliver_finish at ffffffff815668f4
#13 [] ip_local_deliver at ffffffff81566bd9
#14 [] ip_rcv_finish at ffffffff8156656d
#15 [] ip_rcv at ffffffff81566f06
#16 [] __netif_receive_skb_core at ffffffff8152b3a2
#17 [] __netif_receive_skb at ffffffff8152b608
#18 [] netif_receive_skb at ffffffff8152b690
#19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3]
#20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3]
#21 [] net_rx_action at ffffffff8152bac2
#22 [] __do_softirq at ffffffff81084b4f
#23 [] call_softirq at ffffffff8164845c
#24 [] do_softirq at ffffffff81016fc5
#25 [] irq_exit at ffffffff81084ee5
#26 [] do_IRQ at ffffffff81648ff8

Of course it may happen with other NIC drivers as well.

It's found the freed dst_entry here:

 224 static bool tcp_in_quickack_mode(struct sock *sk)↩
 225 {↩
 226 ▹       const struct inet_connection_sock *icsk = inet_csk(sk);↩
 227 ▹       const struct dst_entry *dst = __sk_dst_get(sk);↩
 228 ↩
 229 ▹       return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩
 230 ▹       ▹       (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩
 231 }↩

But there are other backtraces attributed to the same freed dst_entry in
netfilter code as well.

All the vmcores showed 2 significant clues:

- Remote hosts behind the default gateway had always been redirected to a
different gateway. A rtable/dst_entry will be added for that host. Making
more dst_entrys with lower reference counts. Making this more probable.

- All vmcores showed a postitive LockDroppedIcmps value, e.g:

LockDroppedIcmps                  267

A closer look at the tcp_v4_err() handler revealed that do_redirect() will run
regardless of whether user space has the socket locked. This can result in a
race condition where the same dst_entry cached in sk->sk_dst_entry can be
decremented twice for the same socket via:

do_redirect()->__sk_dst_check()-> dst_release().

Which leads to the dst_entry being prematurely freed with another socket
pointing to it via sk->sk_dst_cache and a subsequent crash.

To fix this skip do_redirect() if usespace has the socket locked. Instead let
the redirect take place later when user space does not have the socket
locked.

The dccp/IPv6 code is very similar in this respect, so fixing it there too.

As Eric Garver pointed out the following commit now invalidates routes. Which
can set the dst->obsolete flag so that ipv4_dst_check() returns null and
triggers the dst_release().

Fixes: ceb3320610 ("ipv4: Kill routes during PMTU/redirect updates.")
Cc: Eric Garver <egarver@redhat.com>
Cc: Hannes Sowa <hsowa@redhat.com>
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22 12:04:17 +01:00
..
6lowpan 6lowpan: put mcast compression in an own function 2015-10-21 00:49:25 +02:00
9p IB/cma: Add support for network namespaces 2015-10-28 12:32:48 -04:00
802
8021q net: add recursion limit to GRO 2016-11-15 07:46:38 +01:00
appletalk
atm atm: deal with setting entry before mkip was called 2015-09-17 22:13:32 -07:00
ax25 ax25: Fix segfault after sock connection timeout 2017-02-04 09:45:09 +01:00
batman-adv batman-adv: Check for alloc errors when preparing TT local data 2016-12-15 08:49:23 -08:00
bluetooth Bluetooth: Fix l2cap_sock_setsockopt() with optname BT_RCVMTU 2016-08-20 18:09:19 +02:00
bridge bridge: drop netfilter fake rtable unconditionally 2017-03-22 12:04:17 +01:00
caif net: caif: fix misleading indentation 2016-09-30 10:18:35 +02:00
can can: Fix kernel panic at security_sock_rcv_skb 2017-02-18 16:39:26 +01:00
ceph libceph: verify authorize reply on connect 2017-01-09 08:07:52 +01:00
core net: fix socket refcounting in skb_complete_tx_timestamp() 2017-03-22 12:04:15 +01:00
dcb net/dcb: make dcbnl.c explicitly non-modular 2015-10-09 07:52:27 -07:00
dccp dccp/tcp: fix routing redirect race 2017-03-22 12:04:17 +01:00
decnet decnet: Do not build routes to devices without decnet private data. 2016-05-18 17:06:35 -07:00
dns_resolver net: dns_resolver: convert time_t to time64_t 2015-11-18 16:27:46 -05:00
dsa net: dsa: Bring back device detaching in dsa_slave_suspend() 2017-02-04 09:45:10 +01:00
ethernet net: introduce device min_header_len 2017-02-18 16:39:27 +01:00
hsr net/hsr: fix a warning message 2015-11-23 14:56:15 -05:00
ieee802154 net: fix percpu memory leaks 2015-11-02 22:47:14 -05:00
ipv4 dccp/tcp: fix routing redirect race 2017-03-22 12:04:17 +01:00
ipv6 dccp/tcp: fix routing redirect race 2017-03-22 12:04:17 +01:00
ipx
irda irda: Fix lockdep annotations in hashbin_delete(). 2017-02-26 11:07:50 +01:00
iucv af_iucv: Validate socket address length in iucv_sock_bind() 2016-03-03 15:07:03 -08:00
key af_key: fix two typos 2015-10-23 03:05:19 -07:00
l2tp l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv 2017-03-22 12:04:14 +01:00
l3mdev net: Add netif_is_l3_slave 2015-10-07 04:27:43 -07:00
lapb
llc net/llc: avoid BUG_ON() in skb_orphan() 2017-02-26 11:07:49 +01:00
mac80211 mac80211: flush delayed work when entering suspend 2017-03-15 09:57:14 +08:00
mac802154 mac802154: llsec: use kzfree 2015-10-21 00:49:24 +02:00
mpls mpls: Send route delete notifications when router module is unloaded 2017-03-22 12:04:16 +01:00
netfilter netfilter: nft_dynset: fix element timeout for HZ != 1000 2016-11-26 09:54:54 +01:00
netlabel netlabel: add address family checks to netlbl_{sock,req}_delattr() 2016-08-20 18:09:22 +02:00
netlink netlink: remove mmapped netlink support 2017-03-22 12:04:13 +01:00
netrom
nfc net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA 2015-12-01 15:45:05 -05:00
openvswitch openvswitch: maintain correct checksum state in conntrack actions 2017-02-04 09:45:09 +01:00
packet net: don't call strlen() on the user buffer in packet_bind_spkt() 2017-03-22 12:04:14 +01:00
phonet phonet: properly unshare skbs in phonet_rcv() 2016-01-31 11:29:00 -08:00
rds rds: fix an infoleak in rds_inc_info_copy 2016-09-15 08:27:51 +02:00
rfkill rfkill: fix rfkill_fop_read wait_event usage 2016-03-03 15:07:26 -08:00
rose
rxrpc net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA 2015-12-01 15:45:05 -05:00
sched act_connmark: avoid crashing on malformed nlattrs with null parms 2017-03-22 12:04:16 +01:00
sctp sctp: avoid BUG_ON on sctp_wait_for_sndbuf 2017-02-18 16:39:27 +01:00
sunrpc svcrpc: fix oops in absence of krb5 module 2017-02-09 08:02:45 +01:00
switchdev switchdev: pass pointer to fib_info instead of copy 2016-06-24 10:18:16 -07:00
tipc tipc: fix NULL pointer dereference in shutdown() 2016-09-30 10:18:36 +02:00
unix af_unix: move unix_mknod() out of bindlock 2017-02-04 09:45:09 +01:00
vmw_vsock VSOCK: do not disconnect socket when peer has shutdown SEND only 2016-05-18 17:06:41 -07:00
wimax
wireless nl80211: fix sched scan netlink socket owner destruction 2017-01-19 20:17:20 +01:00
x25 net: fix a kernel infoleak in x25 module 2016-05-18 17:06:43 -07:00
xfrm xfrm: Fix crash observed during device unregistration and decryption 2016-04-20 15:42:05 +09:00
compat.c
Kconfig net: Introduce L3 Master device abstraction 2015-09-29 20:40:32 -07:00
Makefile net: Introduce L3 Master device abstraction 2015-09-29 20:40:32 -07:00
socket.c net: socket: fix recvmmsg not returning error from sock_error 2017-02-26 11:07:50 +01:00
sysctl_net.c net: Use ns_capable_noaudit() when determining net sysctl permissions 2016-09-15 08:27:50 +02:00