android_kernel_oneplus_msm8998/net/ipv4
Lance Richardson 0bb225a04d vti: flush x-netns xfrm cache when vti interface is removed
[ Upstream commit a5d0dc810abf3d6b241777467ee1d6efb02575fc ]

When executing the script included below, the netns delete operation
hangs with the following message (repeated at 10-second intervals):

  kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

This occurs because a reference to the lo interface in the "secure" netns
is still held by a dst entry in the xfrm bundle cache in the init netns.

Address this problem by garbage collecting the tunnel netns flow cache
when a cross-namespace vti interface receives a NETDEV_DOWN notification.

A more detailed description of the problem scenario (referencing commands
in the script below):

(1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1

  The vti_test interface is created in the init namespace. vti_tunnel_init()
  attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
  setting the tunnel net to &init_net.

(2) ip link set vti_test netns secure

  The vti_test interface is moved to the "secure" netns. Note that
  the associated struct ip_tunnel still has tunnel->net set to &init_net.

(3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

  The first packet sent using the vti device causes xfrm_lookup() to be
  called as follows:

      dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);

  Note that tunnel->net is the init namespace, while skb_dst(skb) references
  the vti_test interface in the "secure" namespace. The returned dst
  references an interface in the init namespace.

  Also note that the first parameter to xfrm_lookup() determines which flow
  cache is used to store the computed xfrm bundle, so after xfrm_lookup()
  returns there will be a cached bundle in the init namespace flow cache
  with a dst referencing a device in the "secure" namespace.

(4) ip netns del secure

  The kernel begins to delete the "secure" namespace. When the vti_test
  interface is deleted, dst_ifdown() changes the dst->dev in the cached
  xfrm bundle from vti_test to lo (still in the "secure" namespace).
  Since nothing has caused the init namespace's flow cache to be garbage
  collected, this dst remains attached to the flow cache, so the kernel
  loops waiting for the last reference to lo to go away.
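
The reference-count sequence in steps (1) to (4) can be sketched as a small userspace model. This is only an illustration under stated assumptions: every `toy_*` type and function is hypothetical and stands in for the kernel object named in its comment, the flow cache is reduced to a single slot, and none of this is the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the real kernel types. */
struct toy_net;

struct toy_dev {
    struct toy_net *net;    /* namespace the device currently lives in */
    int refcnt;             /* references held on the device */
};

struct toy_dst {
    struct toy_dev *dev;    /* device the cached entry points at */
};

struct toy_net {
    struct toy_dev lo;      /* each namespace owns a loopback device */
    struct toy_dst *cached; /* one-slot model of the xfrm flow cache */
};

void toy_dev_hold(struct toy_dev *d) { d->refcnt++; }

void toy_dev_put(struct toy_dev *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

/* Step (3): the entry is cached in lookup_net's flow cache while its dst
 * holds a reference on a device that lives in another namespace. */
void toy_xfrm_lookup(struct toy_net *lookup_net, struct toy_dst *dst,
                     struct toy_dev *dev)
{
    dst->dev = dev;
    toy_dev_hold(dev);
    lookup_net->cached = dst;
}

/* Step (4): when the device goes away, the dst is re-pointed at lo in the
 * device's own namespace, so the reference moves to lo. */
void toy_dst_ifdown(struct toy_dst *dst)
{
    struct toy_dev *lo = &dst->dev->net->lo;

    toy_dev_hold(lo);
    toy_dev_put(dst->dev);
    dst->dev = lo;
}

/* Garbage collection drops the cached entry and its device reference. */
void toy_garbage_collect(struct toy_net *net)
{
    if (net->cached) {
        toy_dev_put(net->cached->dev);
        net->cached->dev = NULL;
        net->cached = NULL;
    }
}

/* Models the fix described above: on a NETDEV_DOWN-like event, flush the
 * tunnel namespace's cache only for a cross-namespace device. */
void toy_vti_netdev_down(struct toy_net *tunnel_net, struct toy_dev *dev)
{
    if (tunnel_net != dev->net)
        toy_garbage_collect(tunnel_net);
}

/* A namespace can be freed only once nothing references its lo. */
bool toy_netns_can_free(const struct toy_net *net)
{
    return net->lo.refcnt == 0;
}
```

In this model, calling `toy_xfrm_lookup()` followed by `toy_dst_ifdown()` leaves one reference on the "secure" namespace's lo, which corresponds to the "Usage count = 1" message. Calling `toy_vti_netdev_down()` before the device is torn down releases the reference, so the namespace can be freed.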

<Begin script>
ip link add br1 type bridge
ip link set dev br1 up
ip addr add dev br1 1.1.1.1/8

ip netns add secure
ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
ip link set vti_test netns secure
ip netns exec secure ip link set vti_test up
ip netns exec secure ip link s lo up
ip netns exec secure ip addr add dev lo 192.168.100.1/24
ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
ip xfrm policy flush
ip xfrm state flush
ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
   proto esp mode tunnel mark 1
ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
   proto esp mode tunnel mark 1
ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
   mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
   mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788

ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

ip netns del secure
<End script>

Reported-by: Hangbin Liu <haliu@redhat.com>
Reported-by: Jan Tluka <jtluka@redhat.com>
Signed-off-by: Lance Richardson <lrichard@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30 10:18:36 +02:00
netfilter netfilter: x_tables: introduce and use xt_copy_counters_from_user 2016-06-24 10:18:24 -07:00
af_inet.c net: add validation for the socket syscall protocol argument 2015-12-14 16:09:30 -05:00
ah4.c ah4: Fix error return in ah_input(). 2015-08-25 13:38:50 -07:00
arp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-20 06:08:27 -07:00
cipso_ipv4.c
datagram.c net: Set sk_txhash from a random number 2015-07-29 22:44:04 -07:00
devinet.c ipv4: Don't do expensive useless work during inetdev destroy. 2016-04-20 15:42:03 +09:00
esp4.c esp: Fix ESN generation under UDP encapsulation 2016-07-11 09:31:11 -07:00
fib_frontend.c ipv4/fib: don't warn when primary address is missing if in_dev is dead 2016-05-18 17:06:36 -07:00
fib_lookup.h
fib_rules.c net: ipv6: use common fib_default_rule_pref 2015-09-09 14:19:50 -07:00
fib_semantics.c ipv4: reject RTNH_F_DEAD and RTNH_F_LINKDOWN from user space 2016-08-16 09:30:47 +02:00
fib_trie.c ipv4: panic in leaf_walk_rcu due to stale node pointer 2016-09-30 10:18:34 +02:00
fou.c fou: clean up socket with kfree_rcu 2015-12-16 19:03:02 -05:00
gre_demux.c gre: Remove support for sharing GRE protocol hook. 2015-08-10 14:03:54 -07:00
gre_offload.c ipv6: gre: support SIT encapsulation 2015-10-26 22:01:18 -07:00
icmp.c Revert "ipv4/icmp: redirect messages can use the ingress daddr as source" 2015-10-14 06:01:07 -07:00
igmp.c mld, igmp: Fix reserved tailroom calculation 2016-04-20 15:41:58 +09:00
inet_connection_sock.c tcp/dccp: fix another race at listener dismantle 2016-03-03 15:07:07 -08:00
inet_diag.c tcp/dccp: install syn_recv requests into ehash table 2015-10-03 04:32:41 -07:00
inet_fragment.c net: fix percpu memory leaks 2015-11-02 22:47:14 -05:00
inet_hashtables.c tcp/dccp: fix hashdance race for passive sessions 2015-10-23 05:42:21 -07:00
inet_lro.c
inet_timewait_sock.c tcp/dccp: fix timewait races in timer handling 2015-09-21 16:32:29 -07:00
inetpeer.c net: Add helper function to compare inetpeer addresses 2015-08-28 13:32:36 -07:00
ip_forward.c net: Pass net into dst_output and remove dst_output_okfn 2015-10-08 04:26:54 -07:00
ip_fragment.c inet: frag: Always orphan skbs inside ip_defrag() 2016-03-03 15:07:04 -08:00
ip_gre.c vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices 2016-06-24 10:18:18 -07:00
ip_input.c ipv4: Pass struct net into ip_defrag and ip_check_defrag 2015-10-12 19:44:16 -07:00
ip_options.c
ip_output.c ipv4: only create late gso-skb if skb is already set up with CHECKSUM_PARTIAL 2016-04-20 15:41:57 +09:00
ip_sockglue.c ipv4: fix memory leaks in ip_cmsg_send() callers 2016-03-03 15:07:06 -08:00
ip_tunnel.c vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices 2016-06-24 10:18:18 -07:00
ip_tunnel_core.c ipv4, ipv6: Pass net into ip_local_out and ip6_local_out 2015-10-08 04:27:02 -07:00
ip_vti.c vti: flush x-netns xfrm cache when vti interface is removed 2016-09-30 10:18:36 +02:00
ipcomp.c
ipconfig.c ipconfig: send Client-identifier in DHCP requests 2015-10-18 19:23:52 -07:00
ipip.c ipip: ioctl: Remove superfluous IP-TTL handling. 2015-12-18 16:07:59 -05:00
ipmr.c ipmr/ip6mr: Initialize the last assert time of mfc entries. 2016-07-11 09:31:11 -07:00
Kconfig geneve: Consolidate Geneve functionality in single module. 2015-08-27 15:42:48 -07:00
Makefile tcp: track the packet timings in RACK 2015-10-21 07:00:48 -07:00
netfilter.c ipv4: Pass struct net into ip_route_me_harder 2015-09-29 20:21:32 +02:00
ping.c ipv4: fix memory leaks in ip_cmsg_send() callers 2016-03-03 15:07:06 -08:00
proc.c
protocol.c
raw.c ipv4: fix memory leaks in ip_cmsg_send() callers 2016-03-03 15:07:06 -08:00
route.c route: do not cache fib route info on local routes with oif 2016-05-18 17:06:35 -07:00
syncookies.c tcp/dccp: fix hashdance race for passive sessions 2015-10-23 05:42:21 -07:00
sysctl_net_ipv4.c ipv4: disable BH when changing ip local port range 2015-11-04 21:29:06 -05:00
tcp.c net:Add sysctl_max_skb_frags 2016-03-03 15:07:05 -08:00
tcp_bic.c
tcp_cdg.c
tcp_cong.c tcp: remove tcp_ecn_make_synack() socket argument 2015-09-25 13:00:38 -07:00
tcp_cubic.c tcp_cubic: do not set epoch_start in the future 2015-09-17 22:35:07 -07:00
tcp_dctcp.c tcp: allow dctcp alpha to drop to zero 2015-10-23 02:46:52 -07:00
tcp_diag.c tcp: ensure proper barriers in lockless contexts 2015-11-15 18:36:38 -05:00
tcp_fastopen.c tcp/dccp: fix hashdance race for passive sessions 2015-10-23 05:42:21 -07:00
tcp_highspeed.c
tcp_htcp.c
tcp_hybla.c
tcp_illinois.c
tcp_input.c tcp: enable per-socket rate limiting of all 'challenge acks' 2016-08-16 09:30:47 +02:00
tcp_ipv4.c tcp: properly scale window in tcp_v[46]_reqsk_send_ack() 2016-09-30 10:18:34 +02:00
tcp_lp.c
tcp_memcontrol.c
tcp_metrics.c tcp: convert cached rtt from usec to jiffies when feeding initial rto 2016-04-20 15:41:56 +09:00
tcp_minisocks.c tcp: fix tcpi_segs_in after connection establishment 2016-04-20 15:42:00 +09:00
tcp_offload.c
tcp_output.c tcp: consider recv buf for the initial window scale 2016-08-16 09:30:48 +02:00
tcp_probe.c
tcp_recovery.c tcp: use RACK to detect losses 2015-10-21 07:00:53 -07:00
tcp_scalable.c
tcp_timer.c tcp: fix Fast Open snmp over-counting bug 2015-11-20 10:51:12 -05:00
tcp_vegas.c
tcp_vegas.h
tcp_veno.c
tcp_westwood.c
tcp_yeah.c tcp: cwnd does not increase in TCP YeAH 2016-09-30 10:18:34 +02:00
tunnel4.c
udp.c udp: properly support MSG_PEEK with truncated buffers 2016-09-15 08:27:49 +02:00
udp_diag.c
udp_impl.h
udp_offload.c
udp_tunnel.c tunnel: Clear IPCB(skb)->opt before dst_link_failure called 2016-04-20 15:41:56 +09:00
udplite.c
xfrm4_input.c netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm4_policy.c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec 2015-12-22 16:26:31 -05:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c