android_kernel_oneplus_msm8998/net
Jon Paul Maloy 2847736f56 tipc: correct error in node fsm
commit c4282ca76c5b81ed73ef4c5eb5c07ee397e51642 upstream.

commit 88e8ac7000dc ("tipc: reduce transmission rate of reset messages
when link is down") revealed a flaw in the node FSM, as defined in
the log of commit 66996b6c47 ("tipc: extend node FSM").

We see the following scenario:
1: Node B receives a RESET message from node A before its link endpoint
   is fully up, i.e., the node FSM is in state SELF_UP_PEER_COMING. This
   event will not change the node FSM state, but the (distinct) link FSM
   will move to state RESETTING.
2: As an effect of the previous event, the local endpoint on B will
   declare node A lost, and post the event SELF_DOWN to the its node
   FSM. This moves the FSM state to SELF_DOWN_PEER_LEAVING, meaning
   that no messages will be accepted from A until it receives another
   RESET message that confirms that A's endpoint has been reset. This
   is  wasteful, since we know this as a fact already from the first
   received RESET, but worse is that the link instance's FSM has not
   wasted this information, but instead moved on to state ESTABLISHING,
   meaning that it repeatedly sends out ACTIVATE messages to the reset
   peer A.
3: Node A will receive one of the ACTIVATE messages, move its link FSM
   to state ESTABLISHED, and start repeatedly sending out STATE messages
   to node B.
4: Node B will consistently drop these messages, since it can only accept
   accept a RESET according to its node FSM.
5: After four lost STATE messages node A will reset its link and start
   repeatedly sending out RESET messages to B.
6: Because of the reduced send rate for RESET messages, it is very
   likely that A will receive an ACTIVATE (which is sent out at a much
   higher frequency) before it gets the chance to send a RESET, and A
   may hence quickly move back to state ESTABLISHED and continue sending
   out STATE messages, which will again be dropped by B.
7: GOTO 5.
8: After having repeated the cycle 5-7 a number of times, node A will
   by chance get in between with sending a RESET, and the situation is
   resolved.

Unfortunately, we have seen that it may take a substantial amount of
time before this vicious loop is broken, sometimes in the order of
minutes.

We correct this by making a small correction to the node FSM: When a
node in state SELF_UP_PEER_COMING receives a SELF_DOWN event, it now
moves directly back to state SELF_DOWN_PEER_DOWN, instead of as now
SELF_DOWN_PEER_LEAVING. This is logically consistent, since we don't
need to wait for RESET confirmation from of an endpoint that we alread
know has been reset. It also means that node B in the scenario above
will not be dropping incoming STATE messages, and the link can come up
immediately.

Finally, a symmetry comparison reveals that the  FSM has a similar
error when receiving the event PEER_DOWN in state PEER_UP_SELF_COMING.
Instead of moving to PERR_DOWN_SELF_LEAVING, it should move directly
to SELF_DOWN_PEER_DOWN. Although we have never seen any negative effect
of this logical error, we choose fix this one, too.

The node FSM looks as follows after those changes:

                           +----------------------------------------+
                           |                           PEER_DOWN_EVT|
                           |                                        |
  +------------------------+----------------+                       |
  |SELF_DOWN_EVT           |                |                       |
  |                        |                |                       |
  |              +-----------+          +-----------+               |
  |              |NODE_      |          |NODE_      |               |
  |   +----------|FAILINGOVER|<---------|SYNCHING   |-----------+   |
  |   |SELF_     +-----------+ FAILOVER_+-----------+   PEER_   |   |
  |   |DOWN_EVT   |          A BEGIN_EVT  A         |   DOWN_EVT|   |
  |   |           |          |            |         |           |   |
  |   |           |          |            |         |           |   |
  |   |           |FAILOVER_ |FAILOVER_   |SYNCH_   |SYNCH_     |   |
  |   |           |END_EVT   |BEGIN_EVT   |BEGIN_EVT|END_EVT    |   |
  |   |           |          |            |         |           |   |
  |   |           |          |            |         |           |   |
  |   |           |         +--------------+        |           |   |
  |   |           +-------->|   SELF_UP_   |<-------+           |   |
  |   |   +-----------------|   PEER_UP    |----------------+   |   |
  |   |   |SELF_DOWN_EVT    +--------------+   PEER_DOWN_EVT|   |   |
  |   |   |                    A        A                   |   |   |
  |   |   |                    |        |                   |   |   |
  |   |   |         PEER_UP_EVT|        |SELF_UP_EVT        |   |   |
  |   |   |                    |        |                   |   |   |
  V   V   V                    |        |                   V   V   V
+------------+       +-----------+    +-----------+       +------------+
|SELF_DOWN_  |       |SELF_UP_   |    |PEER_UP_   |       |PEER_DOWN   |
|PEER_LEAVING|       |PEER_COMING|    |SELF_COMING|       |SELF_LEAVING|
+------------+       +-----------+    +-----------+       +------------+
       |               |       A        A       |                |
       |               |       |        |       |                |
       |       SELF_   |       |SELF_   |PEER_  |PEER_           |
       |       DOWN_EVT|       |UP_EVT  |UP_EVT |DOWN_EVT        |
       |               |       |        |       |                |
       |               |       |        |       |                |
       |               |    +--------------+    |                |
       |PEER_DOWN_EVT  +--->|  SELF_DOWN_  |<---+   SELF_DOWN_EVT|
       +------------------->|  PEER_DOWN   |<--------------------+
                            +--------------+

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-30 05:49:27 +02:00
..
6lowpan 6lowpan: put mcast compression in an own function 2015-10-21 00:49:25 +02:00
9p IB/cma: Add support for network namespaces 2015-10-28 12:32:48 -04:00
802
8021q net: add recursion limit to GRO 2016-11-15 07:46:38 +01:00
appletalk
atm atm: deal with setting entry before mkip was called 2015-09-17 22:13:32 -07:00
ax25 ax25: Fix segfault after sock connection timeout 2017-02-04 09:45:09 +01:00
batman-adv batman-adv: Check for alloc errors when preparing TT local data 2016-12-15 08:49:23 -08:00
bluetooth Bluetooth: Fix l2cap_sock_setsockopt() with optname BT_RCVMTU 2016-08-20 18:09:19 +02:00
bridge bridge: drop netfilter fake rtable unconditionally 2017-03-22 12:04:17 +01:00
caif net: caif: fix misleading indentation 2016-09-30 10:18:35 +02:00
can can: Fix kernel panic at security_sock_rcv_skb 2017-02-18 16:39:26 +01:00
ceph libceph: force GFP_NOIO for socket allocations 2017-04-08 09:53:30 +02:00
core socket, bpf: fix sk_filter use after free in sk_clone_lock 2017-03-30 09:35:14 +02:00
dcb net/dcb: make dcbnl.c explicitly non-modular 2015-10-09 07:52:27 -07:00
dccp dccp: fix memory leak during tear-down of unsuccessful connection request 2017-03-22 12:04:17 +01:00
decnet decnet: Do not build routes to devices without decnet private data. 2016-05-18 17:06:35 -07:00
dns_resolver net: dns_resolver: convert time_t to time64_t 2015-11-18 16:27:46 -05:00
dsa net: dsa: Bring back device detaching in dsa_slave_suspend() 2017-02-04 09:45:10 +01:00
ethernet net: introduce device min_header_len 2017-02-18 16:39:27 +01:00
hsr net/hsr: fix a warning message 2015-11-23 14:56:15 -05:00
ieee802154 net: fix percpu memory leaks 2015-11-02 22:47:14 -05:00
ipv4 tcp: initialize icsk_ack.lrcvtime at session start time 2017-03-30 09:35:14 +02:00
ipv6 net: ipv6: check route protocol when deleting routes 2017-04-21 09:30:08 +02:00
ipx
irda irda: Fix lockdep annotations in hashbin_delete(). 2017-02-26 11:07:50 +01:00
iucv af_iucv: Validate socket address length in iucv_sock_bind() 2016-03-03 15:07:03 -08:00
key af_key: fix two typos 2015-10-23 03:05:19 -07:00
l2tp l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv 2017-03-22 12:04:14 +01:00
l3mdev net: Add netif_is_l3_slave 2015-10-07 04:27:43 -07:00
lapb
llc net/llc: avoid BUG_ON() in skb_orphan() 2017-02-26 11:07:49 +01:00
mac80211 mac80211: reject ToDS broadcast data frames 2017-04-27 09:09:33 +02:00
mac802154 mac802154: llsec: use kzfree 2015-10-21 00:49:24 +02:00
mpls mpls: Send route delete notifications when router module is unloaded 2017-03-22 12:04:16 +01:00
netfilter netfilter: nft_dynset: fix element timeout for HZ != 1000 2016-11-26 09:54:54 +01:00
netlabel netlabel: add address family checks to netlbl_{sock,req}_delattr() 2016-08-20 18:09:22 +02:00
netlink netlink: remove mmapped netlink support 2017-03-22 12:04:13 +01:00
netrom
nfc net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA 2015-12-01 15:45:05 -05:00
openvswitch net/openvswitch: Set the ipv6 source tunnel key address attribute correctly 2017-03-30 09:35:12 +02:00
packet net/packet: fix overflow in check for priv area size 2017-04-18 07:14:37 +02:00
phonet phonet: properly unshare skbs in phonet_rcv() 2016-01-31 11:29:00 -08:00
rds rds: fix an infoleak in rds_inc_info_copy 2016-09-15 08:27:51 +02:00
rfkill rfkill: fix rfkill_fop_read wait_event usage 2016-03-03 15:07:26 -08:00
rose
rxrpc net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA 2015-12-01 15:45:05 -05:00
sched net sched actions: decrement module reference count after table flush. 2017-03-22 12:04:18 +01:00
sctp sctp: deny peeloff operation on asocs with threads sleeping on it 2017-04-21 09:30:08 +02:00
sunrpc SUNRPC: fix refcounting problems with auth_gss messages. 2017-04-21 09:30:08 +02:00
switchdev switchdev: pass pointer to fib_info instead of copy 2016-06-24 10:18:16 -07:00
tipc tipc: correct error in node fsm 2017-04-30 05:49:27 +02:00
unix net: unix: properly re-increment inflight counter of GC discarded candidates 2017-03-30 09:35:13 +02:00
vmw_vsock VSOCK: Detach QP check should filter out non matching QPs. 2017-04-27 09:09:32 +02:00
wimax net:wimax: Fix doucble word "the the" in networking.xml 2015-08-09 22:43:52 -07:00
wireless nl80211: fix dumpit error path RTNL deadlocks 2017-03-30 09:35:18 +02:00
x25 net: fix a kernel infoleak in x25 module 2016-05-18 17:06:43 -07:00
xfrm xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder 2017-03-31 09:49:52 +02:00
compat.c
Kconfig net: Introduce L3 Master device abstraction 2015-09-29 20:40:32 -07:00
Makefile net: Introduce L3 Master device abstraction 2015-09-29 20:40:32 -07:00
socket.c net: socket: fix recvmmsg not returning error from sock_error 2017-02-26 11:07:50 +01:00
sysctl_net.c net: Use ns_capable_noaudit() when determining net sysctl permissions 2016-09-15 08:27:50 +02:00