android_kernel_oneplus_msm8998/net/ipv4
Koen De Schepper 6833735404 tcp: Ensure DCTCP reacts to losses
[ Upstream commit aecfde23108b8e637d9f5c5e523b24fb97035dc3 ]

RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to
loss episodes in the same way as conventional TCP".

Currently, Linux DCTCP performs no cwnd reduction when losses
are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets
alpha to its maximal value if a RTO happens. This behavior
is sub-optimal for at least two reasons: i) it ignores losses
triggering fast retransmissions; and ii) it causes unnecessary large
cwnd reduction in the future if the loss was isolated as it resets
the historical term of DCTCP's alpha EWMA to its maximal value (i.e.,
denoting a total congestion). The second reason has an especially
noticeable effect when using DCTCP in high BDP environments, where
alpha normally stays at low values.

This patch replace the clamping of alpha by setting ssthresh to
half of cwnd for both fast retransmissions and RTOs, at most once
per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter
has been removed.

The table below shows experimental results where we measured the
drop probability of a PIE AQM (not applying ECN marks) at a
bottleneck in the presence of a single TCP flow with either the
alpha-clamping option enabled or the cwnd halving proposed by this
patch. Results using reno or cubic are given for comparison.

                          |  Link   |   RTT    |    Drop
                 TCP CC   |  speed  | base+AQM | probability
        ==================|=========|==========|============
                    CUBIC |  40Mbps |  7+20ms  |    0.21%
                     RENO |         |          |    0.19%
        DCTCP-CLAMP-ALPHA |         |          |   25.80%
         DCTCP-HALVE-CWND |         |          |    0.22%
        ------------------|---------|----------|------------
                    CUBIC | 100Mbps |  7+20ms  |    0.03%
                     RENO |         |          |    0.02%
        DCTCP-CLAMP-ALPHA |         |          |   23.30%
         DCTCP-HALVE-CWND |         |          |    0.04%
        ------------------|---------|----------|------------
                    CUBIC | 800Mbps |   1+1ms  |    0.04%
                     RENO |         |          |    0.05%
        DCTCP-CLAMP-ALPHA |         |          |   18.70%
         DCTCP-HALVE-CWND |         |          |    0.06%

We see that, without halving its cwnd for all source of losses,
DCTCP drives the AQM to large drop probabilities in order to keep
the queue length under control (i.e., it repeatedly faces RTOs).
Instead, if DCTCP reacts to all source of losses, it can then be
controlled by the AQM using similar drop levels than cubic or reno.

Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com>
Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com>
Cc: Bob Briscoe <research@bobbriscoe.net>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Daniel Borkmann <borkmann@iogearbox.net>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Andrew Shewmaker <agshew@gmail.com>
Cc: Glenn Judd <glenn.judd@morganstanley.com>
Acked-by: Florian Westphal <fw@strlen.de>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 09:33:55 +02:00
..
netfilter netfilter: x_tables: enforce nul-terminated table name from getsockopt GET_ENTRIES 2019-03-23 08:44:29 +01:00
af_inet.c gso_segment: Reset skb->mac_len after modifying network header 2018-09-29 03:08:52 -07:00
ah4.c ipsec: check return value of skb_to_sgvec always 2018-04-13 19:50:23 +02:00
arp.c arp: fix arp_filter on l3slave devices 2018-04-13 19:50:24 +02:00
cipso_ipv4.c netlabel: fix out-of-bounds memory accesses 2019-03-23 08:44:23 +01:00
datagram.c net: Set sk_txhash from a random number 2015-07-29 22:44:04 -07:00
devinet.c ipv4: igmp: guard against silly MTU values 2018-01-02 20:33:24 +01:00
esp4.c ipsec: check return value of skb_to_sgvec always 2018-04-13 19:50:23 +02:00
fib_frontend.c net: ipv4: Fix memory leak in network namespace dismantle 2019-02-06 19:43:03 +01:00
fib_lookup.h ipv4: consider TOS in fib_select_default 2015-07-24 22:46:11 -07:00
fib_rules.c net: ipv6: use common fib_default_rule_pref 2015-09-09 14:19:50 -07:00
fib_semantics.c net: ipv4: update fnhe_pmtu when first hop's MTU changes 2018-10-20 09:52:36 +02:00
fib_trie.c net: ipv4: Fix memory leak in network namespace dismantle 2019-02-06 19:43:03 +01:00
fou.c gro: Allow tunnel stacking in the case of FOU/GUE 2018-11-10 07:41:38 -08:00
gre_demux.c gre: Remove support for sharing GRE protocol hook. 2015-08-10 14:03:54 -07:00
gre_offload.c net: add recursion limit to GRO 2016-11-15 07:46:38 +01:00
icmp.c net: Add __icmp_send helper. 2019-03-23 08:44:23 +01:00
igmp.c net: igmp: add a missing rcu locking section 2018-02-16 20:09:37 +01:00
inet_connection_sock.c tcp/dccp: remove reqsk_put() from inet_child_forget() 2019-03-23 08:44:31 +01:00
inet_diag.c net: diag: support v4mapped sockets in inet_diag_find_one_icsk() 2019-04-03 06:23:21 +02:00
inet_fragment.c inet: frags: better deal with smp races 2019-02-08 11:25:33 +01:00
inet_hashtables.c tcp/dccp: fix hashdance race for passive sessions 2015-10-23 05:42:21 -07:00
inet_lro.c
inet_timewait_sock.c soreuseport: initialise timewait reuseport field 2018-05-16 10:06:50 +02:00
inetpeer.c net: ipv4: use a dedicated counter for icmp_v4 redirect packets 2019-02-23 09:05:14 +01:00
ip_forward.c net: Pass net into dst_output and remove dst_output_okfn 2015-10-08 04:26:54 -07:00
ip_fragment.c net: ipv4: do not handle duplicate fragments as overlapping 2019-02-08 11:25:33 +01:00
ip_gre.c vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices 2016-06-24 10:18:18 -07:00
ip_input.c net: Fix usage of pskb_trim_rcsum 2019-02-06 19:43:02 +01:00
ip_options.c net: avoid use IPCB in cipso_v4_error 2019-03-23 08:44:23 +01:00
ip_output.c ip: hash fragments consistently 2018-07-28 07:45:02 +02:00
ip_sockglue.c ip: on queued skb use skb_header_pointer instead of pskb_may_pull 2019-01-26 09:42:49 +01:00
ip_tunnel.c ip_tunnel: Fix name string concatenate in __ip_tunnel_create() 2018-12-13 09:21:29 +01:00
ip_tunnel_core.c ip_tunnel: don't force DF when MTU is locked 2018-11-27 16:07:57 +01:00
ip_vti.c vti4: Fix a ipip packet processing bug in 'IPCOMP' virtual tunnel 2019-03-23 08:44:24 +01:00
ipcomp.c
ipconfig.c ipconfig: Correctly initialise ic_nameservers 2018-08-06 16:24:38 +02:00
ipip.c ipip: only increase err_count for some certain type icmp in ipip_err 2017-11-18 11:11:06 +01:00
ipmr.c ipv4: Fix potential Spectre v1 vulnerability 2019-01-13 10:05:27 +01:00
Kconfig ipv4+ipv6: Make INET*_ESP select CRYPTO_ECHAINIV 2018-08-15 17:42:05 +02:00
Makefile tcp: track the packet timings in RACK 2015-10-21 07:00:48 -07:00
netfilter.c netfilter: use skb_to_full_sk in ip_route_me_harder 2018-03-18 11:17:51 +01:00
ping.c ipv4: fix memory leaks in udp_sendmsg, ping_v4_sendmsg 2018-05-26 08:48:46 +02:00
proc.c ip: discard IPv4 datagrams with overlapping segments. 2019-02-08 11:25:32 +01:00
protocol.c
raw.c net: ipv4: fix for a race condition in raw_sendmsg 2018-01-02 20:33:25 +01:00
route.c route: set the deleted fnhe fnhe_daddr to 0 in ip_del_fnhe to fix a race 2019-03-23 08:44:30 +01:00
syncookies.c tcp: handle inet_csk_reqsk_queue_add() failures 2019-03-23 08:44:30 +01:00
sysctl_net_ipv4.c ipv4: Return EINVAL when ping_group_range sysctl doesn't map to user ns 2018-07-25 10:18:16 +02:00
tcp.c tcp: clear icsk_backoff in tcp_write_queue_purge() 2019-02-23 09:05:14 +01:00
tcp_bic.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_cdg.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_cong.c tcp: disallow cwnd undo when switching congestion control 2017-06-14 13:16:19 +02:00
tcp_cubic.c tcp_cubic: do not set epoch_start in the future 2015-09-17 22:35:07 -07:00
tcp_dctcp.c tcp: Ensure DCTCP reacts to losses 2019-04-27 09:33:55 +02:00
tcp_diag.c tcp: ensure proper barriers in lockless contexts 2015-11-15 18:36:38 -05:00
tcp_fastopen.c tcp: initialize max window for a new fastopen socket 2017-02-04 09:45:09 +01:00
tcp_highspeed.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_htcp.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_hybla.c tcp: do not slow start when cwnd equals ssthresh 2015-07-09 14:22:52 -07:00
tcp_illinois.c net/tcp/illinois: replace broken algorithm reference link 2018-05-30 07:49:02 +02:00
tcp_input.c tcp/dccp: drop SYN packets if accept queue is full 2019-04-03 06:23:18 +02:00
tcp_ipv4.c tcp: tcp_v4_err() should be more careful 2019-02-23 09:05:13 +01:00
tcp_lp.c tcp: fix wraparound issue in tcp_lp 2017-05-14 13:32:58 +02:00
tcp_memcontrol.c
tcp_metrics.c tcp: convert cached rtt from usec to jiffies when feeding initial rto 2016-04-20 15:41:56 +09:00
tcp_minisocks.c tcp: use an RB tree for ooo receive queue 2018-10-13 09:11:34 +02:00
tcp_offload.c tcp: reserve tcp_skb_mss() to tcp stack 2015-06-11 16:33:10 -07:00
tcp_output.c tcp: fix NULL ref in tail loss probe 2018-12-17 21:55:09 +01:00
tcp_probe.c
tcp_recovery.c tcp: use RACK to detect losses 2015-10-21 07:00:53 -07:00
tcp_scalable.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_timer.c net: tcp: close sock if net namespace is exiting 2018-01-31 12:06:14 +01:00
tcp_vegas.c tcp: fix under-evaluated ssthresh in TCP Vegas 2017-12-25 14:22:15 +01:00
tcp_vegas.h
tcp_veno.c tcp: add tcp_in_slow_start helper 2015-07-09 14:22:52 -07:00
tcp_westwood.c
tcp_yeah.c tcp: cwnd does not increase in TCP YeAH 2016-09-30 10:18:34 +02:00
tunnel4.c
udp.c udplite: call proper backlog handlers 2019-03-23 08:44:29 +01:00
udp_diag.c sock_diag: specify info_size per inet protocol 2015-06-15 19:49:22 -07:00
udp_impl.h udplite: call proper backlog handlers 2019-03-23 08:44:29 +01:00
udp_offload.c net: avoid skb_warn_bad_offload false positives on UFO 2017-08-12 19:29:08 -07:00
udp_tunnel.c tunnel: Clear IPCB(skb)->opt before dst_link_failure called 2016-04-20 15:41:56 +09:00
udplite.c udplite: call proper backlog handlers 2019-03-23 08:44:29 +01:00
xfrm4_input.c netfilter: Pass net into okfn 2015-09-17 17:18:37 -07:00
xfrm4_mode_beet.c
xfrm4_mode_transport.c
xfrm4_mode_tunnel.c
xfrm4_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-10-24 06:54:12 -07:00
xfrm4_policy.c ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu 2018-05-30 07:49:04 +02:00
xfrm4_protocol.c
xfrm4_state.c
xfrm4_tunnel.c