Based on review from Eric Dumazet, my backport of commit
3d4bf93ac12003f9b8e1e2de37fe27983deebdcf to older kernels was a bit
incorrect. This patch fixes this.
Reported-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 3d4bf93ac12003f9b8e1e2de37fe27983deebdcf ]
In case an attacker feeds tiny packets completely out of order,
tcp_collapse_ofo_queue() might scan the whole rb-tree, performing
expensive copies, but not changing socket memory usage at all.
1) Do not attempt to collapse tiny skbs.
2) Add logic to exit early when too many tiny skbs are detected.
We prefer not doing aggressive collapsing (which copies packets)
for pathological flows, and revert to tcp_prune_ofo_queue() which
will be less expensive.
In the future, we might add the possibility of terminating flows
that are proven to be malicious.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit f4a3313d8e2ca9fd8d8f45e40a2903ba782607e7 ]
Right after a TCP flow is created, receiving tiny out of order
packets allways hit the condition :
if (atomic_read(&sk->sk_rmem_alloc) >= sk->sk_rcvbuf)
tcp_clamp_window(sk);
tcp_clamp_window() increases sk_rcvbuf to match sk_rmem_alloc
(guarded by tcp_rmem[2])
Calling tcp_collapse_ofo_queue() in this case is not useful,
and offers a O(N^2) surface attack to malicious peers.
Better not attempt anything before full queue capacity is reached,
forcing attacker to spend lots of resource and allow us to more
easily detect the abuse.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAls7QB8ACgkQONu9yGCS
aT6trQ/9EO1dgc0lZO0zGCxFFiikPzzMp1auSKd99FhSaqlrCPutT5K0gBVc1rug
EvggbqWj2MBX2HZvxQR8LbGNvp7+kkM3apIdYOqyTQPvs7x03YNeuvXZUF3EFyPO
eDZ71nLuwgnEeySceJ+Z9HcVBcWR/0dEkwjhjpIJ2IO25tcecWzbqOOdzNypBIKK
EG4dGhO5JY6jLqxbEFZ9d302bGZQozOQHiDfEZz6NueI0yYVJIjQQvuLp/V0ChDg
TN+PgTOdzxIPCpZw9y4XzN4nhdsOial1xeX7agzAkZDjdbprNpbZrxjfY0NLdpQ0
4ZV3vLqIZ5rs8xuCRgNJ7yTVt6X7miw/h7TQp30qpeDuRf1SHZa4ITqMzdXJUahW
BT+XkjrrCjKxXkCH+rWy0txtouUaVwM+sKHIW0bvrOJwHM0UJXNAUppt4NrBtgtD
7Zt/FDKAHCJk1GuW3U5zXOHmgn+QkRNEndpwbUjwRowvHcE5jVSLLkH4XZkA0+SL
ucQCxOqGKrbHjhyXT+e2Kpx4Z5sqJIUHhc4iw6gi7xyaoJ55kHZ2S+sCwo3cjreq
B43SrwkQ0EJXwHzcrmvDfnvEFf7ylDVWH597lQsIQMNI7Gg04fXixYpvr6DYOBSN
AKHvoqd7VztHnX/ZogyLXp4jWiU5dU6qYXdj/zEs+tB8DYPZ4+c=
=Mli0
-----END PGP SIGNATURE-----
Merge 4.4.139 into android-4.4
Changes in 4.4.139
xfrm6: avoid potential infinite loop in _decode_session6()
netfilter: ebtables: handle string from userspace with care
ipvs: fix buffer overflow with sync daemon and service
atm: zatm: fix memcmp casting
net: qmi_wwan: Add Netgear Aircard 779S
net/sonic: Use dma_mapping_error()
Revert "Btrfs: fix scrub to repair raid6 corruption"
tcp: do not overshoot window_clamp in tcp_rcv_space_adjust()
Btrfs: make raid6 rebuild retry more
usb: musb: fix remote wakeup racing with suspend
bonding: re-evaluate force_primary when the primary slave name changes
tcp: verify the checksum of the first data segment in a new connection
ext4: update mtime in ext4_punch_hole even if no blocks are released
ext4: fix fencepost error in check for inode count overflow during resize
driver core: Don't ignore class_dir_create_and_add() failure.
btrfs: scrub: Don't use inode pages for device replace
ALSA: hda - Handle kzalloc() failure in snd_hda_attach_pcm_stream()
ALSA: hda: add dock and led support for HP EliteBook 830 G5
ALSA: hda: add dock and led support for HP ProBook 640 G4
cpufreq: Fix new policy initialization during limits updates via sysfs
libata: zpodd: make arrays cdb static, reduces object code size
libata: zpodd: small read overflow in eject_tray()
libata: Drop SanDisk SD7UB3Q*G1001 NOLPM quirk
w1: mxc_w1: Enable clock before calling clk_get_rate() on it
fs/binfmt_misc.c: do not allow offset overflow
x86/spectre_v1: Disable compiler optimizations over array_index_mask_nospec()
m68k/mm: Adjust VM area to be unmapped by gap size for __iounmap()
serial: sh-sci: Use spin_{try}lock_irqsave instead of open coding version
signal/xtensa: Consistenly use SIGBUS in do_unaligned_user
usb: do not reset if a low-speed or full-speed device timed out
1wire: family module autoload fails because of upper/lower case mismatch.
ASoC: dapm: delete dapm_kcontrol_data paths list before freeing it
ASoC: cirrus: i2s: Fix LRCLK configuration
ASoC: cirrus: i2s: Fix {TX|RX}LinCtrlData setup
lib/vsprintf: Remove atomic-unsafe support for %pCr
mips: ftrace: fix static function graph tracing
branch-check: fix long->int truncation when profiling branches
ipmi:bt: Set the timeout before doing a capabilities check
Bluetooth: hci_qca: Avoid missing rampatch failure with userspace fw loader
fuse: atomic_o_trunc should truncate pagecache
fuse: don't keep dead fuse_conn at fuse_fill_super().
fuse: fix control dir setup and teardown
powerpc/mm/hash: Add missing isync prior to kernel stack SLB switch
powerpc/ptrace: Fix setting 512B aligned breakpoints with PTRACE_SET_DEBUGREG
powerpc/ptrace: Fix enforcement of DAWR constraints
cpuidle: powernv: Fix promotion from snooze if next state disabled
powerpc/fadump: Unregister fadump on kexec down path.
ARM: 8764/1: kgdb: fix NUMREGBYTES so that gdb_regs[] is the correct size
of: unittest: for strings, account for trailing \0 in property length field
IB/qib: Fix DMA api warning with debug kernel
RDMA/mlx4: Discard unknown SQP work requests
mtd: cfi_cmdset_0002: Change write buffer to check correct value
mtd: cfi_cmdset_0002: Use right chip in do_ppb_xxlock()
mtd: cfi_cmdset_0002: fix SEGV unlocking multiple chips
mtd: cfi_cmdset_0002: Fix unlocking requests crossing a chip boudary
mtd: cfi_cmdset_0002: Avoid walking all chips when unlocking.
MIPS: BCM47XX: Enable 74K Core ExternalSync for PCIe erratum
PCI: pciehp: Clear Presence Detect and Data Link Layer Status Changed on resume
MIPS: io: Add barrier after register read in inX()
time: Make sure jiffies_to_msecs() preserves non-zero time periods
Btrfs: fix clone vs chattr NODATASUM race
iio:buffer: make length types match kfifo types
scsi: qla2xxx: Fix setting lower transfer speed if GPSC fails
scsi: zfcp: fix missing SCSI trace for result of eh_host_reset_handler
scsi: zfcp: fix missing SCSI trace for retry of abort / scsi_eh TMF
scsi: zfcp: fix misleading REC trigger trace where erp_action setup failed
scsi: zfcp: fix missing REC trigger trace on terminate_rport_io early return
scsi: zfcp: fix missing REC trigger trace on terminate_rport_io for ERP_FAILED
scsi: zfcp: fix missing REC trigger trace for all objects in ERP_FAILED
scsi: zfcp: fix missing REC trigger trace on enqueue without ERP thread
linvdimm, pmem: Preserve read-only setting for pmem devices
md: fix two problems with setting the "re-add" device state.
ubi: fastmap: Cancel work upon detach
UBIFS: Fix potential integer overflow in allocation
xfrm: Ignore socket policies when rebuilding hash tables
xfrm: skip policies marked as dead while rehashing
backlight: as3711_bl: Fix Device Tree node lookup
backlight: max8925_bl: Fix Device Tree node lookup
backlight: tps65217_bl: Fix Device Tree node lookup
mfd: intel-lpss: Program REMAP register in PIO mode
perf tools: Fix symbol and object code resolution for vdso32 and vdsox32
perf intel-pt: Fix sync_switch INTEL_PT_SS_NOT_TRACING
perf intel-pt: Fix decoding to accept CBR between FUP and corresponding TIP
perf intel-pt: Fix MTC timing after overflow
perf intel-pt: Fix "Unexpected indirect branch" error
perf intel-pt: Fix packet decoding of CYC packets
media: v4l2-compat-ioctl32: prevent go past max size
media: cx231xx: Add support for AverMedia DVD EZMaker 7
media: dvb_frontend: fix locking issues at dvb_frontend_get_event()
nfsd: restrict rd_maxcount to svc_max_payload in nfsd_encode_readdir
NFSv4: Fix possible 1-byte stack overflow in nfs_idmap_read_and_verify_message
video: uvesafb: Fix integer overflow in allocation
Input: elan_i2c - add ELAN0618 (Lenovo v330 15IKB) ACPI ID
xen: Remove unnecessary BUG_ON from __unbind_from_irq()
udf: Detect incorrect directory size
Input: elan_i2c_smbus - fix more potential stack buffer overflows
Input: elantech - enable middle button of touchpads on ThinkPad P52
Input: elantech - fix V4 report decoding for module with middle key
ALSA: hda/realtek - Add a quirk for FSC ESPRIMO U9210
Btrfs: fix unexpected cow in run_delalloc_nocow
spi: Fix scatterlist elements size in spi_map_buf
block: Fix transfer when chunk sectors exceeds max
dm thin: handle running out of data space vs concurrent discard
cdc_ncm: avoid padding beyond end of skb
Bluetooth: Fix connection if directed advertising and privacy is used
Linux 4.4.139
Change-Id: I93013bedf2ebe3e6a8718972d8854723609963cc
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 02db55718d53f9d426cee504c27fb768e9ed4ffe upstream.
While rcvbuf is properly clamped by tcp_rmem[2], rcvwin
is left to a potentially too big value.
It has no serious effect, since :
1) tcp_grow_window() has very strict checks.
2) window_clamp can be mangled by user space to any value anyway.
tcp_init_buffer_space() and companions use tcp_full_space(),
we use tcp_win_from_space() to avoid reloading sk->sk_rcvbuf
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Benjamin Gilbert <benjamin.gilbert@coreos.com>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 607065bad9931e72207b0cac365d7d4abc06bd99 upstream.
When using large tcp_rmem[2] values (I did tests with 500 MB),
I noticed overflows while computing rcvwin.
Lets fix this before the following patch.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Wei Wang <weiwan@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
[Backport: sysctl_tcp_rmem is not Namespace-ify'd in older kernels]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlrlXREACgkQONu9yGCS
aT6hkA//dewP2AXQLE4Nb3ufxBOo2CcDOKRrMQTgwnaqe06FnXcdty8jZ6I6ziBV
MtpYmyTY4Ujj062LPjwov6CEI3qS5aNXFd9Xqtq490FQBeKBA56MgZ7qOEb/aHDQ
4e7D9hIFp+gu2J41toWMJO/BZ1qcMKOEo9e49NNQG+4JjNz6fhdQpdsGC8QqFauQ
aT6l+cHhU+yxuBdj/Nzu5YnQrE4Ijk3omOsLrGR6hUCjG0pSOX2YK7XF+W+VcWd9
7Uky54yPC/DuDLTLCWdZJxD/2WzrE0vvKhfkhty4X0k9ZXqnGbSr1x/FSKLrnXNi
aigN2OBvlHI2RBwl8J06QOfmG32Ndi32nNj7f5+45VXQkiVK7OS3B0CgkYh4m+7J
dEZVvH8Ddj4p6Io8WqWWRP0UDRNPP+reUwlIIg19dSKMv0veez1g0AFO7AbWh2Lu
q9QRxtKTeYR7We8w3F2CeLHjP1NLhKvdffT8TVjhEL6MekWkOjFjQXGoRIOPpeld
T0buhSQMgoEuJXMPz9zywJ2MmgKcRtmQKGKfZ7MDa82kwiIY5KRemTVnwn1lEKXU
EC+kCQmjiobcuGbBbH1hXfWcSpDl/+ZwI5r0L2Lzkzf0R+Efge7rGBXZQCgz3FGp
DVtpPPnRnXbGZX+0GyeiZ5O9wCfZTQxqOrMk2rtPbxJY5f6yIcE=
=IA5i
-----END PGP SIGNATURE-----
Merge 4.4.130 into android-4.4
Changes in 4.4.130
cifs: do not allow creating sockets except with SMB1 posix exensions
x86/tsc: Prevent 32bit truncation in calc_hpet_ref()
perf: Return proper values for user stack errors
staging: ion : Donnot wakeup kswapd in ion system alloc
r8152: add Linksys USB3GIGV1 id
Input: drv260x - fix initializing overdrive voltage
ath9k_hw: check if the chip failed to wake up
jbd2: fix use after free in kjournald2()
Revert "ath10k: send (re)assoc peer command when NSS changed"
s390: introduce CPU alternatives
s390: enable CPU alternatives unconditionally
KVM: s390: wire up bpb feature
s390: scrub registers on kernel entry and KVM exit
s390: add optimized array_index_mask_nospec
s390/alternative: use a copy of the facility bit mask
s390: add options to change branch prediction behaviour for the kernel
s390: run user space and KVM guests with modified branch prediction
s390: introduce execute-trampolines for branches
s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*)
s390: do not bypass BPENTER for interrupt system calls
s390/entry.S: fix spurious zeroing of r0
s390: move nobp parameter functions to nospec-branch.c
s390: add automatic detection of the spectre defense
s390: report spectre mitigation via syslog
s390: add sysfs attributes for spectre
s390: correct nospec auto detection init order
s390: correct module section names for expoline code revert
bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave
KEYS: DNS: limit the length of option strings
l2tp: check sockaddr length in pppol2tp_connect()
net: validate attribute sizes in neigh_dump_table()
llc: delete timers synchronously in llc_sk_free()
tcp: don't read out-of-bounds opsize
team: avoid adding twice the same option to the event list
team: fix netconsole setup over team
packet: fix bitfield update race
pppoe: check sockaddr length in pppoe_connect()
vlan: Fix reading memory beyond skb->tail in skb_vlan_tagged_multi
sctp: do not check port in sctp_inet6_cmp_addr
llc: hold llc_sap before release_sock()
llc: fix NULL pointer deref for SOCK_ZAPPED
tipc: add policy for TIPC_NLA_NET_ADDR
net: fix deadlock while clearing neighbor proxy table
tcp: md5: reject TCP_MD5SIG or TCP_MD5SIG_EXT on established sockets
net: af_packet: fix race in PACKET_{R|T}X_RING
ipv6: add RTA_TABLE and RTA_PREFSRC to rtm_ipv6_policy
scsi: mptsas: Disable WRITE SAME
cdrom: information leak in cdrom_ioctl_media_changed()
s390/cio: update chpid descriptor after resource accessibility event
s390/uprobes: implement arch_uretprobe_is_alive()
Linux 4.4.130
Change-Id: I58646180c70ac61da3e2a602085760881d914eb5
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlrQ7ekACgkQONu9yGCS
aT6Znw//VtAP82BGP/+H6X6gt0rBRIYseEJkHOpKRu5PK+Vpx7mMQFIfBId95P6R
buq1QyzY9yz8ixbByg/w60WA2jK/I9i0tDGBnSlZzNmUvbk01oBN+cc/weZDynF7
rFbSvD3aTmPB4nm9VE+n7V/tgGeuu/bwi04zulAm/B0/zA+w9GZv/aAto3WlLdjF
ogZPSo5y6ifm6Qryq9sTR42LyDBXOy1klRSIK5EXY1OnIvPL1HSYR3ea2yj3AMXB
RPvpCCY8j7zC9yVifX1c+Gfv2tXVHb9kjgheJixP2J4M3fFlR5tjLQXtTP2S2I8G
cuMcdT6MiQw31rMoLcpej66dMtkL3k6sEpzcnSPPNenTuDIolz7BLEyaO/hhgi9J
6vIXAd4Xm9D8HkH3iG/L3GtD3JXpVPtHyli/X1M3hz/VNUSOUPENIjMmGoxfBOtQ
d7c8VGxDjnqmafri3fBAm4c603qW7O1wqJ7vLs9z7vgOIxlOLoJ/uiazoJKgW6O0
z0S/BABWqpAUAI9jgm2GvRDR2keM2mhQIgIrY0+ZpnaLSGe3MugB+GbK6xdBCuYA
anOv9VTEAPlTc8gb+GlusbUVjQyacEDwXoT6f9mELCW8cqpMgh+3TiKFihbYkUTN
ly/DxZH3jpva0dq94Mgjv1u/nlg9ac3zqGeo9buQQFC7MSoZKEM=
=LiZa
-----END PGP SIGNATURE-----
Merge 4.4.128 into android-4.4
Changes in 4.4.128
cfg80211: make RATE_INFO_BW_20 the default
md/raid5: make use of spin_lock_irq over local_irq_disable + spin_lock
rtc: snvs: fix an incorrect check of return value
x86/asm: Don't use RBP as a temporary register in csum_partial_copy_generic()
NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION
IB/srpt: Fix abort handling
af_key: Fix slab-out-of-bounds in pfkey_compile_policy.
mac80211: bail out from prep_connection() if a reconfig is ongoing
bna: Avoid reading past end of buffer
qlge: Avoid reading past end of buffer
ipmi_ssif: unlock on allocation failure
net: cdc_ncm: Fix TX zero padding
net: ethernet: ti: cpsw: adjust cpsw fifos depth for fullduplex flow control
lockd: fix lockd shutdown race
drivers/misc/vmw_vmci/vmci_queue_pair.c: fix a couple integer overflow tests
pidns: disable pid allocation if pid_ns_prepare_proc() is failed in alloc_pid()
s390: move _text symbol to address higher than zero
net/mlx4_en: Avoid adding steering rules with invalid ring
NFSv4.1: Work around a Linux server bug...
CIFS: silence lockdep splat in cifs_relock_file()
blk-mq: NVMe 512B/4K+T10 DIF/DIX format returns I/O error on dd with split op
net: qca_spi: Fix alignment issues in rx path
netxen_nic: set rcode to the return status from the call to netxen_issue_cmd
Input: elan_i2c - check if device is there before really probing
Input: elantech - force relative mode on a certain module
KVM: PPC: Book3S PR: Check copy_to/from_user return values
vmxnet3: ensure that adapter is in proper state during force_close
SMB2: Fix share type handling
bus: brcmstb_gisb: Use register offsets with writes too
bus: brcmstb_gisb: correct support for 64-bit address output
PowerCap: Fix an error code in powercap_register_zone()
ARM: dts: imx53-qsrb: Pulldown PMIC IRQ pin
staging: wlan-ng: prism2mgmt.c: fixed a double endian conversion before calling hfa384x_drvr_setconfig16, also fixes relative sparse warning
x86/tsc: Provide 'tsc=unstable' boot parameter
ARM: dts: imx6qdl-wandboard: Fix audio channel swap
ipv6: avoid dad-failures for addresses with NODAD
async_tx: Fix DMA_PREP_FENCE usage in do_async_gen_syndrome()
usb: dwc3: keystone: check return value
btrfs: fix incorrect error return ret being passed to mapping_set_error
ata: libahci: properly propagate return value of platform_get_irq()
neighbour: update neigh timestamps iff update is effective
arp: honour gratuitous ARP _replies_
usb: chipidea: properly handle host or gadget initialization failure
USB: ene_usb6250: fix first command execution
net: x25: fix one potential use-after-free issue
USB: ene_usb6250: fix SCSI residue overwriting
serial: 8250: omap: Disable DMA for console UART
serial: sh-sci: Fix race condition causing garbage during shutdown
sh_eth: Use platform device for printing before register_netdev()
scsi: csiostor: fix use after free in csio_hw_use_fwconfig()
powerpc/mm: Fix virt_addr_valid() etc. on 64-bit hash
ath5k: fix memory leak on buf on failed eeprom read
selftests/powerpc: Fix TM resched DSCR test with some compilers
xfrm: fix state migration copy replay sequence numbers
iio: hi8435: avoid garbage event at first enable
iio: hi8435: cleanup reset gpio
ext4: handle the rest of ext4_mb_load_buddy() ENOMEM errors
md-cluster: fix potential lock issue in add_new_disk
ARM: davinci: da8xx: Create DSP device only when assigned memory
ray_cs: Avoid reading past end of buffer
leds: pca955x: Correct I2C Functionality
sched/numa: Use down_read_trylock() for the mmap_sem
net/mlx5: Tolerate irq_set_affinity_hint() failures
selinux: do not check open permission on sockets
block: fix an error code in add_partition()
mlx5: fix bug reading rss_hash_type from CQE
net: ieee802154: fix net_device reference release too early
libceph: NULL deref on crush_decode() error path
netfilter: ctnetlink: fix incorrect nf_ct_put during hash resize
pNFS/flexfiles: missing error code in ff_layout_alloc_lseg()
ASoC: rsnd: SSI PIO adjust to 24bit mode
scsi: bnx2fc: fix race condition in bnx2fc_get_host_stats()
fix race in drivers/char/random.c:get_reg()
ext4: fix off-by-one on max nr_pages in ext4_find_unwritten_pgoff()
tcp: better validation of received ack sequences
net: move somaxconn init from sysctl code
Input: elan_i2c - clear INT before resetting controller
bonding: Don't update slave->link until ready to commit
KVM: nVMX: Fix handling of lmsw instruction
net: llc: add lock_sock in llc_ui_bind to avoid a race condition
ARM: dts: ls1021a: add "fsl,ls1021a-esdhc" compatible string to esdhc node
thermal: power_allocator: fix one race condition issue for thermal_instances list
perf probe: Add warning message if there is unexpected event name
l2tp: fix missing print session offset info
rds; Reset rs->rs_bound_addr in rds_add_bound() failure path
hwmon: (ina2xx) Make calibration register value fixed
media: videobuf2-core: don't go out of the buffer range
ASoC: Intel: cht_bsw_rt5645: Analog Mic support
scsi: libiscsi: Allow sd_shutdown on bad transport
scsi: mpt3sas: Proper handling of set/clear of "ATA command pending" flag.
vfb: fix video mode and line_length being set when loaded
gpio: label descriptors using the device name
ASoC: Intel: sst: Fix the return value of 'sst_send_byte_stream_mrfld()'
wl1251: check return from call to wl1251_acx_arp_ip_filter
hdlcdrv: Fix divide by zero in hdlcdrv_ioctl
ovl: filter trusted xattr for non-admin
powerpc/[booke|4xx]: Don't clobber TCR[WP] when setting TCR[DIE]
dmaengine: imx-sdma: Handle return value of clk_prepare_enable
arm64: futex: Fix undefined behaviour with FUTEX_OP_OPARG_SHIFT usage
net/mlx5: avoid build warning for uniprocessor
cxgb4: FW upgrade fixes
rtc: opal: Handle disabled TPO in opal_get_tpo_time()
rtc: interface: Validate alarm-time before handling rollover
SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()
net: freescale: fix potential null pointer dereference
KVM: SVM: do not zero out segment attributes if segment is unusable or not present
clk: scpi: fix return type of __scpi_dvfs_round_rate
clk: Fix __set_clk_rates error print-string
powerpc/spufs: Fix coredump of SPU contexts
perf trace: Add mmap alias for s390
qlcnic: Fix a sleep-in-atomic bug in qlcnic_82xx_hw_write_wx_2M and qlcnic_82xx_hw_read_wx_2M
mISDN: Fix a sleep-in-atomic bug
drm/omap: fix tiled buffer stride calculations
cxgb4: fix incorrect cim_la output for T6
Fix serial console on SNI RM400 machines
bio-integrity: Do not allocate integrity context for bio w/o data
skbuff: return -EMSGSIZE in skb_to_sgvec to prevent overflow
sit: reload iphdr in ipip6_rcv
net/mlx4: Fix the check in attaching steering rules
net/mlx4: Check if Granular QoS per VF has been enabled before updating QP qos_vport
perf header: Set proper module name when build-id event found
perf report: Ensure the perf DSO mapping matches what libdw sees
tags: honor COMPILED_SOURCE with apart output directory
e1000e: fix race condition around skb_tstamp_tx()
cx25840: fix unchecked return values
mceusb: sporadic RX truncation corruption fix
net: phy: avoid genphy_aneg_done() for PHYs without clause 22 support
ARM: imx: Add MXC_CPU_IMX6ULL and cpu_is_imx6ull
e1000e: Undo e1000e_pm_freeze if __e1000_shutdown fails
perf/core: Correct event creation with PERF_FORMAT_GROUP
MIPS: mm: fixed mappings: correct initialisation
MIPS: mm: adjust PKMAP location
MIPS: kprobes: flush_insn_slot should flush only if probe initialised
Fix loop device flush before configure v3
net: emac: fix reset timeout with AR8035 phy
perf tests: Decompress kernel module before objdump
skbuff: only inherit relevant tx_flags
xen: avoid type warning in xchg_xen_ulong
bnx2x: Allow vfs to disable txvlan offload
sctp: fix recursive locking warning in sctp_do_peeloff
sparc64: ldc abort during vds iso boot
iio: magnetometer: st_magn_spi: fix spi_device_id table
Bluetooth: Send HCI Set Event Mask Page 2 command only when needed
cpuidle: dt: Add missing 'of_node_put()'
ACPICA: Events: Add runtime stub support for event APIs
ACPICA: Disassembler: Abort on an invalid/unknown AML opcode
s390/dasd: fix hanging safe offline
vxlan: dont migrate permanent fdb entries during learn
bcache: stop writeback thread after detaching
bcache: segregate flash only volume write streams
scsi: libsas: fix memory leak in sas_smp_get_phy_events()
scsi: libsas: fix error when getting phy events
scsi: libsas: initialize sas_phy status according to response of DISCOVER
blk-mq: fix kernel oops in blk_mq_tag_idle()
tty: n_gsm: Allow ADM response in addition to UA for control dlci
EDAC, mv64x60: Fix an error handling path
cxgb4vf: Fix SGE FL buffer initialization logic for 64K pages
perf tools: Fix copyfile_offset update of output offset
ipsec: check return value of skb_to_sgvec always
rxrpc: check return value of skb_to_sgvec always
virtio_net: check return value of skb_to_sgvec always
virtio_net: check return value of skb_to_sgvec in one more location
random: use lockless method of accessing and updating f->reg_idx
futex: Remove requirement for lock_page() in get_futex_key()
Kbuild: provide a __UNIQUE_ID for clang
arp: fix arp_filter on l3slave devices
net: fix possible out-of-bound read in skb_network_protocol()
net/ipv6: Fix route leaking between VRFs
netlink: make sure nladdr has correct size in netlink_connect()
net/sched: fix NULL dereference in the error path of tcf_bpf_init()
pptp: remove a buggy dst release in pptp_connect()
sctp: do not leak kernel memory to user space
sctp: sctp_sockaddr_af must check minimal addr length for AF_INET6
sky2: Increase D3 delay to sky2 stops working after suspend
vhost: correctly remove wait queue during poll failure
vlan: also check phy_driver ts_info for vlan's real device
bonding: fix the err path for dev hwaddr sync in bond_enslave
bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave
bonding: process the err returned by dev_set_allmulti properly in bond_enslave
net: fool proof dev_valid_name()
ip_tunnel: better validate user provided tunnel names
ipv6: sit: better validate user provided tunnel names
ip6_gre: better validate user provided tunnel names
ip6_tunnel: better validate user provided tunnel names
vti6: better validate user provided tunnel names
r8169: fix setting driver_data after register_netdev
net sched actions: fix dumping which requires several messages to user space
net/ipv6: Increment OUTxxx counters after netfilter hook
ipv6: the entire IPv6 header chain must fit the first fragment
vrf: Fix use after free and double free in vrf_finish_output
Revert "xhci: plat: Register shutdown for xhci_plat"
Linux 4.4.128
Change-Id: I9c1e58f634cc18f15a840c9d192c892dfcc5ff73
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit d0e1a1b5a833b625c93d3d49847609350ebd79db ]
Paul Fiterau Brostean reported :
<quote>
Linux TCP stack we analyze exhibits behavior that seems odd to me.
The scenario is as follows (all packets have empty payloads, no window
scaling, rcv/snd window size should not be a factor):
TEST HARNESS (CLIENT) LINUX SERVER
1. - LISTEN (server listen,
then accepts)
2. - --> <SEQ=100><CTL=SYN> --> SYN-RECEIVED
3. - <-- <SEQ=300><ACK=101><CTL=SYN,ACK> <-- SYN-RECEIVED
4. - --> <SEQ=101><ACK=301><CTL=ACK> --> ESTABLISHED
5. - <-- <SEQ=301><ACK=101><CTL=FIN,ACK> <-- FIN WAIT-1 (server
opts to close the data connection calling "close" on the connection
socket)
6. - --> <SEQ=101><ACK=99999><CTL=FIN,ACK> --> CLOSING (client sends
FIN,ACK with not yet sent acknowledgement number)
7. - <-- <SEQ=302><ACK=102><CTL=ACK> <-- CLOSING (ACK is 102
instead of 101, why?)
... (silence from CLIENT)
8. - <-- <SEQ=301><ACK=102><CTL=FIN,ACK> <-- CLOSING
(retransmission, again ACK is 102)
Now, note that packet 6 while having the expected sequence number,
acknowledges something that wasn't sent by the server. So I would
expect
the packet to maybe prompt an ACK response from the server, and then be
ignored. Yet it is not ignored and actually leads to an increase of the
acknowledgement number in the server's retransmission of the FIN,ACK
packet. The explanation I found is that the FIN in packet 6 was
processed, despite the acknowledgement number being unacceptable.
Further experiments indeed show that the server processes this FIN,
transitioning to CLOSING, then on receiving an ACK for the FIN it had
send in packet 5, the server (or better said connection) transitions
from CLOSING to TIME_WAIT (as signaled by netstat).
</quote>
Indeed, tcp_rcv_state_process() calls tcp_ack() but
does not exploit the @acceptable status but for TCP_SYN_RECV
state.
What we want here is to send a challenge ACK, if not in TCP_SYN_RECV
state. TCP_FIN_WAIT1 state is not the only state we should fix.
Add a FLAG_NO_CHALLENGE_ACK so that tcp_rcv_state_process()
can choose to send a challenge ACK and discard the packet instead
of wrongly change socket state.
With help from Neal Cardwell.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Paul Fiterau Brostean <p.fiterau-brostean@science.ru.nl>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlq2IVkACgkQONu9yGCS
aT5SGQ//eDH59qGQJFy8GwDQULnYV8JEeBZT2tIzx2LDVJvKKz/8BkJmcGZrQ0gH
9EnMCiPkbxRH6HV6Cu1SHcyLFK+iVXGEw28Uk/sEcesQSZvrRxUKIgRLvyWhANgP
E1FiPbI6V1tXTROmUk/lrGZYgHMVUMAa+kYRndpbijtB++7qO25mVngP0Yg6OveO
oyw0MRx+v9mwQS/SID2EtlXnZOVGM+7kd3gg5fLY5w0qJT1jHjhrtZNWyKH2SMrS
QnzkFDMGDIWk5TrHJY0LEvpZI/toKNtPrG/Mt5gOvtSgjmQ4+EEVndrlutPAOa3K
xF6ERjtyBIRRG2ftk122fm6brjpxyCoqycD8JgCm1BxalNN7Bg+1ogQCGwE0EWNp
yYEsxLmSbLllX/4rkZtx+WsgiG4rwNWG1/IsPsEo80La3M53WzA3My8E0aVLpbIj
aqkzR62lZ1TSRgtbFjm6Bl4DtpJH4f9uELbO0VBi7b8LRTUI99Qcz4bOoKH+WKOC
umvHsuyMwDm8wc0KhQ4hdyGb1le0nS4Y1xydp1p0pcASLfeA2VOIg20ZvxQVBTZA
rlHo7zEQVkVYvphoPONUodOUVZ8/AGxoVvim3HlI6s5VvtdF6k/hQUWaOuWDh59J
bjw2vnMWOLWSqmmkpK9HmU6ohIV310TJewPu8BShzAuZRSNDxl8=
=OCjx
-----END PGP SIGNATURE-----
Merge 4.4.124 into android-4.4
Changes in 4.4.124
tpm: fix potential buffer overruns caused by bit glitches on the bus
tpm_tis: fix potential buffer overruns caused by bit glitches on the bus
SMB3: Validate negotiate request must always be signed
CIFS: Enable encryption during session setup phase
staging: android: ashmem: Fix possible deadlock in ashmem_ioctl
platform/x86: asus-nb-wmi: Add wapf4 quirk for the X302UA
regulator: anatop: set default voltage selector for pcie
x86: i8259: export legacy_pic symbol
rtc: cmos: Do not assume irq 8 for rtc when there are no legacy irqs
Input: ar1021_i2c - fix too long name in driver's device table
time: Change posix clocks ops interfaces to use timespec64
ACPI/processor: Fix error handling in __acpi_processor_start()
ACPI/processor: Replace racy task affinity logic
cpufreq/sh: Replace racy task affinity logic
genirq: Use irqd_get_trigger_type to compare the trigger type for shared IRQs
i2c: i2c-scmi: add a MS HID
net: ipv6: send unsolicited NA on admin up
media/dvb-core: Race condition when writing to CAM
spi: dw: Disable clock after unregistering the host
ath: Fix updating radar flags for coutry code India
clk: ns2: Correct SDIO bits
scsi: virtio_scsi: Always try to read VPD pages
KVM: PPC: Book3S PR: Exit KVM on failed mapping
ARM: 8668/1: ftrace: Fix dynamic ftrace with DEBUG_RODATA and !FRAME_POINTER
iommu/omap: Register driver before setting IOMMU ops
md/raid10: wait up frozen array in handle_write_completed
NFS: Fix missing pg_cleanup after nfs_pageio_cond_complete()
tcp: remove poll() flakes with FastOpen
e1000e: fix timing for 82579 Gigabit Ethernet controller
ALSA: hda - Fix headset microphone detection for ASUS N551 and N751
IB/ipoib: Fix deadlock between ipoib_stop and mcast join flow
IB/ipoib: Update broadcast object if PKey value was changed in index 0
HSI: ssi_protocol: double free in ssip_pn_xmit()
IB/mlx4: Take write semaphore when changing the vma struct
IB/mlx4: Change vma from shared to private
ASoC: Intel: Skylake: Uninitialized variable in probe_codec()
Fix driver usage of 128B WQEs when WQ_CREATE is V1.
netfilter: xt_CT: fix refcnt leak on error path
openvswitch: Delete conntrack entry clashing with an expectation.
mmc: host: omap_hsmmc: checking for NULL instead of IS_ERR()
wan: pc300too: abort path on failure
qlcnic: fix unchecked return value
scsi: mac_esp: Replace bogus memory barrier with spinlock
infiniband/uverbs: Fix integer overflows
NFS: don't try to cross a mountpount when there isn't one there.
iio: st_pressure: st_accel: Initialise sensor platform data properly
mt7601u: check return value of alloc_skb
rndis_wlan: add return value validation
Btrfs: send, fix file hole not being preserved due to inline extent
mac80211: don't parse encrypted management frames in ieee80211_frame_acked
mfd: palmas: Reset the POWERHOLD mux during power off
mtip32xx: use runtime tag to initialize command header
staging: unisys: visorhba: fix s-Par to boot with option CONFIG_VMAP_STACK set to y
staging: wilc1000: fix unchecked return value
mmc: sdhci-of-esdhc: limit SD clock for ls1012a/ls1046a
ARM: DRA7: clockdomain: Change the CLKTRCTRL of CM_PCIE_CLKSTCTRL to SW_WKUP
ipmi/watchdog: fix wdog hang on panic waiting for ipmi response
ACPI / PMIC: xpower: Fix power_table addresses
drm/nouveau/kms: Increase max retries in scanout position queries.
bnx2x: Align RX buffers
power: supply: pda_power: move from timer to delayed_work
Input: twl4030-pwrbutton - use correct device for irq request
md/raid10: skip spare disk as 'first' disk
ia64: fix module loading for gcc-5.4
tcm_fileio: Prevent information leak for short reads
video: fbdev: udlfb: Fix buffer on stack
sm501fb: don't return zero on failure path in sm501fb_start()
net: hns: fix ethtool_get_strings overflow in hns driver
cifs: small underflow in cnvrtDosUnixTm()
rtc: ds1374: wdt: Fix issue with timeout scaling from secs to wdt ticks
rtc: ds1374: wdt: Fix stop/start ioctl always returning -EINVAL
perf tests kmod-path: Don't fail if compressed modules aren't supported
Bluetooth: hci_qca: Avoid setup failure on missing rampatch
media: c8sectpfe: fix potential NULL pointer dereference in c8sectpfe_timer_interrupt
drm/msm: fix leak in failed get_pages
RDMA/iwpm: Fix uninitialized error code in iwpm_send_mapinfo()
rtlwifi: rtl_pci: Fix the bug when inactiveps is enabled.
media: bt8xx: Fix err 'bt878_probe()'
media: [RESEND] media: dvb-frontends: Add delay to Si2168 restart
cros_ec: fix nul-termination for firmware build info
platform/chrome: Use proper protocol transfer function
mmc: avoid removing non-removable hosts during suspend
IB/ipoib: Avoid memory leak if the SA returns a different DGID
RDMA/cma: Use correct size when writing netlink stats
IB/umem: Fix use of npages/nmap fields
vgacon: Set VGA struct resource types
drm/omap: DMM: Check for DMM readiness after successful transaction commit
pty: cancel pty slave port buf's work in tty_release
coresight: Fix disabling of CoreSight TPIU
pinctrl: Really force states during suspend/resume
iommu/vt-d: clean up pr_irq if request_threaded_irq fails
ip6_vti: adjust vti mtu according to mtu of lower device
RDMA/ocrdma: Fix permissions for OCRDMA_RESET_STATS
nfsd4: permit layoutget of executable-only files
clk: si5351: Rename internal plls to avoid name collisions
dmaengine: ti-dma-crossbar: Fix event mapping for TPCC_EVT_MUX_60_63
RDMA/ucma: Fix access to non-initialized CM_ID object
Linux 4.4.124
Change-Id: Iac6f5bda7941f032c5b1f58750e084140b0e3f23
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 0f9fa831aecfc297b7b45d4f046759bcefcf87f0 ]
When using TCP FastOpen for an active session, we send one wakeup event
from tcp_finish_connect(), right before the data eventually contained in
the received SYNACK is queued to sk->sk_receive_queue.
This means that depending on machine load or luck, poll() users
might receive POLLOUT events instead of POLLIN|POLLOUT
To fix this, we need to move the call to sk->sk_state_change()
after the (optional) call to tcp_rcv_fastopen_synack()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlosIJUACgkQONu9yGCS
aT5/DQ/9Hkroo15HPErzBhLUIo3tdT8HyMBSA5gjy3x+fYBBkbj1HvyPWlwbZVgQ
q2xQs7+Au8UlLLOV2UDz5C3K+hU4tM+uS3TC781ats5+VYWXAg6lXMshqwuXCDCX
eZ1joC1cs+tHQ1AinXV341c6TX9HOQ1/kwZRxvS9q/n8PQd71uzIXxsp/AkqloMQ
zTYz4FKdFPmKrHmMHR2X4MbRX85S5S57IJwmsfFEB+89PhfHGT5UkuClUuVrxV6m
a34BcDDG2SZ9V58OEDS99SJIvVthyisv7iDwoe6qBO0cITI/DiZ15T53gR+6nhMd
L+Y78wWohLFXxNoQFVXax/viFlsSju9j3f9NZeBb3olFlGuSiITsoJethU33Tzg0
Lf1X9ELMekBDyE/e2nYN+wILz6WioWxHrJ9nqAkwxOAL8YQTjfiUnjv/TUO/AJZ3
dmoJyrHOjkUTJpSQXx60ZwyvIyBv/oy8CWbr0lzVoq23OBrP26o/1C1Sbrm6bGMG
6fUccAqFEG9IW7fxUM0ojFRGL/+eSkHIMeEQopfNs4XXIBV53kUD8en+Hf4n5sic
NskK/UZSrC8KCAG4wmv1T+alDSrjhn+7xMdzRieEthhjFouwH3i8OROaVS+YX8+y
oDZpCZxBMhMCFKl+CWbqj8+DdNJQgs5+7R6YhEjEmxpiPfIPQhQ=
=XPGm
-----END PGP SIGNATURE-----
Merge 4.4.105 into android-4.4
Changes in 4.4.105
bcache: only permit to recovery read error when cache device is clean
bcache: recover data from backing when data is clean
uas: Always apply US_FL_NO_ATA_1X quirk to Seagate devices
usb: quirks: Add no-lpm quirk for KY-688 USB 3.1 Type-C Hub
serial: 8250_pci: Add Amazon PCI serial device ID
s390/runtime instrumentation: simplify task exit handling
USB: serial: option: add Quectel BG96 id
ima: fix hash algorithm initialization
s390/pci: do not require AIS facility
selftests/x86/ldt_get: Add a few additional tests for limits
serial: 8250_fintek: Fix rs485 disablement on invalid ioctl()
spi: sh-msiof: Fix DMA transfer size check
usb: phy: tahvo: fix error handling in tahvo_usb_probe()
serial: 8250: Preserve DLD[7:4] for PORT_XR17V35X
x86/entry: Use SYSCALL_DEFINE() macros for sys_modify_ldt()
EDAC, sb_edac: Fix missing break in switch
sysrq : fix Show Regs call trace on ARM
perf test attr: Fix ignored test case result
kprobes/x86: Disable preemption in ftrace-based jprobes
net: systemport: Utilize skb_put_padto()
net: systemport: Pad packet before inserting TSB
ARM: OMAP1: DMA: Correct the number of logical channels
vti6: fix device register to report IFLA_INFO_KIND
net/appletalk: Fix kernel memory disclosure
ravb: Remove Rx overflow log messages
nfs: Don't take a reference on fl->fl_file for LOCK operation
KVM: arm/arm64: Fix occasional warning from the timer work function
NFSv4: Fix client recovery when server reboots multiple times
drm/exynos/decon5433: set STANDALONE_UPDATE_F on output enablement
net: sctp: fix array overrun read on sctp_timer_tbl
tipc: fix cleanup at module unload
dmaengine: pl330: fix double lock
tcp: correct memory barrier usage in tcp_check_space()
mm: avoid returning VM_FAULT_RETRY from ->page_mkwrite handlers
xen-netfront: Improve error handling during initialization
net: fec: fix multicast filtering hardware setup
Revert "ocfs2: should wait dio before inode lock in ocfs2_setattr()"
usb: hub: Cycle HUB power when initialization fails
usb: xhci: fix panic in xhci_free_virt_devices_depth_first
usb: Add USB 3.1 Precision time measurement capability descriptor support
usb: ch9: Add size macro for SSP dev cap descriptor
USB: core: Add type-specific length check of BOS descriptors
USB: Increase usbfs transfer limit
USB: devio: Prevent integer overflow in proc_do_submiturb()
USB: usbfs: Filter flags passed in from user space
usb: host: fix incorrect updating of offset
xen-netfront: avoid crashing on resume after a failure in talk_to_netback()
Linux 4.4.105
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 56d806222ace4c3aeae516cd7a855340fb2839d8 ]
sock_reset_flag() maps to __clear_bit() not the atomic version clear_bit().
Thus, we need smp_mb(), smp_mb__after_atomic() is not sufficient.
Fixes: 3c7151275c ("tcp: add memory barriers to write space paths")
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAloQBz0ACgkQONu9yGCS
aT5nQxAAs/xWKpYLSLvLPYnTOmSmNJ36isgFriVT+wWOLzkrWTWuoQnluDjMQjie
nH6whZMlOnG+k5GrGF3XymxZ66tDj9TlXnPAHCC8ikcqir2/dBO/gO5v2gmFgF2E
j52mt09I3acBQJEt+Rz3xJCMa5so61uDYGtqk/URcPEW1nBa1rfA1QIy/9zv2/aw
2yPSz4NQlv+7yvjguw4Ik5Yt/hGeu1Y8Kuc4bVHG2TB+y0QYwri42bBwQV7llili
XqwfjFJYGMqWJqHGF/p0hD+/Xylw6GnDzxDZQDMNhsuWcfe3tUhuOVkX30E96fh2
ipT4wI5DTmql8EN/r/P7VS2BKL4W5HEMeNEd2APkGNGnSrzKGbd0CQ+cWVZbr645
R03AbqZjhaQKwRi+n82q1mMb4p+3Z/F/T8twHmYg/DLta3kzzRdfVJPNFBFIoSnF
Bay8KJKqoMv2Bjhla78pHMoqSQ9j/fJc2iPIAABtFlsTjic+/STiS7ANAsmDdJtt
8XXc6mFQfbulKKlKKqudPLjOpUNu1SrsOcc9gmovbTy7dN6FBOfJwFMCYonNyXAc
6/ACSxYJlnZ9YEacEmcXmz0GTytyKiTYE3fNsXc/8fHnRZ1+yea9Mo77wWkj7K4V
IqNIJMCW8K+P97oL6mdUBZwMUi4zrWueakMq8SWBdKYaD5yeV2k=
=ql4j
-----END PGP SIGNATURE-----
Merge 4.4.99 into android-4.4
Changes in 4.4.99
mac80211: accept key reinstall without changing anything
mac80211: use constant time comparison with keys
mac80211: don't compare TKIP TX MIC key in reinstall prevention
usb: usbtest: fix NULL pointer dereference
Input: ims-psu - check if CDC union descriptor is sane
ALSA: seq: Cancel pending autoload work at unbinding device
tun/tap: sanitize TUNSETSNDBUF input
tcp: fix tcp_mtu_probe() vs highest_sack
l2tp: check ps->sock before running pppol2tp_session_ioctl()
tun: call dev_get_valid_name() before register_netdevice()
sctp: add the missing sock_owned_by_user check in sctp_icmp_redirect
packet: avoid panic in packet_getsockopt()
ipv6: flowlabel: do not leave opt->tot_len with garbage
net/unix: don't show information about sockets from other namespaces
ip6_gre: only increase err_count for some certain type icmpv6 in ip6gre_err
tun: allow positive return values on dev_get_valid_name() call
sctp: reset owner sk for data chunks on out queues when migrating a sock
ppp: fix race in ppp device destruction
ipip: only increase err_count for some certain type icmp in ipip_err
tcp/dccp: fix ireq->opt races
tcp/dccp: fix lockdep splat in inet_csk_route_req()
tcp/dccp: fix other lockdep splats accessing ireq_opt
security/keys: add CONFIG_KEYS_COMPAT to Kconfig
tipc: fix link attribute propagation bug
brcmfmac: remove setting IBSS mode when stopping AP
target/iscsi: Fix iSCSI task reassignment handling
target: Fix node_acl demo-mode + uncached dynamic shutdown regression
misc: panel: properly restore atomic counter on error path
Linux 4.4.99
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmmdSUACgkQONu9yGCS
aT4lHg/7BJMLfX+Cu7XVaZgxNFym3gdh6+AnsSvqGqenbjRirCeh+bdK4u6iNM8v
h8rGYyp92rYJ168piFxdsRoAl2u4dZBpczOqhpEkwFDx8tI+/B+icWeILI4SX0N2
QWhim6tTTWy2Thw862M7lh5aJl2GxwJtxi/RXXzHq4u4w0NKPFUb+AfXEmUHDoXB
Q6Hz8mo6dcjsW5gyNsBvsYQwvqHpB935Ok2Juz7dwarHx7CWJ+v2fqk9cIf3Nll8
Ia04sg1HCRTePyWD0yld6jCpL51X2ZMVLa37RZCw/9WEDotFdVQO5NUg2ryCQQzN
hNmoiJ47QLBXbZR2rQn5XEtSfWZtplOnm0tB+UYRvxJxtxJGzGTdwUNFdu4iBG4+
xDSXbchTfyH7x93TxsvSZ+PS1NfFblYX8HETvoI2MO8PrGDdeHBZllVfF32xcK3L
VyU+wA1L3quPk0h3MvaFXwoOW8gUAIUyQZEXGXOWTMFDCz88UeBbvPkRAfkyIeYs
UhN8mlnM5cHhC3pPyQKFJ3kTFdQ6pZ79KLNqhvmordvfXBjTZwPt0zNYOlZKWTQR
49WFvxEGH4B68TVc2D4mHGbciqtb+GoTQx4w3HsmyS6FF3hzPqR0L4UOvhiMaDVe
kumziwhF9C6viis7dRlgXyJ5iydUJIcD5mJydfuPT2XIkG85eiU=
=SWxy
-----END PGP SIGNATURE-----
Merge 4.4.85 into android-4.4
Changes in 4.4.85
af_key: do not use GFP_KERNEL in atomic contexts
dccp: purge write queue in dccp_destroy_sock()
dccp: defer ccid_hc_tx_delete() at dismantle time
ipv4: fix NULL dereference in free_fib_info_rcu()
net_sched/sfq: update hierarchical backlog when drop packet
ipv4: better IP_MAX_MTU enforcement
sctp: fully initialize the IPv6 address in sctp_v6_to_addr()
tipc: fix use-after-free
ipv6: reset fn->rr_ptr when replacing route
ipv6: repair fib6 tree in failure case
tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP
irda: do not leak initialized list.dev to userspace
net: sched: fix NULL pointer dereference when action calls some targets
net_sched: fix order of queue length updates in qdisc_replace()
mei: me: add broxton pci device ids
mei: me: add lewisburg device ids
Input: trackpoint - add new trackpoint firmware ID
Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310
ALSA: core: Fix unexpected error at replacing user TLV
ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)
ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses
i2c: designware: Fix system suspend
drm: Release driver tracking before making the object available again
drm/atomic: If the atomic check fails, return its value first
drm: rcar-du: lvds: Fix PLL frequency-related configuration
drm: rcar-du: lvds: Rename PLLEN bit to PLLON
drm: rcar-du: Fix crash in encoder failure error path
drm: rcar-du: Fix display timing controller parameter
drm: rcar-du: Fix H/V sync signal polarity configuration
tracing: Fix freeing of filter in create_filter() when set_str is false
cifs: Fix df output for users with quota limits
cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()
nfsd: Limit end of page list when decoding NFSv4 WRITE
perf/core: Fix group {cpu,task} validation
Bluetooth: hidp: fix possible might sleep error in hidp_session_thread
Bluetooth: cmtp: fix possible might sleep error in cmtp_session
Bluetooth: bnep: fix possible might sleep error in bnep_session
binder: use group leader instead of open thread
binder: Use wake up hint for synchronous transactions.
ANDROID: binder: fix proc->tsk check.
iio: imu: adis16480: Fix acceleration scale factor for adis16480
iio: hid-sensor-trigger: Fix the race with user space powering up sensors
staging: rtl8188eu: add RNX-N150NUB support
ASoC: simple-card: don't fail if sysclk setting is not supported
ASoC: rsnd: disable SRC.out only when stop timing
ASoC: rsnd: avoid pointless loop in rsnd_mod_interrupt()
ASoC: rsnd: Add missing initialization of ADG req_rate
ASoC: rsnd: ssi: 24bit data needs right-aligned settings
ASoC: rsnd: don't call update callback if it was NULL
ntb_transport: fix qp count bug
ntb_transport: fix bug calculating num_qps_mw
ACPI: ioapic: Clear on-stack resource before using it
ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal
Linux 4.4.85
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit cdbeb633ca71a02b7b63bfeb94994bf4e1a0b894 ]
In some situations tcp_send_loss_probe() can realize that it's unable
to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
realizes that the RTO was eligible to fire immediately or at some
point in the past (delta_us <= 0). Previously in such cases
tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
icsk_rto, which caused needless delays of hundreds of milliseconds
(and non-linear behavior that made reproducible testing
difficult). This commit changes the logic to schedule "overdue" RTOs
ASAP, rather than at now + icsk_rto.
Fixes: 6ba8a3b19e ("tcp: Tail loss probe (TLP)")
Suggested-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlmPuZcACgkQONu9yGCS
aT5o0BAAlT21EbhyxoMPC6xrPHAF1Oi8mTVfpu+618AUs3B1M6xge/EKI08B/8DP
MZgaqSqY5ttaIlDKX5OVhY+HiuMg3SbIaFDzhS+OzjpuIjSA9ljNHazp5/l2HsQu
9zyPFN02L2zqWYppyDo6FQBfStB5rUHB4eVMgD6zuNU/YQovtibGqAY4LBfWvxf/
eDO6VfjiS4zzcCoplZxcxim1YVZ+HX09BuwniJzukM4C4/uMDubwMlJmrN9YsQZW
x5zWnLHce2MATk9yF4BzMI/iRDR+Bm6Vx3m1Vzq9WDu7/kkMTVdYjXHmZn02YQub
q4q1nDZyzBf8SvE4kf+fMYS8+dUrwiKf0lahBTK31J5Bc33lRfBfvv+dr/aEnp/Q
FhraSkcBDrulnxuq77WZbvWzj0otF1pCTtURyCSfdc4SOFwVIz2NLQ2ZnnXk4gnN
h5TqjxSDwr2CwTMzOnaKjBcuWnKPvn3/Pjm+/MJS8wvQYPZv8a4AzMIwxjDEN78Z
+FvtaWEoUCnlP869hyR7gTfk2541+qjMdDRRUPSQ16PvepKy1AG9iCqVvZThScyQ
PygaiBYZ9pbcyFuExLQrj2FDY2odinPfN8IsCQQbk5Es5mCdzJZOOkLeO2PO0MxD
Dya79igFnpNj7ZEu6T7lD6Izg/6fYWu7qKmpDKQ7/xn4hHxj/Ig=
=D9TG
-----END PGP SIGNATURE-----
Merge 4.4.82 into android-4.4
Changes in 4.4.82
tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states
net: fix keepalive code vs TCP_FASTOPEN_CONNECT
bpf, s390: fix jit branch offset related to ldimm64
net: sched: set xt_tgchk_param par.nft_compat as 0 in ipt_init_target
tcp: fastopen: tcp_connect() must refresh the route
net: avoid skb_warn_bad_offload false positives on UFO
packet: fix tp_reserve race in packet_set_ring
revert "net: account for current skb length when deciding about UFO"
revert "ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output"
udp: consistently apply ufo or fragmentation
sparc64: Prevent perf from running during super critical sections
KVM: arm/arm64: Handle hva aging while destroying the vm
mm/mempool: avoid KASAN marking mempool poison checks as use-after-free
ipv4: Should use consistent conditional judgement for ip fragment in __ip_append_data and ip_finish_output
net: account for current skb length when deciding about UFO
Linux 4.4.82
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit ed254971edea92c3ac5c67c6a05247a92aa6075e ]
If the sender switches the congestion control during ECN-triggered
cwnd-reduction state (CA_CWR), upon exiting recovery cwnd is set to
the ssthresh value calculated by the previous congestion control. If
the previous congestion control is BBR that always keep ssthresh
to TCP_INIFINITE_SSTHRESH, cwnd ends up being infinite. The safe
step is to avoid assigning invalid ssthresh value when recovery ends.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllp5y8ACgkQONu9yGCS
aT6g8hAApzYi9TwiaF6wyYXsrp7YvOm4NyMaVBl4t7v/nFql6VsUL+qWaJKB5EL9
o72ybYPUzbxGTVWCm/wiBO31VWea0ak0pBbyywBiowGgwAcgG/jpqZobale4Y2TE
15jEpmpA5+3BmXpMkrv/dz4LHZ4jm65/ADhMbkPGRZqUJ3mHmyVoi50l67dpTE5+
xWQIErycwlVMppJGnXPeFFgeD7Etch7OJ9CishQRNMb3F8H58WiQrMWWe1NfL0DO
H2g18IBHMsxEYJqnRqxviTOMe8S96Km+lKGX0LOTRYt+2OQpfIF7buU6N+6C96rO
7V2n2G02m2mOFVUFlDYF1RQ9IBrxHJf9iGkaZBwsaxX7XAK63ZjRxgjnEL7gMPU/
TMCOWZ53BdZezz2eAmdhySsV+4Xt6MmJJE8rR47AgsM2Le3tgK421zmraunmA0fR
eoJS99YHcftAHXCD3puGLafEwGVe0G4eQbY4L7mj1Y9VjaAbmmsWq9rlNOQMZRgH
JTNyYik1C7yGPJX1iKi9hLAKldzBwPuM3GfZMZQIOjA4t2VtSon7in5iKrihRg3N
BSKXr6+orNw32tsqcC4kpLPbFUFb6zx3EKELwSJwD9ICN7swJEk7gFw7w/F/SOxI
C1W4Ulm6EcYTWHDePERQ4zHlllHAmyJup61d9HnwA6HhPOLaff4=
=oeNk
-----END PGP SIGNATURE-----
Merge 4.4.77 into android-4.4
Changes in 4.4.77
fs: add a VALID_OPEN_FLAGS
fs: completely ignore unknown open flags
driver core: platform: fix race condition with driver_override
bgmac: reset & enable Ethernet core before using it
mm: fix classzone_idx underflow in shrink_zones()
tracing/kprobes: Allow to create probe with a module name starting with a digit
drm/virtio: don't leak bo on drm_gem_object_init failure
usb: dwc3: replace %p with %pK
USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
Add USB quirk for HVR-950q to avoid intermittent device resets
usb: usbip: set buffer pointers to NULL after free
usb: Fix typo in the definition of Endpoint[out]Request
mac80211_hwsim: Replace bogus hrtimer clockid
sysctl: don't print negative flag for proc_douintvec
sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
pinctrl: sh-pfc: r8a7791: Fix SCIF2 pinmux data
pinctrl: meson: meson8b: fix the NAND DQS pins
pinctrl: sunxi: Fix SPDIF function name for A83T
pinctrl: mxs: atomically switch mux and drive strength config
pinctrl: sh-pfc: Update info pointer after SoC-specific init
USB: serial: option: add two Longcheer device ids
USB: serial: qcserial: new Sierra Wireless EM7305 device ID
gfs2: Fix glock rhashtable rcu bug
x86/tools: Fix gcc-7 warning in relocs.c
x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
ath10k: override CE5 config for QCA9377
KEYS: Fix an error code in request_master_key()
RDMA/uverbs: Check port number supplied by user verbs cmds
mqueue: fix a use-after-free in sys_mq_notify()
tools include: Add a __fallthrough statement
tools string: Use __fallthrough in perf_atoll()
tools strfilter: Use __fallthrough
perf top: Use __fallthrough
perf intel-pt: Use __fallthrough
perf thread_map: Correctly size buffer used with dirent->dt_name
perf scripting perl: Fix compile error with some perl5 versions
perf tests: Avoid possible truncation with dirent->d_name + snprintf
perf bench numa: Avoid possible truncation when using snprintf()
perf tools: Use readdir() instead of deprecated readdir_r()
perf thread_map: Use readdir() instead of deprecated readdir_r()
perf script: Use readdir() instead of deprecated readdir_r()
perf tools: Remove duplicate const qualifier
perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
perf pmu: Fix misleadingly indented assignment (whitespace)
perf dwarf: Guard !x86_64 definitions under #ifdef else clause
perf trace: Do not process PERF_RECORD_LOST twice
perf tests: Remove wrong semicolon in while loop in CQM test
perf tools: Use readdir() instead of deprecated readdir_r() again
md: fix incorrect use of lexx_to_cpu in does_sb_need_changing
md: fix super_offset endianness in super_1_rdev_size_change
tcp: fix tcp_mark_head_lost to check skb len before fragmenting
staging: vt6556: vnt_start Fix missing call to vnt_key_init_table.
staging: comedi: fix clean-up of comedi_class in comedi_init()
ext4: check return value of kstrtoull correctly in reserved_clusters_store
x86/mm/pat: Don't report PAT on CPUs that don't support it
saa7134: fix warm Medion 7134 EEPROM read
Linux 4.4.77
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit d88270eef4b56bd7973841dd1fed387ccfa83709 upstream.
This commit fixes a corner case in tcp_mark_head_lost() which was
causing the WARN_ON(len > skb->len) in tcp_fragment() to fire.
tcp_mark_head_lost() was assuming that if a packet has
tcp_skb_pcount(skb) of N, then it's safe to fragment off a prefix of
M*mss bytes, for any M < N. But with the tricky way TCP pcounts are
maintained, this is not always true.
For example, suppose the sender sends 4 1-byte packets and have the
last 3 packet sacked. It will merge the last 3 packets in the write
queue into an skb with pcount = 3 and len = 3 bytes. If another
recovery happens after a sack reneging event, tcp_mark_head_lost()
may attempt to split the skb assuming it has more than 2*MSS bytes.
This sounds very counterintuitive, but as the commit description for
the related commit c0638c247f ("tcp: don't fragment SACKed skbs in
tcp_mark_head_lost()") notes, this is because tcp_shifted_skb()
coalesces adjacent regions of SACKed skbs, and when doing this it
preserves the sum of their packet counts in order to reflect the
real-world dynamics on the wire. The c0638c247f commit tried to
avoid problems by not fragmenting SACKed skbs, since SACKed skbs are
where the non-proportionality between pcount and skb->len/mss is known
to be possible. However, that commit did not handle the case where
during a reneging event one of these weird SACKed skbs becomes an
un-SACKed skb, which tcp_mark_head_lost() can then try to fragment.
The fix is to simply mark the entire skb lost when this happens.
This makes the recovery slightly more aggressive in such corner
cases before we detect reordering. But once we detect reordering
this code path is by-passed because FACK is disabled.
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlk30BwACgkQONu9yGCS
aT5cmhAAh3etTuZ3xRw2eGW/Y/C8L2F2CjJjmR4vp1ms8P55uZg3xA20r5jNj7Ho
pwag3WTNzHpVfKFApavfEzToqDszRAtXcvYPPW9uXUPeu8LWyBJyvmN7lSQVKgDc
M9SWsd+8EGceopaj8KHjLMxNsV2n8j2ckxNf/BL/KgiMtJlgp/1TCDKUVS1k0cA7
CsuxDhxpRYpQofsIVww1hdrwCxVuntAY7u+/3B19ozXGFSRe/h5GO6xYRcG8pqfT
lvIgD6btdQJwI55QoSpJCpL96a534zc+akO0dtyaMJ3Q8UWQXD3JF8ZxMiPPrAe8
CLW390ATranIafmLi9g9DU1vQeEPNFXpeiYfxe65YL7igeAj/uPtVzKp0MvRcKG7
IBVNxbtsTQa73ig7gKSJ323CnpEfrr/XG73JNVtUQLxHa2poY7SUonRI587MFW2T
sONl9Pk3TxRC7Rc45si4RFsIj4jEF8ubUDXOPb2CrmDMB7MrM0PHfOW9lLCP92FD
pn0fM4vwNvm2ILsblqNcBumgeIBQ8ld2TBTbhRbh2FK4Rzxd2TSlWh4KqkcWcXCt
Lz8conU06AwTvDob1xoht3m6Gj32maopKZKGn5/Wq0YlfjOB/70CXOvPO3ChhKTh
QGNgA66bYdm+xn55wf7ty7Bq8yO6kcSNPQCXOb9S61nfCLA4KHM=
=U7IH
-----END PGP SIGNATURE-----
Merge 4.4.71 into android-4.4
Changes in 4.4.71
sparc: Fix -Wstringop-overflow warning
dccp/tcp: do not inherit mc_list from parent
ipv6/dccp: do not inherit ipv6_mc_list from parent
s390/qeth: handle sysfs error during initialization
s390/qeth: unbreak OSM and OSN support
s390/qeth: avoid null pointer dereference on OSN
tcp: avoid fragmenting peculiar skbs in SACK
sctp: fix src address selection if using secondary addresses for ipv6
sctp: do not inherit ipv6_{mc|ac|fl}_list from parent
tcp: eliminate negative reordering in tcp_clean_rtx_queue
net: Improve handling of failures on link and route dumps
ipv6: Prevent overrun when parsing v6 header options
ipv6: Check ip6_find_1stfragopt() return value properly.
bridge: netlink: check vlan_default_pvid range
qmi_wwan: add another Lenovo EM74xx device ID
bridge: start hello_timer when enabling KERNEL_STP in br_stp_start
ipv6: fix out of bound writes in __ip6_append_data()
be2net: Fix offload features for Q-in-Q packets
virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
tcp: avoid fastopen API to be used on AF_UNSPEC
sctp: fix ICMP processing if skb is non-linear
ipv4: add reference counting to metrics
netem: fix skb_orphan_partial()
net: phy: marvell: Limit errata to 88m1101
vlan: Fix tcp checksum offloads in Q-in-Q vlans
i2c: i2c-tiny-usb: fix buffer not being DMA capable
mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read
HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference
scsi: mpt3sas: Force request partial completion alignment
drm/radeon/ci: disable mclk switching for high refresh rates (v2)
drm/radeon: Unbreak HPD handling for r600+
pcmcia: remove left-over %Z format
ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
slub/memcg: cure the brainless abuse of sysfs attributes
drm/gma500/psb: Actually use VBT mode when it is found
mm/migrate: fix refcount handling when !hugepage_migration_supported()
mlock: fix mlock count can not decrease in race condition
xfs: Fix missed holes in SEEK_HOLE implementation
xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
xfs: fix over-copying of getbmap parameters from userspace
xfs: handle array index overrun in xfs_dir2_leaf_readbuf()
xfs: prevent multi-fsb dir readahead from reading random blocks
xfs: fix up quotacheck buffer list error handling
xfs: support ability to wait on new inodes
xfs: update ag iterator to support wait on new inodes
xfs: wait on new inodes during quotaoff dquot release
xfs: fix indlen accounting error on partial delalloc conversion
xfs: bad assertion for delalloc an extent that start at i_size
xfs: fix unaligned access in xfs_btree_visit_blocks
xfs: in _attrlist_by_handle, copy the cursor back to userspace
xfs: only return -errno or success from attr ->put_listent
Linux 4.4.71
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit bafbb9c73241760023d8981191ddd30bb1c6dbac ]
tcp_ack() can call tcp_fragment() which may dededuct the
value tp->fackets_out when MSS changes. When prior_fackets
is larger than tp->fackets_out, tcp_clean_rtx_queue() can
invoke tcp_update_reordering() with negative values. This
results in absurd tp->reodering values higher than
sysctl_tcp_max_reordering.
Note that tcp_update_reordering indeeds sets tp->reordering
to min(sysctl_tcp_max_reordering, metric), but because
the comparison is signed, a negative metric always wins.
Fixes: c7caf8d3ed ("[TCP]: Fix reord detection due to snd_una covered holes")
Reported-by: Rebecca Isaacs <risaacs@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b451e5d24ba6687c6f0e7319c727a709a1846c06 ]
This patch fixes a bug in splitting an SKB during SACK
processing. Specifically if an skb contains multiple
packets and is only partially sacked in the higher sequences,
tcp_match_sack_to_skb() splits the skb and marks the second fragment
as SACKed.
The current code further attempts rounding up the first fragment
to MSS boundaries. But it misses a boundary condition when the
rounded-up fragment size (pkt_len) is exactly skb size. Spliting
such an skb is pointless and causses a kernel warning and aborts
the SACK processing. This patch universally checks such over-split
before calling tcp_fragment to prevent these unnecessary warnings.
Fixes: adb92db857 ("tcp: Make SACK code to split only at mss boundaries")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAljctYYACgkQONu9yGCS
aT6MbxAAwyGobI5sOr63yX5Myji1jf17vlY2h5dXet+8lu/csbFKmqHxaTLNwGIw
7u6V3AJ4zWdX8Q22dcVC98oySxcLxUhv+Rv/Dbonr3CYM00wNIex2wzON8f77eJQ
CBGJRNJR4/VG6opbVI/qp0t/c2oFiqHJXPldm3/Ru7jcBrLo5UHDWDY6cDhrj/Tg
F1maCBMAu1qW0z9KTnrQDvHjPHXmKfCviGzXpFTSVBQrh1s1bJkZkTqcY9eZNa/u
AXhHek5ZLFxlhkO105leR0YtXADbopiJ5c4EgXCASzNQ92/6IKsl21eOhHgOU9OA
YUCYftwKVMcxXGB6QFbdefLVtnCjUtlDa9+70oW1/4Ecee5FUBzNjVWVHOtYEsTY
pA3DqQI+U7EBCuIXsTtV0DiRWhHKq5uS1aphXZwnq/8qc2A3PD86JV/MBK5sWZfB
2V1N7xkitLFFCR6vMFLuusM8Np7kJ3zaAxQOd3IRc72iiNLkbNjdfJcAQ+E9b/Zx
5tpcthOl2RKhlOHHVKmYIioar8+RkZgWWl64+RTt6M1KjvHs07lPdCI+4cW8ELLM
/FUeRNTLmOiUv4dEPj5INYukEcLuCNp4fIo9lq8HrDceXbJXwiMFtHHlo4SQ/ubm
9v5iYdEmGet8jYPrfa5LDtD7G8K//k08nElVmI6N/4fNxRC2GcY=
=TI1/
-----END PGP SIGNATURE-----
Merge 4.4.48 into android-4.4
Changes in 4.4.48:
net/openvswitch: Set the ipv6 source tunnel key address attribute correctly
net: bcmgenet: Do not suspend PHY if Wake-on-LAN is enabled
net: properly release sk_frag.page
amd-xgbe: Fix jumbo MTU processing on newer hardware
net: unix: properly re-increment inflight counter of GC discarded candidates
net/mlx5: Increase number of max QPs in default profile
net/mlx5e: Count LRO packets correctly
net: bcmgenet: remove bcmgenet_internal_phy_setup()
ipv4: provide stronger user input validation in nl_fib_input()
socket, bpf: fix sk_filter use after free in sk_clone_lock
tcp: initialize icsk_ack.lrcvtime at session start time
Input: elan_i2c - add ASUS EeeBook X205TA special touchpad fw
Input: i8042 - add noloop quirk for Dell Embedded Box PC 3000
Input: iforce - validate number of endpoints before using them
Input: ims-pcu - validate number of endpoints before using them
Input: hanwang - validate number of endpoints before using them
Input: yealink - validate number of endpoints before using them
Input: cm109 - validate number of endpoints before using them
Input: kbtab - validate number of endpoints before using them
Input: sur40 - validate number of endpoints before using them
ALSA: seq: Fix racy cell insertions during snd_seq_pool_done()
ALSA: ctxfi: Fix the incorrect check of dma_set_mask() call
ALSA: hda - Adding a group of pin definition to fix headset problem
USB: serial: option: add Quectel UC15, UC20, EC21, and EC25 modems
USB: serial: qcserial: add Dell DW5811e
ACM gadget: fix endianness in notifications
usb: gadget: f_uvc: Fix SuperSpeed companion descriptor's wBytesPerInterval
usb-core: Add LINEAR_FRAME_INTR_BINTERVAL USB quirk
USB: uss720: fix NULL-deref at probe
USB: lvtest: fix NULL-deref at probe
USB: idmouse: fix NULL-deref at probe
USB: wusbcore: fix NULL-deref at probe
usb: musb: cppi41: don't check early-TX-interrupt for Isoch transfer
usb: hub: Fix crash after failure to read BOS descriptor
uwb: i1480-dfu: fix NULL-deref at probe
uwb: hwa-rc: fix NULL-deref at probe
mmc: ushc: fix NULL-deref at probe
iio: adc: ti_am335x_adc: fix fifo overrun recovery
iio: hid-sensor-trigger: Change get poll value function order to avoid sensor properties losing after resume from S3
parport: fix attempt to write duplicate procfiles
ext4: mark inode dirty after converting inline directory
mmc: sdhci: Do not disable interrupts while waiting for clock
xen/acpi: upload PM state from init-domain to Xen
iommu/vt-d: Fix NULL pointer dereference in device_to_iommu
ARM: at91: pm: cpu_idle: switch DDR to power-down mode
ARM: dts: at91: sama5d2: add dma properties to UART nodes
cpufreq: Restore policy min/max limits on CPU online
raid10: increment write counter after bio is split
libceph: don't set weight to IN when OSD is destroyed
xfs: don't allow di_size with high bit set
xfs: fix up xfs_swap_extent_forks inline extent handling
nl80211: fix dumpit error path RTNL deadlocks
USB: usbtmc: add missing endpoint sanity check
xfs: clear _XBF_PAGES from buffers when readahead page
xen: do not re-use pirq number cached in pci device msi msg data
igb: Workaround for igb i210 firmware issue
igb: add i211 to i210 PHY workaround
x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
PCI: Separate VF BAR updates from standard BAR updates
PCI: Remove pci_resource_bar() and pci_iov_resource_bar()
PCI: Add comments about ROM BAR updating
PCI: Decouple IORESOURCE_ROM_ENABLE and PCI_ROM_ADDRESS_ENABLE
PCI: Don't update VF BARs while VF memory space is enabled
PCI: Update BARs using property bits appropriate for type
PCI: Ignore BAR updates on virtual functions
PCI: Do any VF BAR updates before enabling the BARs
vfio/spapr: Postpone allocation of userspace version of TCE table
block: allow WRITE_SAME commands with the SG_IO ioctl
s390/zcrypt: Introduce CEX6 toleration
uvcvideo: uvc_scan_fallback() for webcams with broken chain
ACPI / blacklist: add _REV quirks for Dell Precision 5520 and 3520
ACPI / blacklist: Make Dell Latitude 3350 ethernet work
serial: 8250_pci: Detach low-level driver during PCI error recovery
fbcon: Fix vc attr at deinit
crypto: algif_hash - avoid zero-sized array
Linux 4.4.58
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 15bb7745e94a665caf42bfaabf0ce062845b533b ]
icsk_ack.lrcvtime has a 0 value at socket creation time.
tcpi_last_data_recv can have bogus value if no payload is ever received.
This patch initializes icsk_ack.lrcvtime for active sessions
in tcp_finish_connect(), and for passive sessions in
tcp_create_openreq_child()
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 019b1c9fe32a2a32c1153e31375f87ec3e591273 ]
If DBGUNDO() is enabled (FASTRETRANS_DEBUG > 1), a compile
error will happen, since inet6_sk(sk)->daddr became sk->sk_v6_daddr
Fixes: efe4208f47 ("ipv6: make lookups simpler and faster")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 083ae308280d13d187512b9babe3454342a7987e ]
The per-socket rate limit for 'challenge acks' was introduced in the
context of limiting ack loops:
commit f2b2c582e8 ("tcp: mitigate ACK loops for connections as tcp_sock")
And I think it can be extended to rate limit all 'challenge acks' on a
per-socket basis.
Since we have the global tcp_challenge_ack_limit, this patch allows for
tcp_challenge_ack_limit to be set to a large value and effectively rely on
the per-socket limit, or set tcp_challenge_ack_limit to a lower value and
still prevents a single connections from consuming the entire challenge ack
quota.
It further moves in the direction of eliminating the global limit at some
point, as Eric Dumazet has suggested. This a follow-up to:
Subject: tcp: make challenge acks less predictable
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 75ff39ccc1bd5d3c455b6822ab09e533c551f758 ]
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
Fixes: 282f23c6ee ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The default initial rwnd is hardcoded to 10.
Now we allow it to be controlled via
/proc/sys/net/ipv4/tcp_default_init_rwnd
which limits the values from 3 to 100
This is somewhat needed because ipv6 routes are
autoconfigured by the kernel.
See "An Argument for Increasing TCP's Initial Congestion Window"
in https://developers.google.com/speed/articles/tcp_initcwnd_paper.pdf
Change-Id: I386b2a9d62de0ebe05c1ebe1b4bd91b314af5c54
Signed-off-by: JP Abgrall <jpa@google.com>
Conflicts:
net/ipv4/sysctl_net_ipv4.c
net/ipv4/tcp_input.c
Patch 3759824da8 ("tcp: PRR uses CRB mode by default and SS mode
conditionally") introduced a bug that cwnd may become 0 when both
inflight and sndcnt are 0 (cwnd = inflight + sndcnt). This may lead
to a div-by-zero if the connection starts another cwnd reduction
phase by setting tp->prior_cwnd to the current cwnd (0) in
tcp_init_cwnd_reduction().
To prevent this we skip PRR operation when nothing is acked or
sacked. Then cwnd must be positive in all cases as long as ssthresh
is positive:
1) The proportional reduction mode
inflight > ssthresh > 0
2) The reduction bound mode
a) inflight == ssthresh > 0
b) inflight < ssthresh
sndcnt > 0 since newly_acked_sacked > 0 and inflight < ssthresh
Therefore in all cases inflight and sndcnt can not both be 0.
We check invalid tp->prior_cwnd to avoid potential div0 bugs.
In reality this bug is triggered only with a sequence of less common
events. For example, the connection is terminating an ECN-triggered
cwnd reduction with an inflight 0, then it receives reordered/old
ACKs or DSACKs from prior transmission (which acks nothing). Or the
connection is in fast recovery stage that marks everything lost,
but fails to retransmit due to local issues, then receives data
packets from other end which acks nothing.
Fixes: 3759824da8 ("tcp: PRR uses CRB mode by default and SS mode conditionally")
Reported-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dmitry provided a syzkaller (http://github.com/google/syzkaller)
generated program that triggers the WARNING at
net/ipv4/tcp.c:1729 in tcp_recvmsg() :
WARN_ON(tp->copied_seq != tp->rcv_nxt &&
!(flags & (MSG_PEEK | MSG_TRUNC)));
His program is specifically attempting a Cross SYN TCP exchange,
that we support (for the pleasure of hackers ?), but it looks we
lack proper tcp->copied_seq initialization.
Thanks again Dmitry for your report and testings.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
tcp_send_rcvq() is used for re-injecting data into tcp receive queue.
Problems :
- No check against size is performed, allowed user to fool kernel in
attempting very large memory allocations, eventually triggering
OOM when memory is fragmented.
- In case of fault during the copy we do not return correct errno.
Lets use alloc_skb_with_frags() to cook optimal skbs.
Fixes: 292e8d8c85 ("tcp: Move rcvq sending to tcp_input.c")
Fixes: c0e88ff0f2 ("tcp: Repair socket queues")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements the second half of RACK that uses the the most
recent transmit time among all delivered packets to detect losses.
tcp_rack_mark_lost() is called upon receiving a dubious ACK.
It then checks if an not-yet-sacked packet was sent at least
"reo_wnd" prior to the sent time of the most recently delivered.
If so the packet is deemed lost.
The "reo_wnd" reordering window starts with 1msec for fast loss
detection and changes to min-RTT/4 when reordering is observed.
We found 1msec accommodates well on tiny degree of reordering
(<3 pkts) on faster links. We use min-RTT instead of SRTT because
reordering is more of a path property but SRTT can be inflated by
self-inflicated congestion. The factor of 4 is borrowed from the
delayed early retransmit and seems to work reasonably well.
Since RACK is still experimental, it is now used as a supplemental
loss detection on top of existing algorithms. It is only effective
after the fast recovery starts or after the timeout occurs. The
fast recovery is still triggered by FACK and/or dupack threshold
instead of RACK.
We introduce a new sysctl net.ipv4.tcp_recovery for future
experiments of loss recoveries. For now RACK can be disabled by
setting it to 0.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is the first half of the RACK loss recovery.
RACK loss recovery uses the notion of time instead
of packet sequence (FACK) or counts (dupthresh). It's inspired by the
previous FACK heuristic in tcp_mark_lost_retrans(): when a limited
transmit (new data packet) is sacked, then current retransmitted
sequence below the newly sacked sequence must been lost,
since at least one round trip time has elapsed.
But it has several limitations:
1) can't detect tail drops since it depends on limited transmit
2) is disabled upon reordering (assumes no reordering)
3) only enabled in fast recovery ut not timeout recovery
RACK (Recently ACK) addresses these limitations with the notion
of time instead: a packet P1 is lost if a later packet P2 is s/acked,
as at least one round trip has passed.
Since RACK cares about the time sequence instead of the data sequence
of packets, it can detect tail drops when later retransmission is
s/acked while FACK or dupthresh can't. For reordering RACK uses a
dynamically adjusted reordering window ("reo_wnd") to reduce false
positives on ever (small) degree of reordering.
This patch implements tcp_advanced_rack() which tracks the
most recent transmission time among the packets that have been
delivered (ACKed or SACKed) in tp->rack.mstamp. This timestamp
is the key to determine which packet has been lost.
Consider an example that the sender sends six packets:
T1: P1 (lost)
T2: P2
T3: P3
T4: P4
T100: sack of P2. rack.mstamp = T2
T101: retransmit P1
T102: sack of P2,P3,P4. rack.mstamp = T4
T205: ACK of P4 since the hole is repaired. rack.mstamp = T101
We need to be careful about spurious retransmission because it may
falsely advance tp->rack.mstamp by an RTT or an RTO, causing RACK
to falsely mark all packets lost, just like a spurious timeout.
We identify spurious retransmission by the ACK's TS echo value.
If TS option is not applicable but the retransmission is acknowledged
less than min-RTT ago, it is likely to be spurious. We refrain from
using the transmission time of these spurious retransmissions.
The second half is implemented in the next patch that marks packet
lost using RACK timestamp.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
a helper to prepare the main RACK patch
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the existing lost retransmit detection because RACK subsumes
it completely. This also stops the overloading the ack_seq field of
the skb control block.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Kathleen Nichols' algorithm for tracking the minimum RTT of a
data stream over some measurement window. It uses constant space
and constant time per update. Yet it almost always delivers
the same minimum as an implementation that has to keep all
the data in the window. The measurement window is tunable via
sysctl.net.ipv4.tcp_min_rtt_wlen with a default value of 5 minutes.
The algorithm keeps track of the best, 2nd best & 3rd best min
values, maintaining an invariant that the measurement time of
the n'th best >= n-1'th best. It also makes sure that the three
values are widely separated in the time window since that bounds
the worse case error when that data is monotonically increasing
over the window.
Upon getting a new min, we can forget everything earlier because
it has no value - the new min is less than everything else in the
window by definition and it's the most recent. So we restart fresh
on every new min and overwrites the 2nd & 3rd choices. The same
property holds for the 2nd & 3rd best.
Therefore we have to maintain two invariants to maximize the
information in the samples, one on values (1st.v <= 2nd.v <=
3rd.v) and the other on times (now-win <=1st.t <= 2nd.t <= 3rd.t <=
now). These invariants determine the structure of the code
The RTT input to the windowed filter is the minimum RTT measured
from ACK or SACK, or as the last resort from TCP timestamps.
The accessor tcp_min_rtt() returns the minimum RTT seen in the
window. ~0U indicates it is not available. The minimum is 1usec
even if the true RTT is below that.
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently ca_seq_rtt_us does not use Kern's check. Fix that by
checking if any packet acked is a retransmit, for both RTT used
for RTT estimation and congestion control.
Fixes: 5b08e47ca ("tcp: prefer packet timing to TS-ECR for RTT")
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
At the time of commit fff3269907 ("tcp: reflect SYN queue_mapping into
SYNACK packets") we had little ways to cope with SYN floods.
We no longer need to reflect incoming skb queue mappings, and instead
can pick a TX queue based on cpu cooking the SYNACK, with normal XPS
affinities.
Note that all SYNACK retransmits were picking TX queue 0, this no longer
is a win given that SYNACK rtx are now distributed on all cpus.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One 32bit hole is following skc_refcnt, use it.
skc_incoming_cpu can also be an union for request_sock rcv_wnd.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
inet_reqsk_alloc() is used to allocate a temporary request
in order to generate a SYNACK with a cookie. Then later,
syncookie validation also uses a temporary request.
These paths already took a reference on listener refcount,
we can avoid a couple of atomic operations.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are multiple races that need fixes :
1) skb_get() + queue skb + kfree_skb() is racy
An accept() can be done on another cpu, data consumed immediately.
tcp_recvmsg() uses __kfree_skb() as it is assumed all skb found in
socket receive queue are private.
Then the kfree_skb() in tcp_rcv_state_process() uses an already freed skb
2) tcp_reqsk_record_syn() needs to be done before tcp_try_fastopen()
for the same reasons.
3) We want to send the SYNACK before queueing child into accept queue,
otherwise we might reintroduce the ooo issue fixed in
commit 7c85af8810 ("tcp: avoid reorders for TFO passive connections")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If a listen backlog is very big (to avoid syncookies), then
the listener sk->sk_wmem_alloc is the main source of false
sharing, as we need to touch it twice per SYNACK re-transmit
and TX completion.
(One SYN packet takes listener lock once, but up to 6 SYNACK
are generated)
By attaching the skb to the request socket, we remove this
source of contention.
Tested:
listen(fd, 10485760); // single listener (no SO_REUSEPORT)
16 RX/TX queue NIC
Sustain a SYNFLOOD attack of ~320,000 SYN per second,
Sending ~1,400,000 SYNACK per second.
Perf profiles now show listener spinlock being next bottleneck.
20.29% [kernel] [k] queued_spin_lock_slowpath
10.06% [kernel] [k] __inet_lookup_established
5.12% [kernel] [k] reqsk_timer_handler
3.22% [kernel] [k] get_next_timer_interrupt
3.00% [kernel] [k] tcp_make_synack
2.77% [kernel] [k] ipt_do_table
2.70% [kernel] [k] run_timer_softirq
2.50% [kernel] [k] ip_finish_output
2.04% [kernel] [k] cascade
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In this patch, we insert request sockets into TCP/DCCP
regular ehash table (where ESTABLISHED and TIMEWAIT sockets
are) instead of using the per listener hash table.
ACK packets find SYN_RECV pseudo sockets without having
to find and lock the listener.
In nominal conditions, this halves pressure on listener lock.
Note that this will allow for SO_REUSEPORT refinements,
so that we can select a listener using cpu/numa affinities instead
of the prior 'consistent hash', since only SYN packets will
apply this selection logic.
We will shrink listen_sock in the following patch to ease
code review.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ying Cai <ycai@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
long term plan is to remove struct listen_sock when its hash
table is no longer there.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>