We clean up the logical endpoints only for the device being
unregistered in bridge mode. However, the other physical endpoint's
local endpoint still points to the unregistered device.
This may lead to a crash if one of the physical devices in bridge
mode is unregistered. Fix this by unsetting the local endpoint.
It is still possible that packets in a different context across cores
might try to access this data. This usually manifests as packets
requesting a very large headroom. Handle this by dropping these stale
skb's.
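For illustration, a minimal sketch of the drop; the threshold and the
surrounding context are assumptions, not the actual driver values:

    /* A stale configuration shows up as an absurd headroom
     * requirement, so drop the packet instead of expanding it. */
    if (unlikely(required_headroom > 16384)) {
            kfree_skb(skb);
            return;
    }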
CRs-Fixed: 1098513
Change-Id: I1ba4d877a6ed3eca66946fe056938f0927bcd9a5
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
rmnet_data assigns device names in the order they are created.
This causes problems when multiple processes are trying to
create devices and leads to unpredictable device names.
Assign the device name as specified by the user.
CRs-Fixed: 2018785
Change-Id: Iab8e053c6ccacbeedaa7763e760d0c12e756b5d0
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
rmnet_data netlink handler currently does not check for the
incoming process pid and instead just loops back the pid.
A malicious root user could potentially send a message with
source pid 0 and this could cause rmnet_data to loop the message
back till an out of memory situation occurs.
rmnet_data also does not check the length of incoming netlink
messages and instead casts the netlink message without checking
the boundary.
Fix these two scenarios by adding the pid and message length checks
respectively.
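For illustration, a sketch of the two checks; the message struct name
and the surrounding context are assumptions:

    struct nlmsghdr *nlh = nlmsg_hdr(skb);

    /* Reject messages shorter than the expected payload. */
    if (!nlh || skb->len < nlmsg_total_size(sizeof(struct rmnet_nl_msg_s)))
            return;
    /* Never loop a response back to pid 0 (the kernel itself). */
    if (nlh->nlmsg_pid == 0)
            return;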
Bug: 31252965
CRs-Fixed: 1098801
Change-Id: I172c1a7112e67e82959b397af7ddfd963d819bdc
Signed-off-by: Conner Huff <chuff@codeaurora.org>
Port single flow dynamic GRO changes into rmnet driver. This
feature uses configurable parameters that determine when GRO
should be flushed based on both timing and receive byte count.
Although the parameters are initially set, they dynamically
fluctuate based on incoming traffic.
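As a rough sketch of the dual criterion (the state fields and
thresholds here are hypothetical, not the driver's):

    /* Flush GRO when either the elapsed time or the byte count
     * since the last flush crosses its (adaptive) threshold. */
    if (ktime_to_ns(ktime_sub(now, st->last_flush)) >= st->flush_time_ns ||
        st->rx_bytes_since_flush >= st->flush_byte_limit) {
            napi_gro_flush(napi, false);
            st->last_flush = now;
            st->rx_bytes_since_flush = 0;
    }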
CRs-Fixed: 1113125
Change-Id: I7469118d3108e0c9f71c4f50efd8ceebfb324a63
Signed-off-by: Conner Huff <chuff@codeaurora.org>
Generic receive offload is enabled by default on a net_device
whenever it is registered. In the case of rmnet_data, a physical
device could theoretically pass cloned frames, and rmnet_data
would pass these cloned frames on to the GRO framework. This
would cause memory corruption or crashes since GRO modifies
the skb shared info, which is shared across clones.
While cloned frames are usually not sent to rmnet_data, preventing
this actually requires userspace intervention. If userspace does
not make the appropriate calls to the kernel, we will run into
crashes. Handle this scenario by disabling GRO by default.
Userspace will need to explicitly enable GRO if it is required.
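Roughly, assuming standard net_device feature handling, the default
looks like:

    dev->hw_features |= NETIF_F_GRO;   /* still toggleable via ethtool */
    dev->features    &= ~NETIF_F_GRO;  /* off until userspace enables it */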
CRs-Fixed: 1097389
Change-Id: I40d5ce940f4722b128c0138c07232c33d0b74e14
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
rmnet_data does not free skb's when the phy netdev
fails to xmit a qmap control packet, causing a memory leak.
To avoid this, queue the packet back into the device queue
if start_xmit fails.
Change-Id: Id7efdd10ac76c989c086cb5f934a4b666b7c5939
Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
After commit 605ad7f184 ("tcp: refine TSO autosizing"), the kernel
throttles uplink TCP data when a sufficient amount of socket buffer
is not available due to the delayed release of buffers through TX
completions in the physical net device.
Work around this by orphaning the socket buffer. This makes the
kernel assume that more packets can be sent in this scenario.
Out of band signaling and flow control at the qdisc / HTB layer
should ensure that flow control is not affected.
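The workaround essentially amounts to one call in the egress path;
minimal sketch:

    /* Detach the skb from its socket so TSQ/autosizing no longer
     * counts it against the socket's buffer budget. */
    skb_orphan(skb);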
Throughput difference for IPv4 TCP UL -
Before change : 143Mbps
After change : 146Mbps
CRs-Fixed: 1088104
Change-Id: I251ed7938c29e08954d4c81d3041cb235a39d266
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Some creators of skb's may fail to reserve sufficient headroom or
tailroom to stamp the MAP header or trailer respectively when
transmitting packets to rmnet_data.
In these cases, rmnet_data was failing to free these packets, which
led to out of memory scenarios on low memory targets. Free these
skb's in these scenarios.
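A sketch of the headroom case (the MAP header struct name is assumed
from the driver's naming):

    if (skb_headroom(skb) < sizeof(struct rmnet_map_header_s)) {
            kfree_skb(skb);         /* free instead of leaking */
            return NULL;
    }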
CRs-Fixed: 1086873
Change-Id: I8bd2eb93393766bf9c301766e525354770577e0a
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Using %pK instead of %p to hide kernel pointers
based on kptr_restrict.
Change-Id: I065cff2a9e092d74d0e8c35da6551fab3805e83e
Signed-off-by: Ashwanth Goli <ashwanth@codeaurora.org>
rmnet_data currently frees incoming memory when de-aggregating
large incoming aggregated packets. This may introduce additional
overhead in the memory allocator. Add a handler as part of the
rx_handler_data to recycle the skb's. This handler needs to be
defined within the specific transport driver. If the recycle
handler is not implemented by the transport, rmnet_data will
free the skb (default behavior).
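A rough sketch of the dispatch; the callback member name is an
assumption:

    /* Give the transport a chance to reclaim the source buffer;
     * otherwise fall back to the default free. */
    if (config->recycle)
            config->recycle(skb);
    else
            kfree_skb(skb);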
CRs-Fixed: 1048396
Change-Id: I14b929d78c87ced26cff3c32876d2eec5de33350
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
The following stack trace was seen while doing a data transfer:
Unable to handle kernel paging request at virtual address
6b6b6b6b6b6b6ef3
pgd = ffffffc01c7c5000 [6b6b6b6b6b6b6ef3] *pgd=0000000000000000,
*pud=0000000000000000
Internal error: Oops: 96000004 [#1] PREEMPT SMP
Call trace:
[<ffffffc000f669ac>] rmnet_map_command+0x19c/0x238
[<ffffffc000f6504c>] _rmnet_map_ingress_handler+0x3c/0x264
[<ffffffc000f65500>] rmnet_ingress_handler+0x1b4/0x3a4
[<ffffffc000f65704>] rmnet_rx_handler+0x14/0x2c
[<ffffffc000d8b5ac>] __netif_receive_skb_core+0x514/0x71c
[<ffffffc000d8c270>] __netif_receive_skb+0x30/0x98
[<ffffffc000d8d3bc>] process_backlog+0xb0/0x184
[<ffffffc000d8d1f8>] net_rx_action+0xfc/0x210
[<ffffffc00016a2e0>] __do_softirq+0x1c0/0x39c
[<ffffffc00016a824>] irq_exit+0x88/0xf4
[<ffffffc0001565e8>] handle_IPI+0x340/0x4b4
[<ffffffc0001455e8>] gic_handle_irq+0xc4/0xec
This is because an invalid MAP command was received and was freed
and rmnet_data was trying to send the freed skb as an ACK. Fix this
by returning if an invalid MAP command is detected.
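A sketch of the fix; the command names and return value are
illustrative:

    switch (command) {
    case RMNET_MAP_COMMAND_FLOW_ENABLE:
    case RMNET_MAP_COMMAND_FLOW_DISABLE:
            break;
    default:
            kfree_skb(skb);
            return RX_HANDLER_CONSUMED;  /* never ACK an invalid command */
    }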
CRs-Fixed: 1019188
Change-Id: Ib52e6551ac67215dab2bc5770ddcf037568f8b77
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Hardware does not require pad bytes in egress packets when uplink
aggregation is not enabled.
CRs-Fixed: 1002396
Change-Id: I86459b7bc18da16b66f6c701ac324f28be8848fa
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Print format %p displays the kernel address while bypassing the
kptr_restrict sysctl settings.
Change the print format for addresses from %p to %pK. If
kptr_restrict is enabled, addresses are printed as zeroes. To view
the actual addresses, disable kptr_restrict by -
echo 0 > /proc/sys/kernel/kptr_restrict
CRs-Fixed: 987054
Change-Id: Icb8ef62c8263ae7b17d6883c0e6a1c93d2156a6a
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
The current GRO implementation relies on NET_RX to complete
processing or the max possible TCP segment size for it to flush the
GRO coalesced packets. This leads to coalescing a large number of
packets, which translates to very few ACKs. Since the ACKs are
few and delayed during the slow start phase (stretch ACKs), the
initial throughput ramp up is slow compared to standard RFC TCP
behavior where an ACK is sent for every two packets. Note that
there is no difference between GRO and non-GRO after the max window
size is reached.
Add a mechanism within rmnet_data to force the flush of packets
every 10 micro seconds (experimentally determined) by default. This
is controlled by the module parameter "gro_flush_time" and can be
configured to any value less than a second. To disable this feature,
set this entry to 0.
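A sketch of the tunable and the timed flush; the nanosecond unit and
the surrounding state are assumptions:

    static long gro_flush_time = 10000;        /* ~10 us, 0 disables */
    module_param(gro_flush_time, long, 0644);

    /* In the ingress path, after handing packets to GRO: */
    if (gro_flush_time &&
        ktime_to_ns(ktime_sub(ktime_get(), last_flush)) >= gro_flush_time) {
            napi_gro_flush(napi, false);
            last_flush = ktime_get();
    }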
This reduces the coalescing of packets, which translates to more
ACKs than normal GRO but fewer ACKs than with GRO disabled. There
is no increase in power for day to day use cases. Note that this
optimization is specific to the TCP GRO path only.
Some useful stats below for TCP DL at 400Mbps -
                      | TCP GRO default | TCP GRO flush timer 10us
----------------------+-----------------+-------------------------
iperf 1st second tput | 300Mbps         | 330Mbps
coalesce ratio        | 15              | 4.5
CRs-Fixed: 961186
Change-Id: Ie8d76c493d61f3f4c256dbaa0378b22a361eed49
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Update copyright separately for this file so that the subsequent
change can be propagated without merge conflicts.
CRs-Fixed: 991792
Change-Id: Ib1af02d00e438f48619eacee291b1875671978e1
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
This patch addresses two issues
a) The original GSO patch had a condition to check
for padding on a non linear skb and the egress
device, but the device name was incorrect leading
to the check always failing. This patch eliminates
the need for the check completely.
b) It is possible that the last frag in a GSO frame
would not be size aligned to a word boundary. In
this and all cases involving a non linear skb
frame we should skip padding. The driver can
expect a non linear packet when Scatter Gather
feature is enabled on the device. This patch
does just that.
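A sketch of check (b) in the padding path:

    /* Non-linear (SG/GSO) frames are sent without trailing pad bytes. */
    if (skb_is_nonlinear(skb))
            padding = 0;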
Change-Id: Ifea5b1ae26b154bb047044e4bc3ad579d0da7f6d
CRs-fixed: 990751
Acked-by: Ashwanth Goli <ashwanth@qti.qualcomm.com>
Signed-off-by: Sharat Masetty <smasetty@codeaurora.org>
Change the log level from high to medium when userspace calls an
unknown IOCTL on an rmnet data device. This log message is not
very useful and often causes log spam in the serial device
output which may occasionally lead to watchdog timeouts.
CRs-fixed: 1000453
Change-Id: I50e038d838eded30ee8304fefb2c13328eaf9683
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Specifications state that the MAP packet length in the MAP header does
not account for the number of bytes of the packet trailer from DL
checksum offload. The current implementation takes this into account
for data
packets but not for command packets. As a result, the additional
8 bytes were sent to hardware as part of the MAP ACK's and were
subsequently dropped.
Fix this by truncating the extra bytes of the DL checksum trailer from
the MAP ACK.
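A sketch, using the 8-byte trailer size mentioned above:

    /* Strip the DL checksum trailer so the ACK length matches the
     * packet length carried in the MAP header. */
    skb_trim(skb, skb->len - 8);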
CRs-Fixed: 961336
Change-Id: I175dde695e7ee09d16c99fb71898d635c7a812ab
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Set the protocol as ETH_P_MAP when sending a MAP ACK as a response
to a MAP command packet. Physical device drivers may drop these
packets if the ETH_P_MAP protocol is not present on the skb.
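The change is essentially one line in the ACK path; sketch:

    skb->protocol = htons(ETH_P_MAP);   /* so the phys driver accepts it */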
CRs-Fixed: 961336
Change-Id: I05ca1350fb16747b0300106654a74d3318389a30
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Commit 32dce968dd ("ipv6: Allow for partial checksums on non-ufo
packets") adds support for checksum offload in hardware for single
UDP packets. As a result, the custom NETIF_F_IPV6_UDP_CSUM feature
is no longer needed.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Commit 28207b510dca ("net: rmnet_data: Add NAPI context for rmnet_data
devices") added a NAPI struct per rmnet_data device. This was to
ensure that the NAPI struct is always available even if there was a
hotplug. However, this seems to be leading to some races where the
NAPI struct is accessed concurrently across cores.
The race here is between napi_gro_receive on one core with
napi_complete running on the other accessing the same NAPI struct.
If napi_gro_receive runs slightly earlier, napi_complete would see
that napi->gro_list is non NULL in __napi_complete even though it
had been cleared earlier, leading to a BUG.
If napi_complete runs slightly earlier, napi_gro_receive would
dereference a NULL pointer even though it had assigned an skb to
napi->gro_list.
Fix this by using the per cpu backlog struct as the NAPI struct for
queuing packets to the GRO engine. Access across cores would not be a
problem in case of hotplug as they would use core specific structures.
CRs-Fixed: 966095
Change-Id: I831df5b93cc6ee77355f2e98af89efcffe825bd8
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
This patch enables hardware device features such as
NETIF_F_SG NETIF_F_GSO NETIF_F_GSO_UDP_TUNNEL
NETIF_F_GSO_UDP_TUNNEL_CSUM. This patch also skips
padding (used to align the length to word boundaries)
for outgoing non linear skbs.
This patch also adds a new ioctl interface
RMNET_IOCTL_GET_SG_SUPPORT to query the physical
network devices for Scatter Gather support.
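A sketch of the feature setup, assuming standard net_device
initialization:

    dev->hw_features |= NETIF_F_SG | NETIF_F_GSO |
                        NETIF_F_GSO_UDP_TUNNEL |
                        NETIF_F_GSO_UDP_TUNNEL_CSUM;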
Change-Id: I9788d75c249ab2dac5b598dad131c0692ed84e4d
Acked-by: Abhishek Chauhan <abchauha@qti.qualcomm.com>
Signed-off-by: Ravinder Konka <rkonka@codeaurora.org>
Checksum offload is supported in MAPv3/v4. Add the device feature
to indicate this support.
Change-Id: I89caf6d9029cd483759404542681621909de70a3
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Hardware does not handle pad bytes in egress packets when uplink
aggregation is not enabled. This is due to the translation support
added on hardware for MAPv4.
Change-Id: Ic246a4548561450035d5252221032d72eff44518
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Add the MAPv4 ingress and data format handlers. MAPv4 requires the
checksum for uplink TCP and UDP packets to be 1's complemented
before passing the packet onto the physical netdevice.
This workaround is needed due to failures seen in hardware while
processing translated packets.
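A rough sketch of the 1's complement step for TCP (UDP is analogous);
this is an illustration of the idea, not the driver's exact code, and
the casts are only to keep sparse quiet:

    struct tcphdr *th = tcp_hdr(skb);

    th->check = (__force __sum16)~(__force u16)th->check;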
Change-Id: Ib79382fa7e8b2bd0c1adbe68b8de75f1602df10b
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Commit 1a37e412a0 ("net: Use 16bits for *_headers fields of
struct skbuff") changes the data type of mac_header from pointer to
__u16. As a result, it is invalid to assign data to mac_header
in architectures where NET_SKBUFF_DATA_USES_OFFSET is 0.
Change-Id: I97ce04e3747983839d3908ca8111fd588c8e43a2
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
While running a UDP test using the GRO handler, a slight increase
in MIPS was observed. This is because all UDP packets passed
through napi_gro_receive need to be checked to determine whether
they are encapsulated UDP packets which could actually be coalesced
and processed further.
As of now, clients of rmnet_data do not support this, so we can
save MIPS by passing only TCP packets through the GRO handler. All
non TCP packets will be passed directly to the network stack. This
also helps us avoid an unnecessary atomic operation when the GRO
path is not exercised.
There is 1% savings in MIPS observed when using a single stream
UDP DL connection at 300Mbps.
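A sketch of the dispatch, assuming an IPv4 frame for brevity (IPv6 is
handled similarly):

    if (skb->protocol == htons(ETH_P_IP) &&
        ip_hdr(skb)->protocol == IPPROTO_TCP)
            napi_gro_receive(napi, skb);
    else
            netif_receive_skb(skb);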
Change-Id: Ie601000de21afeacfca93f23117aeb0f7cefda98
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
This was originally added for debug and is no longer needed.
Change-Id: I9780e86ff0db31eaa13f2470c19f1424db3311b8
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
This is needed so that we can pass the virtual netdevice NAPI struct
to napi_gro_receive rather than relying on the current CPU NAPI
context which can cause issues due to CPU hotplug.
Change-Id: I41977c3a3a51212aa2fe092427b0ca924045b477
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
It is not guaranteed that MAP commands will be in their own frame. Some
commands may be embedded in a large aggregated frame of datagrams. This
patch forces MAP command processing to occur after the deaggregation
routine. This has the side-effect of incurring a malloc/memcpy latency
penalty for each MAP command. This also introduces a side-effect where
every packet after de-aggregation will need to be inspected for the
cmd bit.
Change-Id: Icc5ad1e7d622a35883f858c2c132c9679f43c79e
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Userspace applications report that SIOCETHTOOL IOCTL fails for
option ETHTOOL_STXCSUM even though rmnet_data devices with
prefix were created. This is because commit I183ba7
("net: rmnet_data: adding support to GRO") replaced the dev
features with NETIF_F_GRO only.
Fix this by specifying the expected set of dev features.
CRs-fixed: 860895
Change-Id: Ic0935718a3a3f7bab5ea70d81c7dff99ebf0a7fc
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
skb's passed to the network stack through napi_gro_receive can
be freed before they are used in the gro tracepoint to print the
ingress device.
Fix this by removing the skb as an argument. We could copy
skb->dev to a string before passing it to napi_gro_receive and then
use it as an argument for the tracepoint, but that would mean
unnecessary code in the hotpath for debugging purposes.
Change-Id: I82055dbc9b84f405e8f63f7b2eeb7c80e5ae0c3a
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
The initial implementation of MAP command would grab the global
tx lock on the physical transport interface directly. It seems grabbing
the lock on its own is not enough to serialize access to the underlying
device's ndo_start_xmit() function. In addition to grabbing the lock,
the transmit queue must be placed in a frozen state. The fix is to use the
standard netif_tx_lock/unlock APIs as provided by the core framework
rather than manually grabbing the spinlocks.
Additionally, the null check has been optimized with an unlikely().
A new LOGD() message was added for further debugging of MAP ACK
sending.
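A sketch of the locking change; the transmit call itself is
simplified here:

    netif_tx_lock(dev);
    if (likely(dev->netdev_ops && dev->netdev_ops->ndo_start_xmit))
            dev->netdev_ops->ndo_start_xmit(skb, dev);
    netif_tx_unlock(dev);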
CRs-Fixed: 841068
Change-Id: Ia76396f3075f25cdf6bf7278bba0ec78433b2934
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Remove all references of watchdog_timeo since rmnet_data does not
support recovering from tx timeouts.
Change-Id: I7d92c430e502ac53f8dee2315b64b58cd2ab7745
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Specifications state that the MAP packet length in the MAP header does
not account for the number of bytes of the packet trailer from DL
checksum offload. The current implementation does not take this into
account when MAP aggregation is enabled.
Fix this by accounting for the extra bytes of the DL checksum trailer,
if DL checksum offload is enabled, when computing the packet length
during MAP deaggregation.
Change-Id: I9c10bb9726413b1f14f94210dbe194c2c15349f5
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Add support for GRO (Generic Receive Offload) on rmnet_data
interfaces.
The rmnet_data interface invokes the GRO API napi_gro_receive()
using the napi_struct which belongs to the current NAPI context,
as given by the network stack.
CRs-fixed: 784626
Change-Id: I183ba73f176cb0ee0ccb94cc2d77209bb26f7506
Signed-off-by: Sivan Reinstein <sivanr@codeaurora.org>
Modify all alloc's to use the style as mentioned in the kernel
coding guidelines.
Change-Id: Id471e6640d69c4cad2b6f44049e291a06e0306b3
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Using skb_clone() to deaggregate frames results in multiple side effects.
The first side effect is that skb->truesize will reflect the total
aggregation buffer size for each 1500-byte frame charged to a socket.
Hence, the socket thinks it is using more memory than it actually is,
resulting in unnecessary packet drops.
The second side effect is that GRO will not work correctly since the
cloned SKBs share the same shinfo struct. GRO uses the gso_segs field
in shinfo and will get confused when packets have bad values there.
The new algorithm removes skb_clone altogether. Instead a new empty
skb is allocated with alloc_skb() and a manual memcpy is performed to
move the data. This guarantees that there are no shared segments
between SKBs.
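A sketch of the copy-based deaggregation per packet; the headroom
constant and lengths are illustrative, not the driver's names:

    struct sk_buff *skbn;

    /* RMNET_MAP_DEAGGR_HEADROOM is a hypothetical headroom constant. */
    skbn = alloc_skb(packet_len + RMNET_MAP_DEAGGR_HEADROOM, GFP_ATOMIC);
    if (!skbn)
            return NULL;
    skb_reserve(skbn, RMNET_MAP_DEAGGR_HEADROOM);
    memcpy(skb_put(skbn, packet_len), skb->data, packet_len);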
Change-Id: I1f1b69a22ed4726c31b8d3295622a604af95d008
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Update RmNet Data to pass this new parameter in during VND
creation. Going with NET_NAME_ENUM as these are reusable
names.
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
MAPv3 UL checksum header requires that CHECKSUM_INSERT_OFFSET
be relative to CHECKSUM_START_OFFSET.
CRs-fixed: 806129
Change-Id: I1f8363e90588dfbd3ac4f9f35defd6259406d8b5
Signed-off-by: Sivan Reinstein <sivanr@codeaurora.org>
Previously, one new work-queue item was created and scheduled with
schedule_work() for each VND getting unregistered. Since we know the
exact set of VNDs which need to be cleared ahead of time, the VNDs
are added to a list and freed at the same time with a single work-queue
task. This saves us from having to malloc/schedule/free for each
VND and provides a speed up on some low tier hardware.
CRs-Fixed: 738039
Change-Id: I02d4de1308a2aed9d493f6fd58cf0984265facba
Acked-by: Nagarjuna Chaganti <nchagant@qti.qualcomm.com>
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Whenever handling a force-unassociate, make sure the device is closed
before freeing the logical endpoint configuration. Whenever the endpoint
config is cleared, the egress device is set to null. This can cause a
null pointer dereference if the endpoint config is cleared at the same
time a packet is being transmitted.
[ 479.906025] [RMNET:HI] rmnet_config_notify_cb(): Kernel is trying to un
register rmnet_ipa0
[ 479.913428] Unable to handle kernel NULL pointer dereference at virtual
address 000002c0
[ 480.068123] [<ffffffc000c73608>] rmnet_egress_handler+0x30/0x2bc
[ 480.074109] [<ffffffc000c728e8>] rmnet_vnd_start_xmit+0x108/0x13c
[ 480.080192] [<ffffffc000ae42ec>] dev_hard_start_xmit+0x260/0x484
[ 480.086178] [<ffffffc000afd390>] sch_direct_xmit+0x68/0x198
[ 480.091732] [<ffffffc000afd5b0>] __qdisc_run+0xf0/0x140
[ 480.096938] [<ffffffc000ae4794>] dev_queue_xmit+0x284/0x400
Change-Id: Ib87b123dc565b087374dfde6d3c40ddccf2a257d
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Flow control the entire RmNet Data virtual network device whenever
we receive a MAP flow control command with flow ID 0xFFFFFFFF. Since
it is guaranteed that we will never mix 0xFFFFFFFF with other flow IDs
(e.g. disable 0xFFFFFFFF, enable 0x00000001), TC based flow control
is not required. Instead, the netif stop/wake queue APIs are used in
the immediate context.
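A sketch of the immediate-context handling; variable names are
illustrative:

    if (flow_id == 0xFFFFFFFF) {
            if (enable)
                    netif_wake_queue(vnd_dev);
            else
                    netif_stop_queue(vnd_dev);
            return;
    }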
CRs-Fixed: 767337
Change-Id: I8eff0988fa38726284789b70e045cc4b1dbb5d4e
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Do not aggregate frames if they are spaced out by more than 10ms. Since
the schedule_delayed_work() API only takes time in jiffies, ping packets
were getting substantially delayed. Instead, just send them. This
parameter is tunable from the module parameters location.
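A sketch of the bypass, with field names assumed:

    ktime_t now = ktime_get();

    /* If the previous packet left more than ~10 ms ago, skip the
     * aggregation buffer entirely and transmit right away. */
    if (ktime_to_ms(ktime_sub(now, cfg->agg_last)) > 10) {
            cfg->agg_last = now;
            dev_queue_xmit(skb);
            return;
    }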
CRs-Fixed: 772705
Change-Id: I6ac337c8d61b1290f939b86081070c14c2c757b1
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>
Accumulation logic now respects max packet count as well as buffer size.
Additionally, packets will get shipped if they have been sitting around
for more than 1ms. This parameter is tunable from the module parameters
location.
CRs-Fixed: 772705
Change-Id: I1b5cb597ef6adfe19df590582f9a6cae091c5977
Signed-off-by: Harout Hedeshian <harouth@codeaurora.org>