We have two cases.
Either the socket is in TCP_ESTABLISHED state and connect() filled
in the inet socket cork flow, or we looked up the route here and
used an on-stack flow.
Track which one it was, and use it to obtain src/dst addrs.
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables ethtool to set the loopback mode on a given interface.
By configuring the interface in loopback mode in conjunction with a policy
route / rule, a userland application can stress the egress / ingress path
exposing the flows of the change in progress and potentially help developer(s)
understand the impact of those changes without even sending a packet out
on the network.
Following set of commands illustrates one such example -
a) ip -4 addr add 192.168.1.1/24 dev eth1
b) ip -4 rule add from all iif eth1 lookup 250
c) ip -4 route add local 0/0 dev lo proto kernel scope host table 250
d) arp -Ds 192.168.1.100 eth1
e) arp -Ds 192.168.1.200 eth1
f) sysctl -w net.ipv4.ip_nonlocal_bind=1
g) sysctl -w net.ipv4.conf.all.accept_local=1
# Assuming that the machine has 8 cores
h) taskset 000f netserver -L 192.168.1.200
i) taskset 00f0 netperf -t TCP_CRR -L 192.168.1.100 -H 192.168.1.200 -l 30
Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
TCP Cubic keeps a metric that estimates the amount of delayed
acknowledgements to use in adjusting the window. If an abnormally
large number of packets are acknowledged at once, then the update
could wrap and reach zero. This kind of ACK could only
happen when there was a large window and huge number of
ACK's were lost.
This patch limits the value of delayed ack ratio. The choice of 32
is just a conservative value since normally it should be range of
1 to 4 packets.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
I don't know why %pI6 doesn't compress, but the format specifier is
kernel-standard, so use it.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This allows us to acquire the exact route keying information from the
protocol, however that might be managed.
It handles all of the possibilities, from the simplest case of storing
the key in inet->cork.fl to the more complex setup SCTP has where
individual transports determine the flow.
Signed-off-by: David S. Miller <davem@davemloft.net>
Operation order is now transposed, we first create the child
socket then we try to hook up the route.
Signed-off-by: David S. Miller <davem@davemloft.net>
DESCRIPTION
This patch tries to restore the initial init and cleanup
sequences that was before namspace patch.
Netns also requires action when net devices unregister
which has never been implemented. I.e this patch also
covers when a device moves into a network namespace,
and has to be released.
IMPLEMENTATION
The number of calls to register_pernet_device have been
reduced to one for the ip_vs.ko
Schedulers still have their own calls.
This patch adds a function __ip_vs_service_cleanup()
and an enable flag for the netfilter hooks.
The nf hooks will be enabled when the first service is loaded
and never disabled again, except when a namespace exit starts.
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
[horms@verge.net.au: minor edit to changelog]
Signed-off-by: Simon Horman <horms@verge.net.au>
If the sync daemons run in a name space while it crashes
or get killed, there is no way to stop them except for a reboot.
When all patches are there, ip_vs_core will handle register_pernet_(),
i.e. ip_vs_sync_init() and ip_vs_sync_cleanup() will be removed.
Kernel threads should not increment the use count of a socket.
By calling sk_change_net() after creating a socket this is avoided.
sock_release cant be used intead sk_release_kernel() should be used.
Thanks Eric W Biederman for your advices.
Signed-off-by: Hans Schillstrom <hans@schillstrom.com>
[horms@verge.net.au: minor edit to changelog]
Signed-off-by: Simon Horman <horms@verge.net.au>
This is just like inet_csk_route_req() except that it operates after
we've created the new child socket.
In this way we can use the new socket's cork flow for proper route
key storage.
This will be used by DCCP and TCP child socket creation handling.
Signed-off-by: David S. Miller <davem@davemloft.net>
Several future simplifications are possible now because of this.
For example, the sctp_addr unions can simply refer directly to
the flowi information.
Signed-off-by: David S. Miller <davem@davemloft.net>
All invokers of ip_queue_xmit() must make certain that the
socket is locked. All of SCTP, TCP, DCCP, and L2TP now make
sure this is the case.
Therefore we can use the cork flow during output route lookup in
ip_queue_xmit() when the socket route check fails.
Signed-off-by: David S. Miller <davem@davemloft.net>
These two functions must be invoked only when the socket is locked
(because socket identity modifications are made non-atomically).
Therefore we can use the cork flow for output route lookups.
Signed-off-by: David S. Miller <davem@davemloft.net>
This is to make sure that an l2tp socket's inet cork flow is
fully filled in, when it's encapsulated in UDP.
Signed-off-by: David S. Miller <davem@davemloft.net>
l2tp_xmit_skb() must take the socket lock. It makes use of ip_queue_xmit()
which expects to execute in a socket atomic context.
Since we execute this function in software interrupts, we cannot use the
usual lock_sock()/release_sock() sequence, instead we have to use
bh_lock_sock() and see if a user has the socket locked, and if so drop
the packet.
Signed-off-by: David S. Miller <davem@davemloft.net>
Both l2tp_ip_connect() and l2tp_ip_sendmsg() must take the socket
lock. They both modify socket state non-atomically, and in particular
l2tp_ip_sendmsg() increments socket private counters without using
atomic operations.
Signed-off-by: David S. Miller <davem@davemloft.net>
Since this is invoked from inet_stream_connect() the socket is locked
and therefore this usage is safe.
Signed-off-by: David S. Miller <davem@davemloft.net>
Since this is invoked from inet_stream_connect() the socket is locked
and therefore this usage is safe.
Signed-off-by: David S. Miller <davem@davemloft.net>
After that all the upstream kernel drivers now use phys_id,
and the old ethtool_ops interface (phys_id) can be removed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In function is_bidirectional_neigh the code that find out the one hop
neighbor is duplicated.
Signed-off-by: Daniele Furlan <daniele.furlan@gmail.com>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
It is slightly irritating that comments after a long line span over
multiple lines without any code. It is easier to put them before the
actual code and reduce the number of lines which the eye has to read.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
To be coherent, all the functions/variables/constants have been renamed
to the TranslationTable style
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
The hard_if_event is called by the notifier with rtnl_lock and tries to
remove sysfs entries when a NETDEV_UNREGISTER event is received. This
will automatically take the s_active lock.
The s_active lock is also used when a new interface is added to a meshif
through sysfs. In that situation we cannot wait for the rntl_lock before
creating the actual batman-adv interface to prevent a deadlock. It is
still possible to try to get the rtnl_lock and immediately abort the
current operation when the trylock call failed.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
hardif_list_lock is unneccessary because we already ensure that no
multiple admin operations can take place through rtnl_lock.
hardif_list_lock only adds additional overhead and complexity.
Critical functions now check whether they are called with rtnl_lock
using ASSERT_RTNL.
It indirectly fixes the problem that orig_hash_del_if() expects that
only one interface is deleted from hardif_list at a time, but
hardif_remove_interfaces() removes all at once and then calls
orig_hash_del_if().
Reported-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
The bridge loop detection for batman-adv allows the bat0 interface
to be bridged into an ethernet segment which other batman-adv nodes
are connected to. In order to also allow multiple VLANs on top of
the bat0 interface to be bridged into the ethernet segment this
patch extends the aforementioned bridge loop detection.
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
ip_setup_cork() explicitly initializes every member of
inet_cork except flags, addr, and opt. So we can simply
set those three members to zero instead of using a
memset() via an empty struct assignment.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
When we fast path datagram sends to avoid locking by putting
the inet_cork on the stack we use up lots of space that isn't
necessary.
This is because inet_cork contains a "struct flowi" which isn't
used in these code paths.
Split inet_cork to two parts, "inet_cork" and "inet_cork_full".
Only the latter of which has the "struct flowi" and is what is
stored in inet_sock.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
A length of zero (after subtracting two for the type and len fields) for
the DCCPO_{CHANGE,CONFIRM}_{L,R} options will cause an underflow due to
the subtraction. The subsequent code may read past the end of the
options value buffer when parsing. I'm unsure of what the consequences
of this might be, but it's probably not good.
Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: stable@kernel.org
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
This adds basic support for the new WoWLAN
configuration in mac80211. The behaviour is
completely offloaded to the driver though,
with two new callbacks (suspend/resume).
Options for the driver include a complete
reconfiguration after wakeup, and exposing
all the triggers it wants to support.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This is based on (but now quite far from) the
original work from Luis and Eliad. Add support
for configuring WoWLAN triggers, and getting
the configuration out again. Changes from the
original patchset are too numerous to list,
but one important change needs highlighting:
the suspend() callback is passed NULL for the
trigger configuration if userspace has not
configured WoWLAN at all.
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The only user of WIPHY_FLAG_SUPPORTS_SEPARATE_DEFAULT_KEYS was removed
and consequently, this flag can be removed, too. In addition, a single
capability flag was not enough to indicate this capability clearly since
the device behavior may be different based on which operating mode is
being used.
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Commit dbd2fd656f added a mechanism for
user space to indicate whether a default key is being configured for
only unicast or only multicast frames instead of all frames. This
commit added a driver capability flag for indicating whether separate
default keys are supported and validation of the set_key command based
on that capability.
However, this single capability flag is not enough to cover possible
difference based on mode (AP/IBSS/STA) and the way this change was
introduced resulted in a regression with drivers that do not indicate
the new capability (i.e.., more or less any non-mac80211 driver using
cfg80211) when using a recent wpa_supplicant snapshot.
Fix the regression by removing the new check which is not strictly
speaking needed. The new separate default key functionality is needed
only for RSN IBSS which has a separate capability indication.
Cc: stable@kernel.org
Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Whenever the driver's queue depth reaches the max, the queues are
stopped by the driver till the driver can accept the frames.
At the mean time dynamic_ps_timer can be expired due to not
receiving packet from upper layer which could restart the transmission
at the end of ps work. Due to the mismatch with driver state,
mac80211 is unneccesarity buffering all the frames till the driver
wakes up the queue.
Check whether there is no transmit or the tx queues were stopped by some
reasons. If any of the queue was stopped, the postpond ps timer and
do not restart netif_tx.
Signed-off-by: Rajkumar Manoharan <rmanoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The queue mapping of internal mgmt packets is set to VO. Set the TID
value to match the queue mapping. Otherwise drivers that only look at
the TID might get confused.
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
netif_tx_start_all_queues is used to allow the upper layer
to transmit frames but it does not restart transmission.
To restart the trasmission use netif_tx_wake_all_queues.
Not doing so, sometimes stalls the transmission and the
application has to be restarted to proceed further.
This issue was originally found while sending udp traffic
in higer bandwidth in open environment without bgscan.
Cc: stable@kernel.org
Signed-off-by: Rajkumar Manoharan <rmanoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
This patch adds a multiple message send syscall and is the send
version of the existing recvmmsg syscall. This is heavily
based on the patch by Arnaldo that added recvmmsg.
I wrote a microbenchmark to test the performance gains of using
this new syscall:
http://ozlabs.org/~anton/junkcode/sendmmsg_test.c
The test was run on a ppc64 box with a 10 Gbit network card. The
benchmark can send both UDP and RAW ethernet packets.
64B UDP
batch pkts/sec
1 804570
2 872800 (+ 8 %)
4 916556 (+14 %)
8 939712 (+17 %)
16 952688 (+18 %)
32 956448 (+19 %)
64 964800 (+20 %)
64B raw socket
batch pkts/sec
1 1201449
2 1350028 (+12 %)
4 1461416 (+22 %)
8 1513080 (+26 %)
16 1541216 (+28 %)
32 1553440 (+29 %)
64 1557888 (+30 %)
We see a 20% improvement in throughput on UDP send and 30%
on raw socket send.
[ Add sparc syscall entries. -DaveM ]
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Force dev_alloc_name() to be called from register_netdevice() by
dev_get_valid_name(). That allows to remove multiple explicit
dev_alloc_name() calls.
The possibility to call dev_alloc_name in advance remains.
This also fixes veth creation regresion caused by
84c49d8c3e
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A new list was added to replace the socket based one. This new list
doesn't depent on sock and then fits better inside l2cap_core.c code.
It also rename l2cap_chan_alloc() to l2cap_chan_create() and
l2cap_chan_free() to l2cap_chan_destroy)
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
When the user doesn't specify a psm we have the choose one for the
channel. Now we do this inside l2cap_add_psm().
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
The intention is to get rid of the l2cap_sk_list usage inside
l2cap_core.c. l2cap_sk_list will soon be replaced by a list that does not
depend on socket usage.
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>