Add support for IPv6 to VRF device driver. Implemenation parallels what
has been done for IPv4.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add operations to retrieve cached IPv6 dst entry from l3mdev device
and lookup IPv6 source address.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As originally written rt6_uncached_list_flush_dev makes no sense when
called with dev == NULL as it attempts to flush all uncached routes
regardless of network namespace when dev == NULL. Which is simply
incorrect behavior.
Furthermore at the point rt6_ifdown is called with dev == NULL no more
network devices exist in the network namespace so even if the code in
rt6_uncached_list_flush_dev were to attempt something sensible it
would be meaningless.
Therefore remove support in rt6_uncached_list_flush_dev for handling
network devices where dev == NULL, and only call rt6_uncached_list_flush_dev
when rt6_ifdown is called with a network device.
Fixes: 8d0b94afdc ("ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister")
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Reviewed-by: Martin KaFai Lau <kafai@fb.com>
Tested-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
VLANs 0 and 4095 are reserved and shouldn't be used, add checks to
switchdev similar to the bridge. Also make sure ids above 4095 cannot
be passed either.
Fixes: 47f8328bb1 ("switchdev: add new switchdev bridge setlink")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Elad Raz <eladr@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sathya Perla says:
====================
be2net: patch set
Patch 1 fixes a FW image compatibility check in the driver that
prevents certain FW images from being flashed on BE3 (not BE3-R)
adapters.
Patch 2 fixes a spin_lock not being released in a failure case in
be_cmd_notify_wait().
Patch 3 includes a workaround to pad packets that are only 32b long or less
to be applicabe to BE3 too. This workaround was currently applied only to
Skyhawk and Lancer chips. Such packets are causing BE3's TX path to stall
on a SR-IOV config.
Patch 4 fixes the be_cmd_get_profile_config() routine to set the pf_num
field in the cmd request. The FW requires this field to be set for it to
return the specific function's descriptors. If not set, the FW returns
the descriptors of all the functions on the device. If the first descriptor
is not what is being queried for, the driver will read wrong data.
This patch fixes this issue by using the GET_CNTL_ATTRIB cmd to query the
real pci_func_num of a function and then uses it in the GET_PROFILE_CONFIG
cmd.
Patch 5 completes an earlier fix that removed the vlan promisc capability
for VFs. The earlier fix did not update the removal of this capability from
the profile descriptor of the VF. This causes the VF driver to request this
capability when it tries to create it's interface at probe time. This could
potentailly cause the VF probe to fail if the FW enforces strict checking of
the flags based on what was provisoned by the PF. This strict checking is
not being done by FW currently but will be fixed in a future version. This
patch fixes this issue by updating the VF's profile descriptor so that they
match the interface capability flags provisioned by the PF.
Pls consider adding these patches to the net tree. Thanks!
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit 435452aa88 ("Prevent VFs from enabling VLAN promiscuous mode")
fixed the PF driver to not include the VLAN promisc capability while
provisioning the interface for a VF. But the fix did not remove this
capability from the profile descriptor of the VF. This causes the VF
driver to request this capability when it tries to create it's interface
at probe time. This could potentailly cause the VF probe to fail if the
FW enforces strict checking of the flags based on what was provisoned
by the PF. This strict checking is not being done by FW currently but
will be fixed in a future version. This patch fixes this issue by updating
the VF's profile descriptor so that they match the interface capability
flags provisioned by the PF.
Fixes: 435452aa88 ("Prevent VFs from enabling VLAN promiscuous mode")
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The FW requires the pf_num field in the cmd hdr to be set for it to return
the specific function's descriptors in the GET_PROFILE_CONFIG cmd. If not
set, the FW returns the descriptors of all the functions on the device.
If the first descriptor is not what is being queried for, the driver will
read wrong data. This patch fixes this issue by using the GET_CNTL_ATTRIB
cmd to query the real pci_func_num of a function and then uses it in the
GET_PROFILE_CONFIG cmd.
Signed-off-by: Somnath Kotur <somnath.kotur@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
On BE3 chips in SRIOV configs, the TX path stalls when a packet less
than 32B is received from the host. A workaround to pad such packets
already exists for the Skyhawk and Lancer chips. Use the same workaround
for BE3 chips too.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The mcc/mbox lock is not being released when be_cmd_copy() returns
an error.
Signed-off-by: Suresh Reddy <suresh.reddy@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In the BE3 FW image, unlike Skyhawk's, the "asic_type_rev" field doesn't
track the asic_rev of chip it is compatible with. When asic_type_rev
is 0 the image is compatible only with pre-BE3-R chips (asic_rev < 0x10).
Fix the current compatibility check to take care of this.
We hit this issue when we try to flash old BE3 images (used prior to the
release of BE3-R) on pre-BE3-R adapters.
Fixes: a6e6ff6eee ("be2net: simplify UFI compatibility checking")
Signed-off-by: Kalesh AP <kalesh.purayil@avagotech.com>
Signed-off-by: Sathya Perla <sathya.perla@avagotech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
commit afae5ad78b
"net/fsl_pq_mdio: streamline probing of MDIO nodes"
added support for different types of MDIO devices:
1) Gianfar MDIO nodes that only map the MII registers
2) Gianfar MDIO nodes that map the full MDIO register set
3) eTSEC2 MDIO nodes (which map the full MDIO register set)
4) QE MDIO nodes (which map only the MII registers)
However, the implementation for types 1 and 4 would mistakenly assume
a mapping of the full MDIO register set, thereby computing the address
for the TBI register starting from the containing structure.
The TBI register would therefore be accessed at a wrong (much bigger)
address, not giving the expected result at all.
This patch restores the correct behavior we had prior to the above one.
The consequences of this bug are apparent when trying to access a PHY
with the same address as the value contained in the initial value of
the TBI register (normally 0); in that case you'll get answers from the
internal TBI device (even though MDIO/MDC pins are actually *also*
toggling on the physical bus!).
Beware that you also need to add a fake tbi node to your device tree
with an unused address.
Notice how this fix is related to commit
220669495b
"powerpc: Add TBI PHY node to first MDIO bus"
which fixed the behavior in kernel 3.3, which was later broken by the
above commit on kernel 3.7.
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Cc: Timur Tabi <timur@tabi.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
When configuring the MDIO subsystem it is also necessary to configure
the TBI register. Make sure the TBI is contained within the mapped
register range in order to:
a) make sure the address is computed correctly
b) make users aware that we're actually accessing that register
In case of error, print a message but continue anyway.
Signed-off-by: Gerlando Falauto <gerlando.falauto@keymile.com>
Cc: Timur Tabi <timur@tabi.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot says:
====================
net: dsa: mv88e6xxx: fix hardware bridging
DSA and its drivers currently hook the NETDEV_CHANGEUPPER net_device event in
order to configure the VLAN map of every port.
This VLAN map is a feature of these switch chips to hardcode and restrict which
output ports a given input port can egress frames to.
A Linux bridge is a simple untagged VLAN propagated by the bridge code itself.
With a proper 802.1Q support, a driver does not need this hook anymore, and
will simply program the related VLAN object.
This patchset improves the hardware bridging code in the mv88e6xxx driver with
a strict 802.1Q mode.
Ideally, the equivalent must be done for Broadcom Starfighter 2 and Rocker,
before completely getting rid of this hook.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Playing with the VLAN map of every port to implement "hardware bridging"
in the 88E6352 driver was a hack until full 802.1Q was supported.
Indeed with 802.1Q port mode "Disabled" or "Fallback", this feature is
used to restrict which output ports an input port can egress frames to.
A Linux bridge is an untagged VLAN. With full 802.1Q support, we don't
need this hack anymore and can use the "Secure" strict 802.1Q port mode.
With this mode, the port-based VLAN map still needs to be configured,
but all the logic is VTU-centric. This means that the switch only cares
about rules described in its hardware VLAN table, which is exactly what
Linux bridge expects and what we want.
Note also that the hardware bridging was broken with the previous
flexible "Fallback" 802.1Q port mode. Here's an example:
Port0 and Port1 belong to the same bridge. If Port0 sends crafted tagged
frames with VID 200 to Port1, Port1 receives it. Even if Port1 is in
hardware VLAN 200, but not Port0, Port1 will still receive it, because
Fallback mode doesn't care about invalid VID or non-member source port.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A DSA driver may not provide the port_join_bridge and port_leave_bridge
functions, so don't warn in such case.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since we configure a switch chip through a Linux bridge, and a bridge is
implemented as a VLAN, there is no need for per-port FID anymore.
This patch gets rid of this and simplifies the driver code since we can
now directly map all 4095 FIDs available to all VLANs.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
With 88E6352 and similar switch chips, each port has a map to restrict
which output port this input port can egress frames to.
The current driver code implements hardware bridging using this feature,
and assigns to a bridge group the FID of its first member.
Now that 802.1Q is fully implemented in this driver, a Linux bridge
which is a simple untagged VLAN, already gets its own FID.
This patch gets rid of the per-bridge FID and explicits the usage of the
port based VLAN map feature.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Consider the following "duelling syn" sequence between two peers A and B:
A B
SYN1 -->
<-- SYN2
SYN2ACK -->
Note that the SYN/ACK has already been sent out by TCP before
rds_tcp_accept_one() gets invoked as part of callbacks.
If the inet_addr(A) is numerically less than inet_addr(B),
the arbitration scheme in rds_tcp_accept_one() will prefer the
TCP connection triggered by SYN1, and will send a CLOSE for the
SYN2 (just after the SYN2ACK was sent).
Since B also follows the same arbitration scheme, it will send the SYN-ACK
for SYN1 that will set up a healthy ESTABLISHED connection on both sides.
B will also get a CLOSE for SYN2, which should result in the cleanup
of the TCP state machine for SYN2, but it should not trigger any
stale RDS-TCP callbacks (such as ->writespace, ->state_change etc),
that would disrupt the progress of the SYN2 based RDS-TCP connection.
Thus the arbitration scheme in rds_tcp_accept_one() should restore
rds_tcp callbacks for the winner before setting them up for the
new accept socket, and also make sure that conn->c_outgoing
is set to 0 so that we do not trigger any reconnect attempts on the
passive side of the tcp socket in the future, in conformance with
commit c82ac7e69e ("net/rds: RDS-TCP: only initiate reconnect attempt
on outgoing TCP socket.")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The IP address passed to rds_bind() should be vetted by the
transport's ->laddr_check() for a previously bound transport.
This needs to be done to avoid cases where, for example,
the application has asked for an IB transport,
but the IP address passed to bind is only usable on
ethernet interfaces.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The only instance of a qlcnic_mbx_ops structure is never modified. Thus
the declaration of the structure and all references to the structure type
can be made const.
In the definition of the qlcnic_mailbox structure, the ops field is no
longer lined up with the other fields. This was left as is, to avoid a lot
of trivial changes on the other lines.
Done with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Sony Chacko <sony.chacko@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Usage of -prev seems buggy. While packet was out our hook cannot be
removed but we have no way to know if the previous one is still valid.
So better not use ->prev at all. Since NF_REPEAT just asks to invoke
same hook function again, just do so, and continue with nf_interate
if we get an ACCEPT verdict.
A side effect of this change is that if nf_reinject(NF_REPEAT) causes
another REPEAT we will now drop the skb instead of a kernel loop.
However, NF_REPEAT loops would be a bug so this should not happen anyway.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Commit 30686bf7f5 ("mac80211: convert HW flags to unsigned long
bitmap") accidentally removed the newline delimiter from the hwflags
debugfs file. Fix this by adding back the newline between the HW flags.
Cc: stable@vger.kernel.org [4.2]
Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com>
[fix commit log]
Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
The commit "drm/vmwgfx: Fix up user_dmabuf refcounting", while fixing a
kernel crash introduced a NULL pointer dereference on older hardware.
Fix this.
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
Commit 7a5692e6e5 ("arch/powerpc: provide zero_bytemask() for
big-endian") added a call to __fls() in our word-at-a-time.h. That was
fine for the kernel build but missed the fact that we also use
word-at-a-time.h in a userspace test.
Pulling in the kernel version of __fls() gets messy, so just define our
own, it's unlikely to change often.
Fixes: 7a5692e6e5 ("arch/powerpc: provide zero_bytemask() for big-endian")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Currently it's possible for someone to send a vlan range to the kernel
with the pvid flag set which will result in the pvid bouncing from a
vlan to vlan and isn't correct, it also introduces problems for hardware
where it doesn't make sense having more than 1 pvid. iproute2 already
enforces this, so let's enforce it on kernel-side as well.
Reported-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix a smatch warning:
drivers/atm/iphase.c:1178 rx_pkt() warn: curly braces intended?
The code is correct, the indention is misleading. In case the allocation
of skb fails, we want to skip to the end.
Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Smatch complains about returning hard coded error codes, silence this
warning.
drivers/atm/iphase.c:115 ia_enque_rtn_q() warn: returning -1 instead of -ENOMEM is sloppy
Signed-off-by: Tillmann Heidsieck <theidsieck@leenox.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch makes ip6_route_info_create return err pointer instead of
returning the rt pointer by reference as suggested by Dave
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fix the building error reported by Jiri Pirko <jiri@resnulli.us>
drivers/net/ethernet/hisilicon/hns/hnae.h:465:2: error: unknown type
name 'phy_interface_t'
phy_interface_t phy_if;
^
the full build log is on https://lists.01.org/pipermail/kbuild-all.
Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: yankejian <yankejian@huawei.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
timewait or request sockets are small and do not contain sk->sk_tsflags
Without this fix, we might read garbage, and crash later in
__skb_complete_tx_timestamp()
-> sock_queue_err_skb()
(These pseudo sockets do not have an error queue either)
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric W. Biederman says:
====================
net: Pass net into defragmentation
This is the next installment of my work to pass struct net through the
output path so the code does not need to guess how to figure out which
network namespace it is in, and ultimately routes can have output
devices in another network namespace.
In netfilter and af_packet we defragment packets in the output path,
and there is the usual amount of confusion about how to compute which
net we are processing the packets in. This patchset clears that
confusion up by explicitly passing in struct net in ip_defrag,
ip_check_defrag, and nf_ct_frag6_gather.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The function nf_ct_frag6_gather is called on both the input and the
output paths of the networking stack. In particular ipv6_defrag which
calls nf_ct_frag6_gather is called from both the the PRE_ROUTING chain
on input and the LOCAL_OUT chain on output.
The addition of a net parameter makes it explicit which network
namespace the packets are being reassembled in, and removes the need
for nf_ct_frag6_gather to guess.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The function ip_defrag is called on both the input and the output
paths of the networking stack. In particular conntrack when it is
tracking outbound packets from the local machine calls ip_defrag.
So add a struct net parameter and stop making ip_defrag guess which
network namespace it needs to defragment packets in.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_call_ra_chain is called early in the forwarding chain from
ip_forward and ip_mr_input, which makes skb->dev the correct
expression to get the input network device and dev_net(skb->dev) a
correct expression for the network namespace the packet is being
processed in.
Compute the network namespace and store it in a variable to make the
code clearer.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Recent TCP listener patches exposed a prior af_packet bug :
match_fanout_group() blindly assumes it is always safe
to cast sk to a packet socket to compare fanout with af_packet_priv
But SYNACK packets can be sent while attached to request_sock, which
are smaller than a "struct sock".
We can read non existent memory and crash.
Fixes: c0de08d042 ("af_packet: don't emit packet on orig fanout group")
Fixes: ca6fb06518 ("tcp: attach SYNACK messages to request sockets instead of listener")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
RTA_ALIGNTO is currently define as 4. It has to be 4U to prevent warning
for RTA_ALIGN and RTA_DATA expansions when -Wconversion gcc option is
enabled.
This follows NLMSG_ALIGNTO definition in <include/uapi/linux/netlink.h>.
Signed-off-by: Ronen Arad <ronen.arad@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
iwlwifi
* some debugfs improvements
* fix signedness in beacon statistics
* deinline some functions to reduce size when device tracing is enabled
* filter beacons out in AP mode when no stations are associated
* deprecate firmwares version -12
* fix a runtime PM vs. legacy suspend race
* one-liner fix for a ToF bug
* clean-ups in the rx code
* small debugging improvement
* fix WoWLAN with new firmware versions
* more clean-ups towards multiple RX queues;
* some rate scaling fixes and improvements;
* some time-of-flight fixes;
* other generic improvements and clean-ups;
brcmfmac
* rework code dealing with multiple interfaces
* allow logging firmware console using debug level
* support for BCM4350, BCM4365, and BCM4366 PCIE devices
* fixed for legacy P2P and P2P device handling
* correct set and get tx-power
ath9k
* add support for Outside Context of a BSS (OCB) mode
mwifiex
* add USB multichannel feature
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
iQEcBAABAgAGBQJWF9ciAAoJEG4XJFUm622bVaAH/3Fi4CaKrDF6L8lxSRWUZzft
Ie2X0FC+d5knpS7dOd7iI02MuEuKCg3f6dmtDrCDFBqFohvfO5NkG4XU81jdIiWM
Xkyxlgcy/1TuILNjQfNh/2nhjpvvHDCyptl+jimeT2VR2ITD/Vj3IOAMA5l4khyx
OeWmgW7dT9xLwYYy20ql5QLGkbxwJlHawUw/d+3yiS+AHO+6dVGJL2OtpyrlPP/F
0KpSj0lZY9UNRL+i6FbONDCBYeG+q/lA5G5nGXBF6zEeZ6BcuWNRcBBGr2n/6uMy
gQMAunqBIunfYkfpEKYEPF5zoyO/wCmvPLxx56iS8okGSVw4KzQ2DtQ0leFbjBw=
=1po3
-----END PGP SIGNATURE-----
Merge tag 'wireless-drivers-next-for-davem-2015-10-09' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
Major changes:
iwlwifi
* some debugfs improvements
* fix signedness in beacon statistics
* deinline some functions to reduce size when device tracing is enabled
* filter beacons out in AP mode when no stations are associated
* deprecate firmwares version -12
* fix a runtime PM vs. legacy suspend race
* one-liner fix for a ToF bug
* clean-ups in the rx code
* small debugging improvement
* fix WoWLAN with new firmware versions
* more clean-ups towards multiple RX queues;
* some rate scaling fixes and improvements;
* some time-of-flight fixes;
* other generic improvements and clean-ups;
brcmfmac
* rework code dealing with multiple interfaces
* allow logging firmware console using debug level
* support for BCM4350, BCM4365, and BCM4366 PCIE devices
* fixed for legacy P2P and P2P device handling
* correct set and get tx-power
ath9k
* add support for Outside Context of a BSS (OCB) mode
mwifiex
* add USB multichannel feature
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch allows configuring how the source address of ICMP
redirect messages is selected; by default the old behaviour is
retained, while setting icmp_redirects_use_orig_daddr force the
usage of the destination address of the packet that caused the
redirect.
The new behaviour fits closely the RFC 5798 section 8.1.1, and fix the
following scenario:
Two machines are set up with VRRP to act as routers out of a subnet,
they have IPs x.x.x.1/24 and x.x.x.2/24, with VRRP holding on to
x.x.x.254/24.
If a host in said subnet needs to get an ICMP redirect from the VRRP
router, i.e. to reach a destination behind a different gateway, the
source IP in the ICMP redirect is chosen as the primary IP on the
interface that the packet arrived at, i.e. x.x.x.1 or x.x.x.2.
The host will then ignore said redirect, due to RFC 1122 section 3.2.2.2,
and will continue to use the wrong next-op.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Some drivers need to implement both switchdev vlan ops and
vid_add/kill ndos. For that to work in bridge code, we need to try
switchdev op first when adding/deleting vlan id.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
In bnx2_init_board, missing free temp_stats_blk on error path when
some operations do failed. Just add the 'kfree' operation.
Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet says:
====================
tcp: better smp listener behavior
As promised in last patch series, we implement a better SO_REUSEPORT
strategy, based on cpu hints if given by the application.
We also moved sk_refcnt out of the cache line containing the lookup
keys, as it was considerably slowing down smp operations because
of false sharing. This was simpler than converting listen sockets
to conventional RCU (to avoid sk_refcnt dirtying)
Could process 6.0 Mpps SYN instead of 4.2 Mpps on my test server.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Reducing tcp_timewait_sock from 280 bytes to 272 bytes
allows SLAB to pack 15 objects per page instead of 14 (on x86)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One 32bit hole is following skc_refcnt, use it.
skc_incoming_cpu can also be an union for request_sock rcv_wnd.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
sk->sk_refcnt is dirtied for every TCP/UDP incoming packet.
This is a performance issue if multiple cpus hit a common socket,
or multiple sockets are chained due to SO_REUSEPORT.
By moving sk_refcnt 8 bytes further, first 128 bytes of sockets
are mostly read. As they contain the lookup keys, this has
a considerable performance impact, as cpus can cache them.
These 8 bytes are not wasted, we use them as a place holder
for various fields, depending on the socket type.
Tested:
SYN flood hitting a 16 RX queues NIC.
TCP listener using 16 sockets and SO_REUSEPORT
and SO_INCOMING_CPU for proper siloing.
Could process 6.0 Mpps SYN instead of 4.2 Mpps
Kernel profile looked like :
11.68% [kernel] [k] sha_transform
6.51% [kernel] [k] __inet_lookup_listener
5.07% [kernel] [k] __inet_lookup_established
4.15% [kernel] [k] memcpy_erms
3.46% [kernel] [k] ipt_do_table
2.74% [kernel] [k] fib_table_lookup
2.54% [kernel] [k] tcp_make_synack
2.34% [kernel] [k] tcp_conn_request
2.05% [kernel] [k] __netif_receive_skb_core
2.03% [kernel] [k] kmem_cache_alloc
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
SO_INCOMING_CPU as added in commit 2c8c56e15d was a getsockopt() command
to fetch incoming cpu handling a particular TCP flow after accept()
This commits adds setsockopt() support and extends SO_REUSEPORT selection
logic : If a TCP listener or UDP socket has this option set, a packet is
delivered to this socket only if CPU handling the packet matches the specified
one.
This allows to build very efficient TCP servers, using one listener per
RX queue, as the associated TCP listener should only accept flows handled
in softirq by the same cpu.
This provides optimal NUMA behavior and keep cpu caches hot.
Note that __inet_lookup_listener() still has to iterate over the list of
all listeners. Following patch puts sk_refcnt in a different cache line
to let this iteration hit only shared and read mostly cache lines.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's useful to allow users to set fwmark for an individual packet,
without changing the socket state. The function this patch adds in
sock layer can be used by the protocols that need such a feature.
Signed-off-by: Edward Hyunkoo Jee <edjee@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>