Commit graph

1113 commits

Author SHA1 Message Date
Thomas Falcon
5927e91f50 bonding: Force slave speed check after link state recovery for 802.3ad
[ Upstream commit 12185dfe44360f814ac4ead9d22ad2af7511b2e9 ]

The following scenario was encountered during testing of logical
partition mobility on pseries partitions with bonded ibmvnic
adapters in LACP mode.

1. Driver receives a signal that the device has been
   swapped, and it needs to reset to initialize the new
   device.

2. Driver reports loss of carrier and begins initialization.

3. Bonding driver receives NETDEV_CHANGE notifier and checks
   the slave's current speed and duplex settings. Because these
   are unknown at the time, the bond sets its link state to
   BOND_LINK_FAIL and handles the speed update, clearing
   AD_PORT_LACP_ENABLE.

4. Driver finishes recovery and reports that the carrier is on.

5. Bond receives a new notification and checks the speed again.
   The speeds are valid but miimon has not altered the link
   state yet.  AD_PORT_LACP_ENABLE remains off.

Because the slave's link state is still BOND_LINK_FAIL,
no further port checks are made when it recovers. Though
the slave devices are operational and have valid speed
and duplex settings, the bond will not send LACPDU's. The
simplest fix I can see is to force another speed check
in bond_miimon_commit. This way the bond will update
AD_PORT_LACP_ENABLE if needed when transitioning from
BOND_LINK_FAIL to BOND_LINK_UP.

CC: Jarod Wilson <jarod@redhat.com>
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Thomas Falcon <tlfalcon@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-09-06 10:18:05 +02:00
YueHaibing
a0f592c507 bonding: Add vlan tx offload to hw_enc_features
[ Upstream commit d595b03de2cb0bdf9bcdf35ff27840cc3a37158f ]

As commit 30d8177e8ac7 ("bonding: Always enable vlan tx offload")
said, we should always enable bonding's vlan tx offload, pass the
vlan packets to the slave devices with vlan tci, let them to handle
vlan implementation.

Now if encapsulation protocols like VXLAN is used, skb->encapsulation
may be set, then the packet is passed to vlan device which based on
bonding device. However in netif_skb_features(), the check of
hw_enc_features:

	 if (skb->encapsulation)
                 features &= dev->hw_enc_features;

clears NETIF_F_HW_VLAN_CTAG_TX/NETIF_F_HW_VLAN_STAG_TX. This results
in same issue in commit 30d8177e8ac7 like this:

vlan_dev_hard_start_xmit
  -->dev_queue_xmit
    -->validate_xmit_skb
      -->netif_skb_features //NETIF_F_HW_VLAN_CTAG_TX is cleared
      -->validate_xmit_vlan
        -->__vlan_hwaccel_push_inside //skb->tci is cleared
...
 --> bond_start_xmit
   --> bond_xmit_hash //BOND_XMIT_POLICY_ENCAP34
     --> __skb_flow_dissect // nhoff point to IP header
        -->  case htons(ETH_P_8021Q)
             // skb_vlan_tag_present is false, so
             vlan = __skb_header_pointer(skb, nhoff, sizeof(_vlan),
             //vlan point to ip header wrongly

Fixes: b2a103e6d0 ("bonding: convert to ndo_fix_features")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-25 10:53:06 +02:00
Cong Wang
2eb5ddc2bf bonding: validate ip header before check IPPROTO_IGMP
[ Upstream commit 9d1bc24b52fb8c5d859f9a47084bf1179470e04c ]

bond_xmit_roundrobin() checks for IGMP packets but it parses
the IP header even before checking skb->protocol.

We should validate the IP header with pskb_may_pull() before
using iph->protocol.

Reported-and-tested-by: syzbot+e5be16aa39ad6e755391@syzkaller.appspotmail.com
Fixes: a2fd940f4c ("bonding: fix broken multicast with round-robin mode")
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04 09:34:55 +02:00
YueHaibing
ec109e6a9a bonding: Always enable vlan tx offload
[ Upstream commit 30d8177e8ac776d89d387fad547af6a0f599210e ]

We build vlan on top of bonding interface, which vlan offload
is off, bond mode is 802.3ad (LACP) and xmit_hash_policy is
BOND_XMIT_POLICY_ENCAP34.

Because vlan tx offload is off, vlan tci is cleared and skb push
the vlan header in validate_xmit_vlan() while sending from vlan
devices. Then in bond_xmit_hash, __skb_flow_dissect() fails to
get information from protocol headers encapsulated within vlan,
because 'nhoff' is points to IP header, so bond hashing is based
on layer 2 info, which fails to distribute packets across slaves.

This patch always enable bonding's vlan tx offload, pass the vlan
packets to the slave devices with vlan tci, let them to handle
vlan implementation.

Fixes: 278339a42a ("bonding: propogate vlan_features to bonding master")
Suggested-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-07-10 09:56:38 +02:00
Jarod Wilson
7a22a4ea67 bonding: fix arp_validate toggling in active-backup mode
[ Upstream commit a9b8a2b39ce65df45687cf9ef648885c2a99fe75 ]

There's currently a problem with toggling arp_validate on and off with an
active-backup bond. At the moment, you can start up a bond, like so:

modprobe bonding mode=1 arp_interval=100 arp_validate=0 arp_ip_targets=192.168.1.1
ip link set bond0 down
echo "ens4f0" > /sys/class/net/bond0/bonding/slaves
echo "ens4f1" > /sys/class/net/bond0/bonding/slaves
ip link set bond0 up
ip addr add 192.168.1.2/24 dev bond0

Pings to 192.168.1.1 work just fine. Now turn on arp_validate:

echo 1 > /sys/class/net/bond0/bonding/arp_validate

Pings to 192.168.1.1 continue to work just fine. Now when you go to turn
arp_validate off again, the link falls flat on it's face:

echo 0 > /sys/class/net/bond0/bonding/arp_validate
dmesg
...
[133191.911987] bond0: Setting arp_validate to none (0)
[133194.257793] bond0: bond_should_notify_peers: slave ens4f0
[133194.258031] bond0: link status definitely down for interface ens4f0, disabling it
[133194.259000] bond0: making interface ens4f1 the new active one
[133197.330130] bond0: link status definitely down for interface ens4f1, disabling it
[133197.331191] bond0: now running without any active interface!

The problem lies in bond_options.c, where passing in arp_validate=0
results in bond->recv_probe getting set to NULL. This flies directly in
the face of commit 3fe68df97c, which says we need to set recv_probe =
bond_arp_recv, even if we're not using arp_validate. Said commit fixed
this in bond_option_arp_interval_set, but missed that we can get to that
same state in bond_option_arp_validate_set as well.

One solution would be to universally set recv_probe = bond_arp_recv here
as well, but I don't think bond_option_arp_validate_set has any business
touching recv_probe at all, and that should be left to the arp_interval
code, so we can just make things much tidier here.

Fixes: 3fe68df97c ("bonding: always set recv_probe to bond_arp_rcv in arp monitor")
CC: Jay Vosburgh <j.vosburgh@gmail.com>
CC: Veaceslav Falico <vfalico@gmail.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-05-16 19:45:18 +02:00
Konstantin Khorenko
cd43ccbfb5 bonding: show full hw address in sysfs for slave entries
[ Upstream commit 18bebc6dd3281955240062655a4df35eef2c46b3 ]

Bond expects ethernet hwaddr for its slave, but it can be longer than 6
bytes - infiniband interface for example.

 # cat /sys/devices/<skipped>/net/ib0/address
 80:00:02:08:fe:80:00:00:00:00:00:00:7c:fe:90:03:00:be:5d:e1

 # cat /sys/devices/<skipped>/net/ib0/bonding_slave/perm_hwaddr
 80:00:02:08:fe:80

So print full hwaddr in sysfs "bonding_slave/perm_hwaddr" as well.

Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2019-05-16 19:45:01 +02:00
Sabrina Dubroca
d8e18cccd2 bonding: fix event handling for stacked bonds
[ Upstream commit 92480b3977fd3884649d404cbbaf839b70035699 ]

When a bond is enslaved to another bond, bond_netdev_event() only
handles the event as if the bond is a master, and skips treating the
bond as a slave.

This leads to a refcount leak on the slave, since we don't remove the
adjacency to its master and the master holds a reference on the slave.

Reproducer:
  ip link add bondL type bond
  ip link add bondU type bond
  ip link set bondL master bondU
  ip link del bondL

No "Fixes:" tag, this code is older than git history.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-04-27 09:33:59 +02:00
Toni Peltonen
25b077464a bonding: fix 802.3ad state sent to partner when unbinding slave
[ Upstream commit 3b5b3a3331d141e8f2a7aaae3a94dfa1e61ecbe4 ]

Previously when unbinding a slave the 802.3ad implementation only told
partner that the port is not suitable for aggregation by setting the port
aggregation state from aggregatable to individual. This is not enough. If the
physical layer still stays up and we only unbinded this port from the bond there
is nothing in the aggregation status alone to prevent the partner from sending
traffic towards us. To ensure that the partner doesn't consider this
port at all anymore we should also disable collecting and distributing to
signal that this actor is going away. Also clear AD_STATE_SYNCHRONIZATION to
ensure partner exits collecting + distributing state.

I have tested this behaviour againts Arista EOS switches with mlx5 cards
(physical link stays up even when interface is down) and simulated
the same situation virtually Linux <-> Linux with two network namespaces
running two veth device pairs. In both cases setting aggregation to
individual doesn't alone prevent traffic from being to sent towards this
port given that the link stays up in partners end. Partner still keeps
it's end in collecting + distributing state and continues until timeout is
reached. In most cases this means we are losing the traffic partner sends
towards our port while we wait for timeout. This is most visible with slow
periodic time (LACP rate slow).

Other open source implementations like Open VSwitch and libreswitch, and
vendor implementations like Arista EOS, seem to disable collecting +
distributing to when doing similar port disabling/detaching/removing change.
With this patch kernel implementation would behave the same way and ensure
partner doesn't consider our actor viable anymore.

Signed-off-by: Toni Peltonen <peltzi@peltzi.fi>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-21 14:09:52 +01:00
Paolo Abeni
b1f4be0ee4 bonding: avoid defaulting hard_header_len to ETH_HLEN on slave removal
[ Upstream commit 19cdead3e2ef8ed765c5d1ce48057ca9d97b5094 ]

On slave list updates, the bonding driver computes its hard_header_len
as the maximum of all enslaved devices's hard_header_len.
If the slave list is empty, e.g. on last enslaved device removal,
ETH_HLEN is used.

Since the bonding header_ops are set only when the first enslaved
device is attached, the above can lead to header_ops->create()
being called with the wrong skb headroom in place.

If bond0 is configured on top of ipoib devices, with the
following commands:

ifup bond0
for slave in $BOND_SLAVES_LIST; do
	ip link set dev $slave nomaster
done
ping -c 1 <ip on bond0 subnet>

we will obtain a skb_under_panic() with a similar call trace:
	skb_push+0x3d/0x40
	push_pseudo_header+0x17/0x30 [ib_ipoib]
	ipoib_hard_header+0x4e/0x80 [ib_ipoib]
	arp_create+0x12f/0x220
	arp_send_dst.part.19+0x28/0x50
	arp_solicit+0x115/0x290
	neigh_probe+0x4d/0x70
	__neigh_event_send+0xa7/0x230
	neigh_resolve_output+0x12e/0x1c0
	ip_finish_output2+0x14b/0x390
	ip_finish_output+0x136/0x1e0
	ip_output+0x76/0xe0
	ip_local_out+0x35/0x40
	ip_send_skb+0x19/0x40
	ip_push_pending_frames+0x33/0x40
	raw_sendmsg+0x7d3/0xb50
	inet_sendmsg+0x31/0xb0
	sock_sendmsg+0x38/0x50
	SYSC_sendto+0x102/0x190
	SyS_sendto+0xe/0x10
	do_syscall_64+0x67/0x180
	entry_SYSCALL64_slow_path+0x25/0x25

This change addresses the issue avoiding updating the bonding device
hard_header_len when the slaves list become empty, forbidding to
shrink it below the value used by header_ops->create().

The bug is there since commit 54ef313714 ("[PATCH] bonding: Handle large
hard_header_len") but the panic can be triggered only since
commit fc791b633515 ("IB/ipoib: move back IB LL address into the hard
header").

Reported-by: Norbert P <noe@physik.uzh.ch>
Fixes: 54ef313714 ("[PATCH] bonding: Handle large hard_header_len")
Fixes: fc791b633515 ("IB/ipoib: move back IB LL address into the hard header")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-11-10 07:41:40 -08:00
Mahesh Bandewar
2691921c7d bonding: avoid possible dead-lock
[ Upstream commit d4859d749aa7090ffb743d15648adb962a1baeae ]

Syzkaller reported this on a slightly older kernel but it's still
applicable to the current kernel -

======================================================
WARNING: possible circular locking dependency detected
4.18.0-next-20180823+ #46 Not tainted
------------------------------------------------------
syz-executor4/26841 is trying to acquire lock:
00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652

but task is already holding lock:
00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (rtnl_mutex){+.+.}:
       __mutex_lock_common kernel/locking/mutex.c:925 [inline]
       __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
       mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
       rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
       bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline]
       bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320
       process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
       worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
       kthread+0x35a/0x420 kernel/kthread.c:246
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415

-> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}:
       process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129
       worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
       kthread+0x35a/0x420 kernel/kthread.c:246
       ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415

-> #0 ((wq_completion)bond_dev->name){+.+.}:
       lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
       flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
       drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
       destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
       __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
       bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
       register_netdevice+0x337/0x1100 net/core/dev.c:8410
       bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
       rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
       rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
       netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
       __sys_sendmsg+0x11d/0x290 net/socket.c:2153
       __do_sys_sendmsg net/socket.c:2162 [inline]
       __se_sys_sendmsg net/socket.c:2160 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe

other info that might help us debug this:

Chain exists of:
  (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex

 Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(rtnl_mutex);
                               lock((work_completion)(&(&nnw->work)->work));
                               lock(rtnl_mutex);
  lock((wq_completion)bond_dev->name);

 *** DEADLOCK ***

1 lock held by syz-executor4/26841:

stack backtrace:
CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
 print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222
 check_prev_add kernel/locking/lockdep.c:1862 [inline]
 check_prevs_add kernel/locking/lockdep.c:1975 [inline]
 validate_chain kernel/locking/lockdep.c:2416 [inline]
 __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412
 lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
 flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
 drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
 destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
 __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
 bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
 register_netdevice+0x337/0x1100 net/core/dev.c:8410
 bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
 rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
 rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
 netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
 rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
 netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
 netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
 netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
 sock_sendmsg_nosec net/socket.c:622 [inline]
 sock_sendmsg+0xd5/0x120 net/socket.c:632
 ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
 __sys_sendmsg+0x11d/0x290 net/socket.c:2153
 __do_sys_sendmsg net/socket.c:2162 [inline]
 __se_sys_sendmsg net/socket.c:2160 [inline]
 __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457089
Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089
RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-10-20 09:52:36 +02:00
Xiangning Yu
b397cdd854 bonding: re-evaluate force_primary when the primary slave name changes
[ Upstream commit eb55bbf865d9979098c6a7a17cbdb41237ece951 ]

There is a timing issue under active-standy mode, when bond_enslave() is
called, bond->params.primary might not be initialized yet.

Any time the primary slave string changes, bond->force_primary should be
set to true to make sure the primary becomes the active slave.

Signed-off-by: Xiangning Yu <yuxiangning@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-07-03 11:21:25 +02:00
Debabrata Banerjee
8ff9368148 bonding: do not allow rlb updates to invalid mac
[ Upstream commit 4fa8667ca3989ce14cf66301fa251544fbddbdd0 ]

Make sure multicast, broadcast, and zero mac's cannot be the output of rlb
updates, which should all be directed arps. Receive load balancing will be
collapsed if any of these happen, as the switch will broadcast.

Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26 08:48:48 +02:00
Xin Long
654fecaea1 bonding: do not set slave_dev npinfo before slave_enable_netpoll in bond_enslave
[ Upstream commit ddea788c63094f7c483783265563dd5b50052e28 ]

After Commit 8a8efa22f5 ("bonding: sync netpoll code with bridge"), it
would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev
if bond->dev->npinfo was set.

However now slave_dev npinfo is set with bond->dev->npinfo before calling
slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called
in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup().
It causes that the lower dev of this slave dev can't set its npinfo.

One way to reproduce it:

  # modprobe bonding
  # brctl addbr br0
  # brctl addif br0 eth1
  # ifconfig bond0 192.168.122.1/24 up
  # ifenslave bond0 eth2
  # systemctl restart netconsole
  # ifenslave bond0 br0
  # ifconfig eth2 down
  # systemctl restart netconsole

The netpoll won't really work.

This patch is to remove that slave_dev npinfo setting in bond_enslave().

Fixes: 8a8efa22f5 ("bonding: sync netpoll code with bridge")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-29 07:50:04 +02:00
Xin Long
6773caa6ac bonding: process the err returned by dev_set_allmulti properly in bond_enslave
[ Upstream commit 9f5a90c107741b864398f4ac0014711a8c1d8474 ]

When dev_set_promiscuity(1) succeeds but dev_set_allmulti(1) fails,
dev_set_promiscuity(-1) should be done before going to the err path.
Otherwise, dev->promiscuity will leak.

Fixes: 7e1a1ac1fb ("bonding: Check return of dev_set_promiscuity/allmulti")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:50:26 +02:00
Xin Long
efc484e07b bonding: move dev_mc_sync after master_upper_dev_link in bond_enslave
[ Upstream commit ae42cc62a9f07f1f6979054ed92606b9c30f4a2e ]

Beniamino found a crash when adding vlan as slave of bond which is also
the parent link:

  ip link add bond1 type bond
  ip link set bond1 up
  ip link add link bond1 vlan1 type vlan id 80
  ip link set vlan1 master bond1

The call trace is as below:

  [<ffffffffa850842a>] queued_spin_lock_slowpath+0xb/0xf
  [<ffffffffa8515680>] _raw_spin_lock+0x20/0x30
  [<ffffffffa83f6f07>] dev_mc_sync+0x37/0x80
  [<ffffffffc08687dc>] vlan_dev_set_rx_mode+0x1c/0x30 [8021q]
  [<ffffffffa83efd2a>] __dev_set_rx_mode+0x5a/0xa0
  [<ffffffffa83f7138>] dev_mc_sync_multiple+0x78/0x80
  [<ffffffffc084127c>] bond_enslave+0x67c/0x1190 [bonding]
  [<ffffffffa8401909>] do_setlink+0x9c9/0xe50
  [<ffffffffa8403bf2>] rtnl_newlink+0x522/0x880
  [<ffffffffa8403ff7>] rtnetlink_rcv_msg+0xa7/0x260
  [<ffffffffa8424ecb>] netlink_rcv_skb+0xab/0xc0
  [<ffffffffa83fe498>] rtnetlink_rcv+0x28/0x30
  [<ffffffffa8424850>] netlink_unicast+0x170/0x210
  [<ffffffffa8424bf8>] netlink_sendmsg+0x308/0x420
  [<ffffffffa83cc396>] sock_sendmsg+0xb6/0xf0

This is actually a dead lock caused by sync slave hwaddr from master when
the master is the slave's 'slave'. This dead loop check is actually done
by netdev_master_upper_dev_link. However, Commit 1f718f0f4f ("bonding:
populate neighbour's private on enslave") moved it after dev_mc_sync.

This patch is to fix it by moving dev_mc_sync after master_upper_dev_link,
so that this loop check would be earlier than dev_mc_sync. It also moves
if (mode == BOND_MODE_8023AD) into if (!bond_uses_primary) clause as an
improvement.

Note team driver also has this issue, I will fix it in another patch.

Fixes: 1f718f0f4f ("bonding: populate neighbour's private on enslave")
Reported-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:50:26 +02:00
Xin Long
92e782de7e bonding: fix the err path for dev hwaddr sync in bond_enslave
[ Upstream commit 5c78f6bfae2b10ff70e21d343e64584ea6280c26 ]

vlan_vids_add_by_dev is called right after dev hwaddr sync, so on
the err path it should unsync dev hwaddr. Otherwise, the slave
dev's hwaddr will never be unsync when this err happens.

Fixes: 1ff412ad77 ("bonding: change the bond's vlan syncing functions with the standard ones")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:50:26 +02:00
Nithin Sujir
9e1aa8a09b bonding: Don't update slave->link until ready to commit
[ Upstream commit 797a93647a48d6cb8a20641a86a71713a947f786 ]

In the loadbalance arp monitoring scheme, when a slave link change is
detected, the slave->link is immediately updated and slave_state_changed
is set. Later down the function, the rtnl_lock is acquired and the
changes are committed, updating the bond link state.

However, the acquisition of the rtnl_lock can fail. The next time the
monitor runs, since slave->link is already updated, it determines that
link is unchanged. This results in the bond link state permanently out
of sync with the slave link.

This patch modifies bond_loadbalance_arp_mon() to handle link changes
identical to bond_ab_arp_{inspect/commit}(). The new link state is
maintained in slave->new_link until we're ready to commit at which point
it's copied into slave->link.

NOTE: miimon_{inspect/commit}() has a more complex state machine
requiring the use of the bond_{propose,commit}_link_state() functions
which maintains the intermediate state in slave->link_new_state. The arp
monitors don't require that.

Testing: This bug is very easy to reproduce with the following steps.
1. In a loop, toggle a slave link of a bond slave interface.
2. In a separate loop, do ifconfig up/down of an unrelated interface to
create contention for rtnl_lock.
Within a few iterations, the bond link goes out of sync with the slave
link.

Signed-off-by: Nithin Nayak Sujir <nsujir@tintri.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-04-13 19:50:12 +02:00
Eric Dumazet
e4d8f491c9 bonding: refine bond_fold_stats() wrap detection
[ Upstream commit 142c6594acbcc32391af9c15f8cd65c6c177698f ]

Some device drivers reset their stats at down/up events, possibly
fooling bonding stats, since they operate with relative deltas.

It is nearly not possible to fix drivers, since some of them compute the
tx/rx counters based on per rx/tx queue stats, and the queues can be
reconfigured (ethtool -L) between the down/up sequence.

Lets avoid accumulating 'negative' values that render bonding stats
useless.

It is better to lose small deltas, assuming the bonding stats are
fetched at a reasonable frequency.

Fixes: 5f0c5f73e5 ("bonding: make global bonding stats more reliable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <alexander.levin@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-03-22 09:23:22 +01:00
Hangbin Liu
3bb6245e14 bonding: discard lowest hash bit for 802.3ad layer3+4
[ Upstream commit b5f862180d7011d9575d0499fa37f0f25b423b12 ]

After commit 07f4c90062 ("tcp/dccp: try to not exhaust ip_local_port_range
in connect()"), we will try to use even ports for connect(). Then if an
application (seen clearly with iperf) opens multiple streams to the same
destination IP and port, each stream will be given an even source port.

So the bonding driver's simple xmit_hash_policy based on layer3+4 addressing
will always hash all these streams to the same interface. And the total
throughput will limited to a single slave.

Change the tcp code will impact the whole tcp behavior, only for bonding
usage. Paolo Abeni suggested fix this by changing the bonding code only,
which should be more reasonable, and less impact.

Fix this by discarding the lowest hash bit because it contains little entropy.
After the fix we can re-balance between slaves.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-11-24 08:32:24 +01:00
Mahesh Bandewar
f357a79839 bonding: Fix bonding crash
[ Upstream commit 24b27fc4cdf9e10c5e79e5923b6b7c2c5c95096c ]

Following few steps will crash kernel -

  (a) Create bonding master
      > modprobe bonding miimon=50
  (b) Create macvlan bridge on eth2
      > ip link add link eth2 dev mvl0 address aa:0:0:0:0:01 \
	   type macvlan
  (c) Now try adding eth2 into the bond
      > echo +eth2 > /sys/class/net/bond0/bonding/slaves
      <crash>

Bonding does lots of things before checking if the device enslaved is
busy or not.

In this case when the notifier call-chain sends notifications, the
bond_netdev_event() assumes that the rx_handler /rx_handler_data is
registered while the bond_enslave() hasn't progressed far enough to
register rx_handler for the new slave.

This patch adds a rx_handler check that can be performed right at the
beginning of the enslave code to avoid getting into this situation.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-09-30 10:18:36 +02:00
Beniamino Galvani
0020fa536c bonding: set carrier off for devices created through netlink
[ Upstream commit 005db31d5f5f7c31cfdc43505d77eb3ca5cf8ec6 ]

Commit e826eafa65 ("bonding: Call netif_carrier_off after
register_netdevice") moved netif_carrier_off() from bond_init() to
bond_create(), but the latter is called only for initial default
devices and ones created through sysfs:

 $ modprobe bonding
 $ echo +bond1 > /sys/class/net/bonding_masters
 $ ip link add bond2 type bond
 $ grep "MII Status" /proc/net/bonding/*
 /proc/net/bonding/bond0:MII Status: down
 /proc/net/bonding/bond1:MII Status: down
 /proc/net/bonding/bond2:MII Status: up

Ensure that carrier is initially off also for devices created through
netlink.

Signed-off-by: Beniamino Galvani <bgalvani@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-08-16 09:30:47 +02:00
Eric Dumazet
8178211eb7 bonding: fix bond_get_stats()
[ Upstream commit fe30937b65354c7fec244caebbdaae68e28ca797 ]

bond_get_stats() can be called from rtnetlink (with RTNL held)
or from /proc/net/dev seq handler (with RCU held)

The logic added in commit 5f0c5f73e5 ("bonding: make global bonding
stats more reliable") kind of assumed only one cpu could run there.

If multiple threads are reading /proc/net/dev, stats can be really
messed up after a while.

A second problem is that some fields are 32bit, so we need to properly
handle the wrap around problem.

Given that RTNL is not always held, we need to use
bond_for_each_slave_rcu().

Fixes: 5f0c5f73e5 ("bonding: make global bonding stats more reliable")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-20 15:42:04 +09:00
Jay Vosburgh
07cc96fb15 bonding: Fix ARP monitor validation
[ Upstream commit 21a75f0915dde8674708b39abfcda113911c49b1 ]

The current logic in bond_arp_rcv will accept an incoming ARP for
validation if (a) the receiving slave is either "active" (which includes
the currently active slave, or the current ARP slave) or, (b) there is a
currently active slave, and it has received an ARP since it became active.
For case (b), the receiving slave isn't the currently active slave, and is
receiving the original broadcast ARP request, not an ARP reply from the
target.

	This logic can fail if there is no currently active slave.  In
this situation, the ARP probe logic cycles through all slaves, assigning
each in turn as the "current_arp_slave" for one arp_interval, then setting
that one as "active," and sending an ARP probe from that slave.  The
current logic expects the ARP reply to arrive on the sending
current_arp_slave, however, due to switch FDB updating delays, the reply
may be directed to another slave.

	This can arise if the bonding slaves and switch are working, but
the ARP target is not responding.  When the ARP target recovers, a
condition may result wherein the ARP target host replies faster than the
switch can update its forwarding table, causing each ARP reply to be sent
to the previous current_arp_slave.  This will never pass the logic in
bond_arp_rcv, as neither of the above conditions (a) or (b) are met.

	Some experimentation on a LAN shows ARP reply round trips in the
200 usec range, but my available switches never update their FDB in less
than 4000 usec.

	This patch changes the logic in bond_arp_rcv to additionally
accept an ARP reply for validation on any slave if there is a current ARP
slave and it sent an ARP probe during the previous arp_interval.

Fixes: aeea64ac71 ("bonding: don't trust arp requests unless active slave really works")
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:07:06 -08:00
Karl Heiss
1338c946e6 bonding: Prevent IPv6 link local address on enslaved devices
[ Upstream commit 03d84a5f83a67e692af00a3d3901e7820e3e84d5 ]

Commit 1f718f0f4f ("bonding: populate neighbour's private on enslave")
undoes the fix provided by commit c2edacf80e ("bonding / ipv6: no addrconf
for slaves separately from master") by effectively setting the slave flag
after the slave has been opened.  If the slave comes up quickly enough, it
will go through the IPv6 addrconf before the slave flag has been set and
will get a link local IPv6 address.

In order to ensure that addrconf knows to ignore the slave devices on state
change, set IFF_SLAVE before dev_open() during bonding enslavement.

Fixes: 1f718f0f4f ("bonding: populate neighbour's private on enslave")
Signed-off-by: Karl Heiss <kheiss@gmail.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Reviewed-by: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-01-31 11:29:00 -08:00
Jay Vosburgh
40baec2257 bonding: fix panic on non-ARPHRD_ETHER enslave failure
Since commit 7d5cd2ce529b, when bond_enslave fails on devices that
are not ARPHRD_ETHER, if needed, it resets the bonding device back to
ARPHRD_ETHER by calling ether_setup.

	Unfortunately, ether_setup clobbers dev->flags, clearing IFF_UP
if the bond device is up, leaving it in a quasi-down state without
having actually gone through dev_close.  For bonding, if any periodic
work queue items are active (miimon, arp_interval, etc), those will
remain running, as they are stopped by bond_close.  At this point, if
the bonding module is unloaded or the bond is deleted, the system will
panic when the work function is called.

	This panic is resolved by calling dev_close on the bond itself
prior to calling ether_setup.

Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Fixes: 7d5cd2ce52 ("bonding: correctly handle bonding type change on enslave failure")
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-07 13:17:32 -05:00
Mahesh Bandewar
52bc671681 bonding: simplify / unify event handling code for 3ad mode.
Old logic of updating state-machine is not required since
ad_update_actor_keys() does it implicitly. The only loss is
the notification differentiation between speed vs. duplex
change. Now only one unified notification is printed.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 22:52:24 -05:00
Mahesh Bandewar
7bb11dc9f5 bonding: unify all places where actor-oper key needs to be updated.
actor_admin, and actor_oper key is changed at multiple locations in
the code. This patch brings all those updates into one location in
an attempt to avoid possible inconsistent updates causing LACP state
machine to go in weird state.

The unified place is ad_update_actor_key() with simple state-machine
logic -
  (a) If port is "duplex" then only it can participate in LACP
  (b) Speed change reinitializes the LACP state-machine.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 22:52:24 -05:00
Mahesh Bandewar
b25c2e7d3c bonding: Simplify __get_duplex function.
Eliminate 'else' clause by simply initializing variable

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-02 22:52:24 -05:00
Eric Dumazet
e87eb4051e bonding: support encapsulated ipv6 TSO
If using a sixtofour device on top of a bonding device,
skb segmentation of TCP traffic is done right before calling
bonding xmit, because bonding only enables TSO for IPv4.

This patch improves single flow performance by about 120 % on my hosts,
because segmentation is deferred right before calling slave xmit.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-15 23:29:28 -07:00
Eric Dumazet
4b1b865e4e bonding: use l4 hash if available
If skb carries a l4 hash, no need to perform a flow dissection.

Performance is slightly better :

lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39012e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39393e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.39988e+06

After patch :

lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.43579e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.44304e+06
lpaa5:~# ./super_netperf 200 -H lpaa6 -t TCP_RR -l 100
2.44312e+06

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-17 21:01:05 -07:00
Linus Torvalds
26d2177e97 Changes for 4.3
- Create drivers/staging/rdma
 - Move amso1100 driver to staging/rdma and schedule for deletion
 - Move ipath driver to staging/rdma and schedule for deletion
 - Add hfi1 driver to staging/rdma and set TODO for move to regular tree
 - Initial support for namespaces to be used on RDMA devices
 - Add RoCE GID table handling to the RDMA core caching code
 - Infrastructure to support handling of devices with differing
   read and write scatter gather capabilities
 - Various iSER updates
 - Kill off unsafe usage of global mr registrations
 - Update SRP driver
 - Misc. mlx4 driver updates
 - Support for the mr_alloc verb
 - Support for a netlink interface between kernel and user space cache
   daemon to speed path record queries and route resolution
 - Ininitial support for safe hot removal of verbs devices
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJV7v8wAAoJELgmozMOVy/d2dcP/3PXnGFPgFGJODKE6VCZtTvj
 nooNXRKXjxv470UT5DiAX7SNcBxzzS7Zl/Lj+831H9iNXUyzuH31KtBOAZ3W03vZ
 yXwCB2caOStSldTRSUUvPe2aIFPnyNmSpC4i6XcJLJMCFijKmxin5pAo8qE44BQU
 yjhT+wC9P6LL5wZXsn/nFIMLjOFfu0WBFHNp3gs5j59paxlx5VeIAZk16aQZH135
 m7YCyicwrS8iyWQl2bEXRMon2vlCHlX2RHmOJ4f/P5I0quNcGF2+d8Yxa+K1VyC5
 zcb3OBezz+wZtvh16yhsDfSPqHWirljwID2VzOgRSzTJWvQjju8VkwHtkq6bYoBW
 egIxGCHcGWsD0R5iBXLYr/tB+BmjbDObSm0AsR4+JvSShkeVA1IpeoO+19162ixE
 n6CQnk2jCee8KXeIN4PoIKsjRSbIECM0JliWPLoIpuTuEhhpajftlSLgL5hf1dzp
 HrSy6fXmmoRj7wlTa7DnYIC3X+ffwckB8/t1zMAm2sKnIFUTjtQXF7upNiiyWk4L
 /T1QEzJ2bLQckQ9yY4v528SvBQwA4Dy1amIQB7SU8+2S//bYdUvhysWPkdKC4oOT
 WlqS5PFDCI31MvNbbM3rUbMAD8eBAR8ACw9ZpGI/Rffm5FEX5W3LoxA8gfEBRuqt
 30ZYFuW8evTL+YQcaV65
 =EHLg
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma

Pull inifiniband/rdma updates from Doug Ledford:
 "This is a fairly sizeable set of changes.  I've put them through a
  decent amount of testing prior to sending the pull request due to
  that.

  There are still a few fixups that I know are coming, but I wanted to
  go ahead and get the big, sizable chunk into your hands sooner rather
  than waiting for those last few fixups.

  Of note is the fact that this creates what is intended to be a
  temporary area in the drivers/staging tree specifically for some
  cleanups and additions that are coming for the RDMA stack.  We
  deprecated two drivers (ipath and amso1100) and are waiting to hear
  back if we can deprecate another one (ehca).  We also put Intel's new
  hfi1 driver into this area because it needs to be refactored and a
  transfer library created out of the factored out code, and then it and
  the qib driver and the soft-roce driver should all be modified to use
  that library.

  I expect drivers/staging/rdma to be around for three or four kernel
  releases and then to go away as all of the work is completed and final
  deletions of deprecated drivers are done.

  Summary of changes for 4.3:

   - Create drivers/staging/rdma
   - Move amso1100 driver to staging/rdma and schedule for deletion
   - Move ipath driver to staging/rdma and schedule for deletion
   - Add hfi1 driver to staging/rdma and set TODO for move to regular
     tree
   - Initial support for namespaces to be used on RDMA devices
   - Add RoCE GID table handling to the RDMA core caching code
   - Infrastructure to support handling of devices with differing read
     and write scatter gather capabilities
   - Various iSER updates
   - Kill off unsafe usage of global mr registrations
   - Update SRP driver
   - Misc  mlx4 driver updates
   - Support for the mr_alloc verb
   - Support for a netlink interface between kernel and user space cache
     daemon to speed path record queries and route resolution
   - Ininitial support for safe hot removal of verbs devices"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (136 commits)
  IB/ipoib: Suppress warning for send only join failures
  IB/ipoib: Clean up send-only multicast joins
  IB/srp: Fix possible protection fault
  IB/core: Move SM class defines from ib_mad.h to ib_smi.h
  IB/core: Remove unnecessary defines from ib_mad.h
  IB/hfi1: Add PSM2 user space header to header_install
  IB/hfi1: Add CSRs for CONFIG_SDMA_VERBOSITY
  mlx5: Fix incorrect wc pkey_index assignment for GSI messages
  IB/mlx5: avoid destroying a NULL mr in reg_user_mr error flow
  IB/uverbs: reject invalid or unknown opcodes
  IB/cxgb4: Fix if statement in pick_local_ip6adddrs
  IB/sa: Fix rdma netlink message flags
  IB/ucma: HW Device hot-removal support
  IB/mlx4_ib: Disassociate support
  IB/uverbs: Enable device removal when there are active user space applications
  IB/uverbs: Explicitly pass ib_dev to uverbs commands
  IB/uverbs: Fix race between ib_uverbs_open and remove_one
  IB/uverbs: Fix reference counting usage of event files
  IB/core: Make ib_dealloc_pd return void
  IB/srp: Create an insecure all physical rkey only if needed
  ...
2015-09-09 08:33:31 -07:00
Tom Herbert
cd79a2382a flow_dissector: Add flags argument to skb_flow_dissector functions
The flags argument will allow control of the dissection process (for
instance whether to parse beyond L3).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-09-01 15:06:22 -07:00
David S. Miller
06fb4e701b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-08-30 21:45:01 -07:00
Matan Barak
e999869548 net/bonding: Export bond_option_active_slave_get_rcu
Some consumers of the netdev events API would like to know who is the
active slave when a NETDEV_CHANGEUPPER or NETDEV_BONDING_FAILOVER
events occur. For example, when managing RoCE GIDs, GIDs based on the
bond's ips should only be set on the port which corresponds to active
slave netdevice.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-08-30 18:08:50 -04:00
Nikolay Aleksandrov
b0d4943eec bonding: fix bond_poll_controller bh_enable warning
The problem is rcu_read_unlock_bh() which triggers a warning when irqs are
disabled. ndo_poll_controller should run with irqs disabled always so we
can drop the rcu_read_lock_bh.

[   98.502922] bond0: making interface eth1 the new active one
[   98.503039] ------------[ cut here ]------------
[   98.503039] WARNING: CPU: 0 PID: 1744 at kernel/softirq.c:150 __local_bh_enable_ip+0x96/0xc0()
[   98.503039] Modules linked in: bonding(OE) rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache netconsole ppdev joydev parport_pc serio_raw parport i2c_piix4 video acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc virtio_net e1000 ata_generic pcnet32 mii virtio_pci virtio_ring virtio pata_acpi
[   98.503039] CPU: 0 PID: 1744 Comm: ifenslave Tainted: G           OE   4.2.0-rc7+ #56
[   98.503039] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
[   98.503039]  0000000000000000 00000000e96ba230 ffff880020c236b8 ffffffff8183f105
[   98.503039]  0000000000000000 0000000000000000 ffff880020c236f8 ffffffff810a9496
[   98.503039]  ffff88002ea99e08 0000000000000200 ffffffffa02a8e06 ffff88002ea99e08
[   98.503039] Call Trace:
[   98.503039]  [<ffffffff8183f105>] dump_stack+0x4c/0x65
[   98.503039]  [<ffffffff810a9496>] warn_slowpath_common+0x86/0xc0
[   98.503039]  [<ffffffffa02a8e06>] ? bond_poll_controller+0x146/0x250 [bonding]
[   98.503039]  [<ffffffff810a95ca>] warn_slowpath_null+0x1a/0x20
[   98.503039]  [<ffffffff810ae376>] __local_bh_enable_ip+0x96/0xc0
[   98.503039]  [<ffffffffa02a8e2f>] bond_poll_controller+0x16f/0x250 [bonding]
[   98.503039]  [<ffffffffa02a8cf3>] ? bond_poll_controller+0x33/0x250 [bonding]
[   98.503039]  [<ffffffff810feaed>] ? trace_hardirqs_off+0xd/0x10
[   98.503039]  [<ffffffff81848afb>] ? _raw_spin_unlock_irqrestore+0x5b/0x60
[   98.503039]  [<ffffffff816ec48e>] netpoll_poll_dev+0x6e/0x350
[   98.503039]  [<ffffffff816eb977>] ? netpoll_start_xmit+0x137/0x1d0
[   98.503039]  [<ffffffff816b2e8b>] ? __alloc_skb+0x5b/0x210
[   98.503039]  [<ffffffff816ec89d>] netpoll_send_skb_on_dev+0x12d/0x2a0
[   98.503039]  [<ffffffff816eccde>] netpoll_send_udp+0x2ce/0x430
[   98.503039]  [<ffffffffa0190850>] write_msg+0xb0/0xf0 [netconsole]
[   98.503039]  [<ffffffff81116b63>] call_console_drivers.constprop.25+0x133/0x260
[   98.503039]  [<ffffffff81117934>] console_unlock+0x2f4/0x580
[   98.503039]  [<ffffffff81117ea5>] ? vprintk_emit+0x2e5/0x630
[   98.503039]  [<ffffffff81117ee5>] vprintk_emit+0x325/0x630
[   98.503039]  [<ffffffff81118379>] vprintk_default+0x29/0x40
[   98.503039]  [<ffffffff8183de4f>] printk+0x55/0x6b
[   98.503039]  [<ffffffff816c754c>] __netdev_printk+0x16c/0x260
[   98.503039]  [<ffffffff816c7a12>] netdev_info+0x62/0x80
[   98.503039]  [<ffffffffa02ab464>] bond_change_active_slave+0x134/0x6a0 [bonding]
[   98.503039]  [<ffffffffa02aba95>] bond_select_active_slave+0xc5/0x310 [bonding]
[   98.503039]  [<ffffffffa02aeb78>] bond_enslave+0x1088/0x10c0 [bonding]
[   98.503039]  [<ffffffffa02af46b>] bond_do_ioctl+0x37b/0x400 [bonding]
[   98.503039]  [<ffffffff81101d8d>] ? trace_hardirqs_on+0xd/0x10
[   98.503039]  [<ffffffff816dc437>] ? rtnl_lock+0x17/0x20
[   98.503039]  [<ffffffff816e5fd1>] dev_ifsioc+0x331/0x3e0
[   98.503039]  [<ffffffff816e62dc>] dev_ioctl+0xec/0x6c0
[   98.503039]  [<ffffffff816a6c6a>] sock_do_ioctl+0x4a/0x60
[   98.503039]  [<ffffffff816a7300>] sock_ioctl+0x1c0/0x250
[   98.503039]  [<ffffffff81271bfe>] do_vfs_ioctl+0x2ee/0x540
[   98.503039]  [<ffffffff810fd943>] ? up_read+0x23/0x40
[   98.503039]  [<ffffffff81070993>] ? __do_page_fault+0x1d3/0x420
[   98.503039]  [<ffffffff8127e246>] ? __fget_light+0x66/0x90
[   98.503039]  [<ffffffff81271ec9>] SyS_ioctl+0x79/0x90
[   98.503039]  [<ffffffff8184936e>] entry_SYSCALL_64_fastpath+0x12/0x76
[   98.503039] ---[ end trace 00cfa804b0670051 ]---

Fixes: 616f45416c ("bonding: implement bond_poll_controller()")
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-28 22:25:43 -07:00
Phil Sutter
1e6f20ca6c net: bonding: convert to using IFF_NO_QUEUE
Signed-off-by: Phil Sutter <phil@nwl.cc>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-18 11:55:06 -07:00
David S. Miller
182ad468e7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/cavium/Kconfig

The cavium conflict was overlapping dependency
changes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-13 16:23:11 -07:00
Venkat Venkatsubra
b02e3e948d bonding: Gratuitous ARP gets dropped when first slave added
When the first slave is added (such as during bootup) the first
gratuitous ARP gets dropped. We don't see this drop during a failover.
The packet gets dropped in qdisc (noop_enqueue).

The fix is to delay the sending of gratuitous ARPs till the bond dev's
carrier is present.

It can also be worked around by setting num_grat_arp to more than 1.

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-08-12 14:37:33 -07:00
Nikolay Aleksandrov
0f7bffd9e5 bonding: add tlb_dynamic_lb netlink support
tlb_dynamic_lb could be set only via sysfs, this patch allows it to be
set via netlink.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-31 15:35:55 -07:00
Nikolay Aleksandrov
205845a347 bonding: convert num_grat_arp to the new bonding option API
num_grat_arp wasn't converted to the new bonding option API, so do this
now and remove the specific sysfs store option in order to use the
standard one. num_grat_arp is the same as num_unsol_na so add it as an
alias with the same option settings. An important difference is the option
name which is matched in bond_sysfs_store_option().

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Acked-by: Veaceslav Falico <vfalico@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-27 01:05:24 -07:00
David S. Miller
c5e40ee287 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/bridge/br_mdb.c

br_mdb.c conflict was a function call being removed to fix a bug in
'net' but whose signature was changed in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-23 00:41:16 -07:00
dingtianhong
a951bc1e6b bonding: correct the MAC address for "follow" fail_over_mac policy
The "follow" fail_over_mac policy is useful for multiport devices that
either become confused or incur a performance penalty when multiple
ports are programmed with the same MAC address, but the same MAC
address still may happened by this steps for this policy:

1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
   bond0 has the same mac address with eth0, it is MAC1.

2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
   eth1 is backup, eth1 has MAC2.

3) ifconfig eth0 down
   eth1 became active slave, bond will swap MAC for eth0 and eth1,
   so eth1 has MAC1, and eth0 has MAC2.

4) ifconfig eth1 down
   there is no active slave, and eth1 still has MAC1, eth2 has MAC2.

5) ifconfig eth0 up
   the eth0 became active slave again, the bond set eth0 to MAC1.

Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same
MAC address, it will break this policy for ACTIVE_BACKUP mode.

This patch will fix this problem by finding the old active slave and
swap them MAC address before change active slave.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 20:29:40 -07:00
Nikolay Aleksandrov
7d5cd2ce52 bonding: correctly handle bonding type change on enslave failure
If the bond is enslaving a device with different type it will be setup
by it, but if after being setup the enslave fails the bond doesn't
switch back its type and also keeps pointers to foreign structures that can
be long gone. Thus revert back any type changes if the enslave failed and
the bond had to change its type.
Example:
 Before patch:
$ echo lo > bond0/bonding/slaves
-bash: echo: write error: Cannot assign requested address
$ ip l sh bond0
20: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default
    link/loopback 16:54:78:34:bd:41 brd 00:00:00:00:00:00
$ echo +eth1 > bond0/bonding/slaves
$ ip l sh bond0
20: bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
    link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff
(notice the MASTER flag is gone)

 After patch:
$ echo lo > bond0/bonding/slaves
-bash: echo: write error: Cannot assign requested address
$ ip l sh bond0
21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
    link/ether 6e:66:94:f6:07:fc brd ff:ff:ff:ff:ff:ff
$ echo +eth1 > bond0/bonding/slaves
$ ip l sh bond0
21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
    link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: e36b9d16c6 ("bonding: clean muticast addresses when device changes type")
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 16:23:06 -07:00
Nikolay Aleksandrov
06f6d1094a bonding: fix destruction of bond with devices different from arphrd_ether
When the bonding is being unloaded and the netdevice notifier is
unregistered it executes NETDEV_UNREGISTER for each device which should
remove the bond's proc entry but if the device enslaved is not of
ARPHRD_ETHER type and is in front of the bonding, it may execute
bond_release_and_destroy() first which would release the last slave and
destroy the bond device leaving the proc entry and thus we will get the
following error (with dynamic debug on for bond_netdev_event to see the
events order):
[  908.963051] eql: event: 9
[  908.963052] eql: IFF_SLAVE
[  908.963054] eql: event: 2
[  908.963056] eql: IFF_SLAVE
[  908.963058] eql: event: 6
[  908.963059] eql: IFF_SLAVE
[  908.963110] bond0: Releasing active interface eql
[  908.976168] bond0: Destroying bond bond0
[  908.976266] bond0 (unregistering): Released all slaves
[  908.984097] ------------[ cut here ]------------
[  908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
remove_proc_entry+0x112/0x160()
[  908.984110] remove_proc_entry: removing non-empty directory
'net/bonding', leaking at least 'bond0'
[  908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
usb_common [last unloaded: bonding]

[  908.984168] CPU: 0 PID: 1787 Comm: rmmod Tainted: G        W  O
4.2.0-rc2+ #8
[  908.984170] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[  908.984172]  0000000000000000 ffffffff81732d41 ffffffff81525b34
ffff8800358dfda8
[  908.984175]  ffffffff8106c521 ffff88003595af78 ffff88003595af40
ffff88003e3a4280
[  908.984178]  ffffffffa058d040 0000000000000000 ffffffff8106c59a
ffffffff8172ebd0
[  908.984181] Call Trace:
[  908.984188]  [<ffffffff81525b34>] ? dump_stack+0x40/0x50
[  908.984193]  [<ffffffff8106c521>] ? warn_slowpath_common+0x81/0xb0
[  908.984196]  [<ffffffff8106c59a>] ? warn_slowpath_fmt+0x4a/0x50
[  908.984199]  [<ffffffff81218352>] ? remove_proc_entry+0x112/0x160
[  908.984205]  [<ffffffffa05850e6>] ? bond_destroy_proc_dir+0x26/0x30
[bonding]
[  908.984208]  [<ffffffffa057540e>] ? bond_net_exit+0x8e/0xa0 [bonding]
[  908.984217]  [<ffffffff8142f407>] ? ops_exit_list.isra.4+0x37/0x70
[  908.984225]  [<ffffffff8142f52d>] ?
unregister_pernet_operations+0x8d/0xd0
[  908.984228]  [<ffffffff8142f58d>] ?
unregister_pernet_subsys+0x1d/0x30
[  908.984232]  [<ffffffffa0585269>] ? bonding_exit+0x23/0xdba [bonding]
[  908.984236]  [<ffffffff810e28ba>] ? SyS_delete_module+0x18a/0x250
[  908.984241]  [<ffffffff81086f99>] ? task_work_run+0x89/0xc0
[  908.984244]  [<ffffffff8152b732>] ?
entry_SYSCALL_64_fastpath+0x16/0x75
[  908.984247] ---[ end trace 7c006ed4abbef24b ]---

Thus remove the proc entry manually if bond_release_and_destroy() is
used. Because of the checks in bond_remove_proc_entry() it's not a
problem for a bond device to change namespaces (the bug fixed by the
Fixes commit) but since commit
f939981492 ("bonding: Don't allow bond devices to change network
namespaces.") that can't happen anyway.

Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Fixes: a64d49c3dd ("bonding: Manage /proc/net/bonding/ entries from
                      the netdev events")
Tested-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:56:11 -07:00
Nikolay Aleksandrov
22f94e6256 bonding: trivial: remove unused variables
Get rid of these:
drivers/net/bonding//bond_main.c: In function ‘bond_update_slave_arr’:
drivers/net/bonding//bond_main.c:3754:6: warning: variable
‘slaves_in_agg’ set but not used [-Wunused-but-set-variable]
  int slaves_in_agg;
      ^
  CC [M]  drivers/net/bonding//bond_3ad.o
drivers/net/bonding//bond_3ad.c: In function
‘ad_marker_response_received’:
drivers/net/bonding//bond_3ad.c:1870:61: warning: parameter ‘marker’
set but not used [-Wunused-but-set-parameter]
 static void ad_marker_response_received(struct bond_marker *marker,
                                                             ^
drivers/net/bonding//bond_3ad.c:1871:19: warning: parameter ‘port’ set
but not used [-Wunused-but-set-parameter]
      struct port *port)
                   ^

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-20 12:50:32 -07:00
Mazhar Rana
b5a983f314 bonding: "primary_reselect" with "failure" is not working properly
When "primary_reselect" is set to "failure", primary interface should
not become active until current active slave is down. But if we set first
member of bond device as a "primary" interface and "primary_reselect"
is set to "failure" then whenever primary interface's link get back(up)
it become active slave even if current active slave is still up.

With this patch, "bond_find_best_slave" will not traverse members if
primary interface is not candidate for failover/reselection and current
active slave is still up.

Signed-off-by: Mazhar Rana <mazhar.rana@cyberoam.com>
Signed-off-by: Jay Vosburgh <j.vosburgh@gmail.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-07-08 16:06:08 -07:00
Mahesh Bandewar
4cd6b47544 bonding: Display LACP info only to CAP_NET_ADMIN capable user
Actor and Partner details can be accessed via proc-fs, sys-fs
entries or netlink interface. These interfaces are world readable
at this moment. The earlier patch-series made the LACP communication
secure to avoid nuisance attack from within the same L2 domain but
it did not prevent "someone unprivileged" looking at that information
on host and perform the same act.

This patch essentially avoids spitting those entries if the user
in question does not have enough privileges.

Signed-off-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-23 03:11:52 -07:00
Nikolay Aleksandrov
46ea297ed6 bonding: export slave's partner_oper_port_state via sysfs and netlink
Export the partner_oper_port_state of each port via sysfs and netlink.
In 802.3ad mode it is valuable for the user to be able to check the
partner_oper state, it is already exported via bond's proc entry.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:40:24 -07:00
Nikolay Aleksandrov
254cb6dbfd bonding: export slave's actor_oper_port_state via sysfs and netlink
Export the actor_oper_port_state of each port via sysfs and netlink.
In 802.3ad mode it is valuable for the user to be able to check the
actor_oper state, it is already exported via bond's proc entry.

Signed-off-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-15 16:40:24 -07:00
Tom Herbert
c3f8324188 net: Add full IPv6 addresses to flow_keys
This patch adds full IPv6 addresses into flow_keys and uses them as
input to the flow hash function. The implementation supports either
IPv4 or IPv6 addresses in a union, and selector is used to determine
how may words to input to jhash2.

We also add flow_get_u32_dst and flow_get_u32_src functions which are
used to get a u32 representation of the source and destination
addresses. For IPv6, ipv6_addr_hash is called. These functions retain
getting the legacy values of src and dst in flow_keys.

With this patch, Ethertype and IP protocol are now included in the
flow hash input.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-06-04 15:44:30 -07:00