Commit graph

415041 commits

Author SHA1 Message Date
Eric Dumazet
d4589926d7 tcp: refine TSO splits
While investigating performance problems on small RPC workloads,
I noticed linux TCP stack was always splitting the last TSO skb
into two parts (skbs). One being a multiple of MSS, and a small one
with the Push flag. This split is done even if TCP_NODELAY is set,
or if no small packet is in flight.

Example with request/response of 4K/4K

IP A > B: . ack 68432 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 65537:68433(2896) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 68433:69633(1200) ack 69632 win 2783 <nop,nop,timestamp 6524593 6525001>
IP B > A: . ack 68433 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: . 69632:72528(2896) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP B > A: P 72528:73728(1200) ack 69633 win 2768 <nop,nop,timestamp 6525001 6524593>
IP A > B: . ack 72528 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: . 69633:72529(2896) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>
IP A > B: P 72529:73729(1200) ack 73728 win 2783 <nop,nop,timestamp 6524593 6525001>

We can avoid this split by including the Nagle tests at the right place.

Note : If some NIC had trouble sending TSO packets with a partial
last segment, we would have hit the problem in GRO/forwarding workload already.

tcp_minshall_update() is moved to tcp_output.c and is updated as we might
feed a TSO packet with a partial last segment.

This patch tremendously improves performance, as the traffic now looks
like :

IP A > B: . ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP A > B: P 94209:98305(4096) ack 98304 win 2783 <nop,nop,timestamp 6834277 6834685>
IP B > A: . ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP B > A: P 98304:102400(4096) ack 98305 win 2768 <nop,nop,timestamp 6834686 6834277>
IP A > B: . ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP A > B: P 98305:102401(4096) ack 102400 win 2783 <nop,nop,timestamp 6834279 6834686>
IP B > A: . ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP B > A: P 102400:106496(4096) ack 102401 win 2768 <nop,nop,timestamp 6834687 6834279>
IP A > B: . ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP A > B: P 102401:106497(4096) ack 106496 win 2783 <nop,nop,timestamp 6834280 6834687>
IP B > A: . ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>
IP B > A: P 106496:110592(4096) ack 106497 win 2768 <nop,nop,timestamp 6834688 6834280>

Before :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280774

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     205719.049006 task-clock                #    9.278 CPUs utilized
         8,449,968 context-switches          #    0.041 M/sec
         1,935,997 CPU-migrations            #    0.009 M/sec
           160,541 page-faults               #    0.780 K/sec
   548,478,722,290 cycles                    #    2.666 GHz                     [83.20%]
   455,240,670,857 stalled-cycles-frontend   #   83.00% frontend cycles idle    [83.48%]
   272,881,454,275 stalled-cycles-backend    #   49.75% backend  cycles idle    [66.73%]
   166,091,460,030 instructions              #    0.30  insns per cycle
                                             #    2.74  stalled cycles per insn [83.39%]
    29,150,229,399 branches                  #  141.699 M/sec                   [83.30%]
     1,943,814,026 branch-misses             #    6.67% of all branches         [83.32%]

      22.173517844 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   16851063           0.0
IpExtOutOctets                  23878580777        0.0

After patch :

lpq83:~# nstat >/dev/null;perf stat ./super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K
280877

 Performance counter stats for './super_netperf 200 -t TCP_RR -H lpq84 -l 20 -- -r 4K,4K':

     107496.071918 task-clock                #    4.847 CPUs utilized
         5,635,458 context-switches          #    0.052 M/sec
         1,374,707 CPU-migrations            #    0.013 M/sec
           160,920 page-faults               #    0.001 M/sec
   281,500,010,924 cycles                    #    2.619 GHz                     [83.28%]
   228,865,069,307 stalled-cycles-frontend   #   81.30% frontend cycles idle    [83.38%]
   142,462,742,658 stalled-cycles-backend    #   50.61% backend  cycles idle    [66.81%]
    95,227,712,566 instructions              #    0.34  insns per cycle
                                             #    2.40  stalled cycles per insn [83.43%]
    16,209,868,171 branches                  #  150.795 M/sec                   [83.20%]
       874,252,952 branch-misses             #    5.39% of all branches         [83.37%]

      22.175821286 seconds time elapsed

lpq83:~# nstat | egrep "IpOutRequests|IpExtOutOctets"
IpOutRequests                   11239428           0.0
IpExtOutOctets                  23595191035        0.0

Indeed, the occupancy of tx skbs (IpExtOutOctets/IpOutRequests) is higher :
2099 instead of 1417, thus helping GRO to be more efficient when using FQ packet
scheduler.

Many thanks to Neal for review and ideas.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Cc: Van Jacobson <vanj@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Tested-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:15:25 -05:00
stephen hemminger
477bb93320 net: remove dead code for add/del multiple
These function to manipulate multiple addresses are not used anywhere
in current net-next tree. Some out of tree code maybe using these but
too bad; they should submit their code upstream..

Also, make __hw_addr_flush local since only used by dev_addr_lists.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:14:04 -05:00
David S. Miller
6ea09d8a09 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
Please pull this batch of updates for the 3.14 stream...

For the Bluetooth bits, Gustavo says:

"This is the first batch of patches intended for 3.14. There is
nothing big here.  Most of the code are refactors, clean up, small
fixes, plus some new device id support."

And...

"More patches to 3.14. Here we have the support for Low Energy
Connection Oriented Channels (LE CoC). Basically, as the name says,
this adds supports for connection oriented channels in the same way
we already have them for BR/EDR connections so profiles/protocols
that work on top of BR/EDR can now work on LE plus a plenty of new
possibilities for LE."

For the ath10k bits, Kalle says:

"Janusz and Marek implemented DFS support to ath10k, but the code is
not enabled yet due to missing cfg80211/mac80211 patches (it will be
enabled in the next pull request). Michal did some device reset fixes
and made it possible for ath10k to share an interrupt with another
device. And lots of smaller fixes from different people."

For the iwlwifi bits, Emmanuel says:

"I have here a big rework of the rate control by Eyal. This is obviously
the biggest part of this batch.
I also have enhancement of protection flags by Avri and a few bits for
WoWLAN by Eliad and Luca. Johannes cleans up the debugfs plus a few
fixes. I provided a few things for Bluetooth coexistence.
Besides this we have an implementation for low priority scan."

Along with all that, there are big batches of updates to mwifiex and
ath9k, Jeff Kirsher's FSF address fix patches, and a handful of other
bits here and there.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:08:17 -05:00
David S. Miller
7089fdd814 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
The following patchset contains two Netfilter fixes for your net
tree, they are:

* Fix endianness in nft_reject, the NFTA_REJECT_TYPE netlink attributes
  was not converted to network byte order as needed by all nfnetlink
  subsystems, from Eric Leblond.

* Restrict SYNPROXY target to INPUT and FORWARD chains, this avoid a
  possible crash due to misconfigurations, from Patrick McHardy.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:06:20 -05:00
Sasha Levin
37ab4fa784 net: unix: allow bind to fail on mutex lock
This is similar to the set_peek_off patch where calling bind while the
socket is stuck in unix_dgram_recvmsg() will block and cause a hung task
spew after a while.

This is also the last place that did a straightforward mutex_lock(), so
there shouldn't be any more of these patches.

Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 15:04:42 -05:00
Steffen Klassert
d1fc502476 MAINTAINERS: Update the IPsec maintainer entry
Add the IPsec git trees and some pure IPsec modules
to the IPsec section in the MAINTAINERS file.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:59:11 -05:00
Eric Dumazet
e47eb5dfb2 udp: ipv4: do not use sk_dst_lock from softirq context
Using sk_dst_lock from softirq context is not supported right now.

Instead of adding BH protection everywhere,
udp_sk_rx_dst_set() can instead use xchg(), as suggested
by David.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: 9750223102 ("udp: ipv4: must add synchronization in udp_sk_rx_dst_set()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:50:58 -05:00
Linus Torvalds
0eda4020ae GPIO fixes for the v3.13 development cycle:
- Driver bug fixes for SH PFC, TWL4030, MSM and RCAR.
 
 - Update the MAINTAINERS
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.15 (GNU/Linux)
 
 iQIcBAABAgAGBQJSrsP1AAoJEEEQszewGV1zyn4P/j/tepOI0iPnEeUqc/Ah9lxd
 lhZJytVH1mHpaf/4HJgmRayZrwria4fX6NEUwoIcZyuETLstvk2RnI6VdmuCRqBL
 Oys/xxS3B55CIo36RHV1qADojPoNdK4lAbKKfklyJBnsDK7ozh+vPiZPANs0tzRS
 ELaceaa1jX3O+ht3LvCbl8lguxu+6FpzzpQuFt/FCJTB4HuyTreg+wB7VWyoqsC7
 Ry6BCKBjN8lewT8fvVkAdAeHaERkkCNxjBPDj4xU14hynWtgWikR0SOaFF7f5n2T
 8dJKAqPIiVQP1Lb1WSJ6zEY1mCA7WjsOgudaDnF3vqPIMBaZKBM96cQHpI/JMdPU
 VzrxC+k5BWxJC/XxE/AsES+Z949ctXFjiyj7b6SmcIU1rV46KpUuDLXwkbXSWs3G
 FHWKIunHnHEmwqIDi+nDSCb7kdgxnboUZyiZJFr7s5XIJTqi7wv+wzSoIRP8N9w3
 jNjWmrezJvP6YLdNBKh5jVDhY8XLcXr/U1QxP59tveO1isNwRvFJJG3zlvLwNhpG
 ZGbqkygwoo3AvXc87ZLXFHIEf/vpOWr7ObWpuhF9OXoeWqmTaqD84XxUJSb7myPF
 /XNIhjFLQ6kzHe5xivizLFatEXA5Oy9Rz2zTiWFZgbTciBRXU8fkoI56jcfoRwWG
 yS/PxsVjr0FqrYsBvMJH
 =V2Cs
 -----END PGP SIGNATURE-----

Merge tag 'gpio-v3.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

Pull GPIO fixes from Linus Walleij:
 "All but one are long-standing bug fixes that are also tagged for
  stable

   - Driver bug fixes for SH PFC, TWL4030, MSM and RCAR.

   - Update the MAINTAINERS"

* tag 'gpio-v3.13-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: rcar: Fix level interrupt handling
  gpio: msm: Fix irq mask/unmask by writing bits instead of numbers
  gpio: twl4030: Fix regression for twl gpio LED output
  sh-pfc: Fix PINMUX_GPIO macro
  MAINTAINERS: update GPIO maintainers entry
2013-12-17 11:47:40 -08:00
Linus Torvalds
a5905a9205 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull two Ceph fixes from Sage Weil:
 "One of these is fixing a regression from the d_flags file type patch
  that went into -rc1 that broke instantiation of inodes and dentries
  (we were doing dentries first).  The other is just an off-by-one
  corner case"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: Avoid data inconsistency due to d-cache aliasing in readpage()
  ceph: initialize inode before instantiating dentry
2013-12-17 11:46:51 -08:00
Jason Wang
50dc875f2e netvsc: don't flush peers notifying work during setting mtu
There's a possible deadlock if we flush the peers notifying work during setting
mtu:

[   22.991149] ======================================================
[   22.991173] [ INFO: possible circular locking dependency detected ]
[   22.991198] 3.10.0-54.0.1.el7.x86_64.debug #1 Not tainted
[   22.991219] -------------------------------------------------------
[   22.991243] ip/974 is trying to acquire lock:
[   22.991261]  ((&(&net_device_ctx->dwork)->work)){+.+.+.}, at: [<ffffffff8108af95>] flush_work+0x5/0x2e0
[   22.991307]
but task is already holding lock:
[   22.991330]  (rtnl_mutex){+.+.+.}, at: [<ffffffff81539deb>] rtnetlink_rcv+0x1b/0x40
[   22.991367]
which lock already depends on the new lock.

[   22.991398]
the existing dependency chain (in reverse order) is:
[   22.991426]
-> #1 (rtnl_mutex){+.+.+.}:
[   22.991449]        [<ffffffff810dfdd9>] __lock_acquire+0xb19/0x1260
[   22.991477]        [<ffffffff810e0d12>] lock_acquire+0xa2/0x1f0
[   22.991501]        [<ffffffff81673659>] mutex_lock_nested+0x89/0x4f0
[   22.991529]        [<ffffffff815392b7>] rtnl_lock+0x17/0x20
[   22.991552]        [<ffffffff815230b2>] netdev_notify_peers+0x12/0x30
[   22.991579]        [<ffffffffa0340212>] netvsc_send_garp+0x22/0x30 [hv_netvsc]
[   22.991610]        [<ffffffff8108d251>] process_one_work+0x211/0x6e0
[   22.991637]        [<ffffffff8108d83b>] worker_thread+0x11b/0x3a0
[   22.991663]        [<ffffffff81095e5d>] kthread+0xed/0x100
[   22.991686]        [<ffffffff81681c6c>] ret_from_fork+0x7c/0xb0
[   22.991715]
-> #0 ((&(&net_device_ctx->dwork)->work)){+.+.+.}:
[   22.991715]        [<ffffffff810de817>] check_prevs_add+0x967/0x970
[   22.991715]        [<ffffffff810dfdd9>] __lock_acquire+0xb19/0x1260
[   22.991715]        [<ffffffff810e0d12>] lock_acquire+0xa2/0x1f0
[   22.991715]        [<ffffffff8108afde>] flush_work+0x4e/0x2e0
[   22.991715]        [<ffffffff8108e1b5>] __cancel_work_timer+0x95/0x130
[   22.991715]        [<ffffffff8108e303>] cancel_delayed_work_sync+0x13/0x20
[   22.991715]        [<ffffffffa03404e4>] netvsc_change_mtu+0x84/0x200 [hv_netvsc]
[   22.991715]        [<ffffffff815233d4>] dev_set_mtu+0x34/0x80
[   22.991715]        [<ffffffff8153bc2a>] do_setlink+0x23a/0xa00
[   22.991715]        [<ffffffff8153d054>] rtnl_newlink+0x394/0x5e0
[   22.991715]        [<ffffffff81539eac>] rtnetlink_rcv_msg+0x9c/0x260
[   22.991715]        [<ffffffff8155cdd9>] netlink_rcv_skb+0xa9/0xc0
[   22.991715]        [<ffffffff81539dfa>] rtnetlink_rcv+0x2a/0x40
[   22.991715]        [<ffffffff8155c41d>] netlink_unicast+0xdd/0x190
[   22.991715]        [<ffffffff8155c807>] netlink_sendmsg+0x337/0x750
[   22.991715]        [<ffffffff8150d219>] sock_sendmsg+0x99/0xd0
[   22.991715]        [<ffffffff8150d63e>] ___sys_sendmsg+0x39e/0x3b0
[   22.991715]        [<ffffffff8150eba2>] __sys_sendmsg+0x42/0x80
[   22.991715]        [<ffffffff8150ebf2>] SyS_sendmsg+0x12/0x20
[   22.991715]        [<ffffffff81681d19>] system_call_fastpath+0x16/0x1b

This is because we hold the rtnl_lock() before ndo_change_mtu() and try to flush
the work in netvsc_change_mtu(), in the mean time, netdev_notify_peers() may be
called from worker and also trying to hold the rtnl_lock. This will lead the
flush won't succeed forever. Solve this by not canceling and flushing the work,
this is safe because the transmission done by NETDEV_NOTIFY_PEERS was
synchronized with the netif_tx_disable() called by netvsc_change_mtu().

Reported-by: Yaju Cao <yacao@redhat.com>
Tested-by: Yaju Cao <yacao@redhat.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:45:42 -05:00
Linus Torvalds
4ddebaf42d Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc fixes from Ben Herrenschmidt:
 "Uli's patch fixes a regression in ptrace caused by a mis-merge of a
  previous LE patch.  The rest are all more endian fixes, all fairly
  trivial, found during testing of 3.13-rc's"

* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
  powerpc/powernv: Fix OPAL LPC access in Little Endian
  powerpc/powernv: Fix endian issue in opal_xscom_read
  powerpc: Fix endian issues in crash dump code
  powerpc/pseries: Fix endian issues in MSI code
  powerpc/pseries: Fix PCIE link speed endian issue
  powerpc/pseries: Fix endian issues in nvram code
  powerpc/pseries: Fix endian issues in /proc/ppc64/lparcfg
  powerpc: Fix topology core_id endian issue on LE builds
  powerpc: Fix endian issue in setup-common.c
  powerpc: PTRACE_PEEKUSR always returns FPR0
2013-12-17 11:43:46 -08:00
David S. Miller
b80b376c44 Merge branch 'phy_power'
Sebastian Hesselbarth says:

====================
net: phy: Ethernet PHY powerdown optimization

This is v2 of the ethernet PHY power optimization patches to reduce
power consumption of network PHYs with link that are either unused or
the corresponding netdev is down.

Compared to the last version, this patch set drops a patch to disable
unused PHYs after late initcall, as it is not compatible with a modular
mdio bus [1]. I'll investigate different ways to have a modular mdio bus
driver get notified when driver loading is done.

Again, a branch with v2 applied to v3.13-rc2 can also be found at
https://github.com/shesselba/linux-dove.git topic/ethphy-power-v2

[1] http://www.spinics.net/lists/arm-kernel/msg293028.html
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:42:57 -05:00
Sebastian Hesselbarth
be9dad1f9f net: phy: suspend phydev when going to HALTED
When phydev is going to HALTED state, we can try to suspend it to
safe more power. phy_suspend helper will check if PHY can be suspended,
so just call it when entering HALTED state.

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:42:44 -05:00
Sebastian Hesselbarth
1211ce5307 net: phy: resume/suspend PHYs on attach/detach
This ensures PHYs are resumed on attach and suspended on detach.

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:42:44 -05:00
Sebastian Hesselbarth
481b5d938b net: phy: provide phy_resume/phy_suspend helpers
This adds helper functions to resume and suspend a given phy_device
by calling the corresponding driver callbacks if available.

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:42:44 -05:00
Sebastian Hesselbarth
0898b448b3 net: phy: marvell: provide genphy suspend/resume
Marvell PHYs support generic PHY suspend/resume, so provide those
callbacks to all marvell specific drivers.

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:42:44 -05:00
Sebastian Hesselbarth
58911151aa net: mv643xx_eth: properly start/stop phy device
When using phydev, it should be phy_start/phy_stop'ed properly. This
driver doesn't do that, so add the corresponding calls to port_start/
stop respectively.

Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:42:43 -05:00
wangweidong
be78cfcb25 sctp: Reorder 'struc association' members to reduce its size
Members of 'struct association' are not in appropriate order to
reuse compiler added padding on 64bit architectures. In this patch
we reorder those struct members and help reduce the size of the
structure from 2776 bytes to 2720 bytes on 64 bit architectures.

Signed-off-by: Wang Weidong <wangweidong1@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:32:43 -05:00
David S. Miller
e437931010 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates

This series contains updates to i40e only (again).

Jesse provides a fix for when tx_rings structure is NULL and we do not want
to panic. Then refactors the flow control set up and disables L2 flow control
by default.  Provides some trivial fixes as well as prevent compiler warnings.
Then to align to similar behaviour in ixgbe, use the total number of CPUs in
the system to suggest the number of transmit and receive queue pairs.

Shannon provides a i40e ethtool fix to get some more reasonable information
reports back out to the ethtool.  In addition, fixes PF reset after offline
test, where it reorders the test to put the register test last as it is the
only one that needs a reset, and we wait to trigger the reset until after we
clear the testing bit.  Lastly provides basic support for handling suspend
and resume for now, later on Wake-On-LAN support will be added.

Anjali provides changes to tell the stack about our actual number of queues
in order for RFS/RPS/XFS to work correctly.  Then provides several patches to
implement dynamically changing the queue count for the main VSI.  Adds
basic support for get/set channels for RSS so that the number of receive and
transmit queue pair can be changed via ethtool.  Cleans up the use of
rtnl_lock in the reset patch since it runs from a work time.

Neerav Parikh cleans up the VF interface to remove FCoE code as this
feature will not be supported on VF interfaces.

v2:
  - submitted patch 1 to net (since it was a fix needed for net), so dropped
    from this series (this patch will get added to net-next when Dave syncs
    his trees)
  - Dropped patches 4 & 11 from previous submission because of feedback
    received from Ben Hutchings and Sergei Shtylyov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:31:17 -05:00
Josh Boyer
f447ef4a56 cpupower: Fix segfault due to incorrect getopt_long arugments
If a user calls 'cpupower set --perf-bias 15', the process will end with
a SIGSEGV in libc because cpupower-set passes a NULL optarg to the atoi
call.  This is because the getopt_long structure currently has all of
the options as having an optional_argument when they really have a
required argument.  We change the structure to use required_argument to
match the short options and it resolves the issue.

This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1000439

Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Cc: Thomas Renninger <trenn@suse.de>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-12-17 11:29:30 -08:00
David S. Miller
bc4d0f615a Merge branch 'ovs_hash'
Francesco Fusco says:

====================
ovs: introduce arch-specific fast hashing improvements

From: Daniel Borkmann <dborkman@redhat.com>

We are introducing a fast hash function (see patch1) that can be
used in the context of OpenVSwitch to reduce the hashing footprint
(patch2). For details, please see individual patches!

v1->v2:
 - Make hash generic and place it under lib
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:27:44 -05:00
Francesco Fusco
500f808726 net: ovs: use CRC32 accelerated flow hash if available
Currently OVS uses jhash2() for calculating flow hashes in its
internal flow_hash() function. The performance of the flow_hash()
function is critical, as the input data can be hundreds of bytes
long.

OVS is largely deployed in x86_64 based datacenters.  Therefore,
we argue that the performance critical fast path of OVS should
exploit underlying CPU features in order to reduce the per packet
processing costs. We replace jhash2 with the hash implementation
provided by the kernel hash lib, which exploits the crc32l
instruction to achieve high performance

Our patch greatly reduces the hash footprint from ~200 cycles of
jhash2() to around ~90 cycles in case of ovs_flow_hash_crc()
(measured with rdtsc over maximum length flow keys on an i7 Intel
CPU).

Additionally, we wrote a microbenchmark to stress the flow table
performance. The benchmark inserts random flows into the flow
hash and then performs lookups. Our hash deployed on a CRC32
capable CPU reduces the lookup for 1000 flows, 100 masks from
~10,100us to ~6,700us, for example.

Thus, simply use the newly introduced arch_fast_hash2() as a
drop-in replacement.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:27:17 -05:00
Francesco Fusco
71ae8aac3e lib: introduce arch optimized hash library
We introduce a new hashing library that is meant to be used in
the contexts where speed is more important than uniformity of the
hashed values. The hash library leverages architecture specific
implementation to achieve high performance and fall backs to
jhash() for the generic case.

On Intel-based x86 architectures, the library can exploit the crc32l
instruction, part of the Intel SSE4.2 instruction set, if the
instruction is supported by the processor. This implementation
is twice as fast as the jhash() implementation on an i7 processor.

Additional architectures, such as Arm64 provide instructions for
accelerating the computation of CRC, so they could be added as well
in follow-up work.

Signed-off-by: Francesco Fusco <ffusco@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Thomas Graf <tgraf@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-12-17 14:27:17 -05:00
Eyal Shapira
657322084d iwlwifi: mvm: set highest rate in VHT MCS Set
Set the Tx/Rx highest long GI rates in the VHT Supported MCS Set
field according to the chip capabilties.

Signed-off-by: Eyal Shapira <eyal@wizery.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:58 +02:00
Eyal Shapira
4623a26575 iwlwifi: mvm: rs: disable MCS9 Tx workaround
MCS9 introduces some corner cases in the current rs
algorithm which may lead to non optimal throughput and
instability in the throughput. Until all the corner
cases are resolved disable MCS9 for Tx as a workaround
which yields better throughput results as MCS8 is much
more stable.

Signed-off-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:58 +02:00
Ilan Peer
be2056fce8 iwlwifi: mvm: Do not allow AP MAC context update if not active
Fix a regression introduced in "iwlwifi: mvm: fix ht protection flags"
where an AP/IBSS MAC context could have been updated even before the
context was added to the FW. This fix avoids the following warning:

WARNING: ... at .../drivers/net/wireless/iwlwifi/mvm/mac-ctxt.c:1132
    iwl_mvm_mac_ctxt_changed+0x75/0x90 [iwlmvm]()
Changing inactive MAC 0c:8b:fd:01:1a:30/3
Modules linked in: [...]
Call Trace:
[<c16041fd>] dump_stack+0x41/0x52
[<c1041074>] warn_slowpath_common+0x84/0xa0
[<f80d8f45>] ? iwl_mvm_mac_ctxt_changed+0x75/0x90 [iwlmvm]
[<f80d8f45>] ? iwl_mvm_mac_ctxt_changed+0x75/0x90 [iwlmvm]
[<c1041133>] warn_slowpath_fmt+0x33/0x40
[<f80d8f45>] iwl_mvm_mac_ctxt_changed+0x75/0x90 [iwlmvm]
[<f80d517a>] iwl_mvm_bss_info_changed+0x22a/0x4b0 [iwlmvm]
[<c160831d>] ? mutex_unlock+0xd/0x10
[<f80d4678>] ? iwl_mvm_configure_filter+0x58/0x70 [iwlmvm]
[<f80d4f50>] ? iwl_mvm_mac_tx+0xc0/0xc0 [iwlmvm]
[<f8132d83>] ieee80211_bss_info_change_notify+0xa3/0x1d0 [mac80211]
[<f8149247>] ? ieee80211_del_virtual_monitor+0x127/0x1f0 [mac80211]
[<f8149cac>] ieee80211_do_open+0x12c/0xeb0 [mac80211]
[<c106c6de>] ? __raw_notifier_call_chain+0x1e/0x30
[<c106c70f>] ? raw_notifier_call_chain+0x1f/0x30
[<f814aa8d>] ieee80211_open+0x5d/0x60 [mac80211]
[<c1500c7b>] __dev_open+0xab/0x140
[<c160b39a>] ? _raw_spin_unlock_bh+0x2a/0x30
[<c1500f41>] __dev_change_flags+0x81/0x160
[<c10ab6fc>] ? __lock_is_held+0x3c/0x60
[<c15010d1>] dev_change_flags+0x21/0x60

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reported-by: Johannes Berg <johannes.berg@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:57 +02:00
Emmanuel Grumbach
c2d284d547 iwlwifi: mvm: fixup Makefile
debufs.o appeared twice in the Makefile. Fix that.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:56 +02:00
Eyal Shapira
4e4b815c08 iwlwifi: mvm: rs: refactor rate scale action decision
Extract the scale action decision to a different function
in preparation of modifying it. While at it also convert
the scale action values from hardcoded values to a clear enum.

Signed-off-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:56 +02:00
Eyal Shapira
4107dbd277 iwlwifi: mvm: rs: remove unnecessary debug logs
The logs are emitted in a flow in which there were retries
and the rates in the rate table entry didn't match the active
or search table. This doesn't indicate a problem and is
expected in most cases where there will be retries for some
reason. Remove the logs.

Signed-off-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:55 +02:00
Eyal Shapira
e07be6d3fe iwlwifi: mvm: rs: improve rates table algo
The new logic will attempt more rates with less retries
per rate. Also when starting off with MIMO it will
fallback to SISO with the same MCS and only then to Legacy.
Previously we fell back directly to Legacy.

Signed-off-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:54 +02:00
Eyal Shapira
ca6f38ff4b iwlwifi: mvm: rs: avoid recalc of supported legacy rate mask
The supported legacy rate mask is initialized when rs
is initialized based on the remote peer supported rates.
There's no need to re mask it repeatedly with the supported
remote peer rates.

Signed-off-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:54 +02:00
Eyal Shapira
8fc7c58c03 iwlwifi: mvm: rs: refactor building the LQ command
Simplify the code a bit more by extracting the rates table
building logic into a separate function and handle setting
a fixed rate for debug in a separate flow.
Also avoid using and saving ucode rate format in different
places. Instead use rs_rate struct and convert to ucode format
only when filling the rates table in the LQ command.

Signed-off-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:53 +02:00
Alexander Bondar
752096727c iwlwifi: mvm: Add and examine TLV flag for P2P client uAPSD support
Current firmware doesn't handle well uAPSD in P2P Client.
When it will be fixed, the firmware will set a TLV flag to notify
the driver that uAPSD is supported in P2P client mode.
Check this flag when sending power command for P2P client.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:52 +02:00
Arik Nemtsov
128cb89e1d iwlwifi: trans: turn set_pmi into an optional callback
It is not currently implemented for SDIO, and not required for other slave
buses as well.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:52 +02:00
Arik Nemtsov
efbf6e3ba2 iwlwifi: trans: clear FW_ERROR status in common code
Clear the FW_ERROR status before the common start_fw transport code.
Remove the transport specific clears.

After these patches the FW_ERROR flag is only set and cleared by common
transport code.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:51 +02:00
Arik Nemtsov
2a988e9808 iwlwifi: trans: prevent reprobe on repeated FW errors before restart
In case a sync command timeouts or Tx is stuck while a FW error
interrupt arrives, we might call iwl_op_mode_nic_error twice before
a restart has been initiated. This will cause a reprobe. Unify calls
to this function at the transport level and only call it on the first
FW error in a given by checking the transport FW error flag.

While at it, remove the privately defined iwl_nic_error from PCIE code
and use the common callback instead.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:50 +02:00
Eliad Peller
8c678ed466 iwlwifi: mvm: check iwl_nvm_init return value
iwl_nvm_init() return value wasn't checked in some
path, which resulted in the following panic (if
there was some issue with the nvm):

Unable to handle kernel NULL pointer dereference at virtual address 00000004
pgd = d0460000
[00000004] *pgd=00000000
Internal error: Oops: 5 [#1] PREEMPT SMP
Modules linked in: iwlmvm(+) iwlwifi mac80211 cfg80211 compat [last unloaded: compat]
PC is at iwl_mvm_mac_setup_register+0x12c/0x460 [iwlmvm]
LR is at 0x2710
pc : [<bf50dd4c>]    lr : [<00002710>]    psr: 20800013
sp : d00cfe18  ip : 0000081e  fp : d006b908
r10: d0711408  r9 : bf532e64  r8 : d006b5bc
r7 : d01af000  r6 : bf39cefc  r5 : d006ab00  r4 : d006b5a4
r3 : 00000001  r2 : 00000000  r1 : d006a120  r0 : d006b860

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:50 +02:00
Emmanuel Grumbach
03b67e982c iwlwifi: remove pointer to transport from op_mode
This pointer was not used anywhere.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:49 +02:00
Eyal Shapira
a0c38b2119 iwlwifi: mvm: rs: move rs_program_fix_rate to cleanup ifdefs
Move rs_program_fix_rate right before it's caller where we're
already in the context of an ifdef CPTCFG_MAC80211_DEBUGFS so
we can get rid of the extra ifdefs surrounding the original
location.

Signed-off-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:48 +02:00
Arik Nemtsov
3fc07953ba iwlwifi: trans: prevent tx and cmds during FW error
Stop Tx and commands from arriving to the transport layer when a FW
error has occurred. A HW recovery should take place before. Remove
transport specific checks of the same nature (note that not all
transports were protected).

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:48 +02:00
Arik Nemtsov
eb7ff77edd iwlwifi: trans: use a unified transport status
The same bits are employed in all transport layers. Put the status
field in the common transport layer. This allows us to employ them
in common transport code.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:47 +02:00
Arik Nemtsov
a408284367 iwlwifi: trans: divide stop_hw into stop_device/op_mode_leave
The stop_hw trans callback is not well defined. It is missing in many
cleanup flows and the division of labor between stop_device/stop_hw
is cumbersome. Remove stop_hw and use stop_device to perform both.
Implement this for all current transports.

PCIE needs some extra configuration the op-mode is leaving to configure
RF kill. Expose this explicitly as a new op_mode_leave trans callback.
Take the call to stop_device outside iwl_run_mvm_init_ucode, this
makes more sense and WARN when we want to run the INIT firmware while
it has run already.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:46 +02:00
Johannes Berg
8b206d198c iwlwifi: mvm: clarify smps_requests documentation
The documentation for smps_requests is unclear, rewrite it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:46 +02:00
Eliad Peller
dcbc3e1a3b iwlwifi: mvm: configure phy_ctxt with min_def
Configure the phy context to the minimum required
bandwidth, given by ctx->min_def.

Tuning to a narrower bandwidth should reduce the
noise level and consume less power.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:45 +02:00
Eliad Peller
e59647eaad iwlwifi: mvm: add multicast filtering support
Configure the fw to filter multicast according to
the addresses given by mac80211.

Note that bssid should be given even if we want
to pass all the multicast frames.

Signed-off-by: Eliad Peller <eliad@wizery.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:44 +02:00
Emmanuel Grumbach
ee9c6cb07a iwlwifi: mvm: move iwl_mvm_set_tx_power to PHY area
This is more related to phy contexts than to mac context.
Move this function to there.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:44 +02:00
Alexander Bondar
e45a941deb iwlwifi: mvm: add per-vif power debugfs hooks
This allows to tweak the power parameters per vif.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:43 +02:00
Alexander Bondar
51498cf6d8 iwlwifi: mvm: Enable power save on a single P2P client interface
Enable power save on P2P client interface only if it is the
only bound interface.
Avoid using uAPSD if P2P client is associated to GO that uses
opportunistic power save. This is due to current FW limitation.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:42 +02:00
Alexander Bondar
92d8556250 iwlwifi: mvm: Disable power save for monitor interface
When monitor interface is activated device power save needs
to be disabled.
Re-consider power management status on other active
interfaces when monitor interface is bound or unbound.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:42 +02:00
Alexander Bondar
1c2abf724b iwlwifi: mvm: Change power management dependency on multi MAC
FW still does not support power management on multiple MAC interfaces.
Currently the driver enforce this limitation by disabling PM if second
interface is added. Change this behavior to allow PM on a single interface
even if other interfaces exist but not bound to any specific PHY.
PM will be enabled if only one single interface is bound.

Signed-off-by: Alexander Bondar <alexander.bondar@intel.com>
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
2013-12-17 19:39:41 +02:00