Commit graph

484262 commits

Author SHA1 Message Date
Marcelo Leitner
00c83b01d5 Fix race condition between vxlan_sock_add and vxlan_sock_release
Currently, when trying to reuse a socket, vxlan_sock_add will grab
vn->sock_lock, locate a reusable socket, inc refcount and release
vn->sock_lock.

But vxlan_sock_release() will first decrement refcount, and then grab
that lock. refcnt operations are atomic but as currently we have
deferred works which hold vs->refcnt each, this might happen, leading to
a use after free (specially after vxlan_igmp_leave):

  CPU 1                            CPU 2

deferred work                    vxlan_sock_add
  ...                              ...
                                   spin_lock(&vn->sock_lock)
                                   vs = vxlan_find_sock();
  vxlan_sock_release
    dec vs->refcnt, reaches 0
    spin_lock(&vn->sock_lock)
                                   vxlan_sock_hold(vs), refcnt=1
                                   spin_unlock(&vn->sock_lock)
    hlist_del_rcu(&vs->hlist);
    vxlan_notify_del_rx_port(vs)
    spin_unlock(&vn->sock_lock)

So when we look for a reusable socket, we check if it wasn't freed
already before reusing it.

Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Fixes: 7c47cedf43 ("vxlan: move IGMP join/leave to work queue")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:57:08 -05:00
Cyrille Pitchen
51f8301485 net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:51:59 -05:00
David S. Miller
efef793926 Merge branch 'mlx4-next'
Or Gerlitz says:

====================
mlx4 driver update

This series from Matan, Jenny, Dotan and myself is mostly about adding
support to a new performance optimized flow steering mode (patches 4-10).

The 1st two patches are small fixes (one for VXLAN and one for SRIOV),
and the third patch is a fix to avoid hard-lockup situation when many
(hunderds) processes holding user-space QPs/CQs get events.

Matan and Or.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:40 -05:00
Matan Barak
7d077cd34e net/mlx4: Add support for A0 steering
Add the required firmware commands for A0 steering and a way to enable
that. The firmware support focuses on INIT_HCA, QUERY_HCA, QUERY_PORT,
QUERY_DEV_CAP and QUERY_FUNC_CAP commands. Those commands are used
to configure and query the device.

The different A0 DMFS (steering) modes are:

Static - optimized performance, but flow steering rules are
limited. This mode should be choosed explicitly by the user
in order to be used.

Dynamic - this mode should be explicitly choosed by the user.
In this mode, the FW works in optimized steering mode as long as
it can and afterwards automatically drops to classic (full) DMFS.

Disable - this mode should be explicitly choosed by the user.
The user instructs the system not to use optimized steering, even if
the FW supports Dynamic A0 DMFS (and thus will be able to use optimized
steering in Default A0 DMFS mode).

Default - this mode is implicitly choosed. In this mode, if the FW
supports Dynamic A0 DMFS, it'll work in this mode. Otherwise, it'll
work at Disable A0 DMFS mode.

Under SRIOV configuration, when the A0 steering mode is enabled,
older guest VF drivers who aren't using the RX QP allocation flag
(MLX4_RESERVE_A0_QP) will get a QP from the general range and
fail when attempting to register a steering rule. To avoid that,
the PF context behaviour is changed once on A0 static mode, to
require support for the allocation flag in VF drivers too.

In order to enable A0 steering, we use log_num_mgm_entry_size param.
If the value of the parameter is not positive, we treat the absolute
value of log_num_mgm_entry_size as a bit field. Setting bit 2 of this
bit field enables static A0 steering.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:36 -05:00
Matan Barak
431df8c7e9 net/mlx4: Refactor QUERY_PORT
Currently QUERY_PORT is done as a part of QUERY_DEV_CAP firmware command.

Since we would like to use it without querying all device capabilities,
extract this part to be a function of its own.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Matan Barak
579d059bd2 net/mlx4_core: Add explicit error message when rule doesn't meet configuration
When a given flow steering rule is invalid in respect to the current
steering configuration, print the correct error message to the system log.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Matan Barak
d57febe1a4 net/mlx4: Add A0 hybrid steering
A0 hybrid steering is a form of high performance flow steering.
By using this mode, mlx4 cards use a fast limited table based steering,
in order to enable fast steering of unicast packets to a QP.

In order to implement A0 hybrid steering we allocate resources
from different zones:
(1) General range
(2) Special MAC-assigned QPs [RSS, Raw-Ethernet] each has its own region.

When we create a rss QP or a raw ethernet (A0 steerable and BF ready) QP,
we try hard to allocate the QP from range (2). Otherwise, we try hard not
to allocate from this  range. However, when the system is pushed to its
limits and one needs every resource, the allocator uses every region it can.

Meaning, when we run out of raw-eth qps, the allocator allocates from the
general range (and the special-A0 area is no longer active). If we run out
of RSS qps, the mechanism tries to allocate from the raw-eth QP zone. If that
is also exhausted, the allocator will allocate from the general range
(and the A0 region is no longer active).

Note that if a raw-eth qp is allocated from the general range, it attempts
to allocate the range such that bits 6 and 7 (blueflame bits) in the
QP number are not set.

When the feature is used in SRIOV, the VF has to notify the PF what
kind of QP attributes it needs. In order to do that, along with the
"Eth QP blueflame" bit, we reserve a new "A0 steerable QP". According
to the combination of these bits, the PF tries to allocate a suitable QP.

In order to maintain backward compatibility (with older PFs), the PF
notifies which QP attributes it supports via QUERY_FUNC_CAP command.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Matan Barak
7a89399ffa net/mlx4: Add mlx4_bitmap zone allocator
The zone allocator is a mechanism which manages a few mlx4_bitmaps.

When allocating a resource, the user indicates the desired zone of
which this resource will be allocated from. If possible, the resource
will be allocated from this zone. Otherwise, the resource will be
allocated from a less-than, equal-to, higher-than priority zone,
according to the desired zone's properties with that respective
allocation order.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Dotan Barak
ab256e5ad0 net/mlx4: Add a check if there are too many reserved QPs
The number of reserved QPs is affected both from the firmware and
from the driver's requirements. This patch adds a check that
validates that this number is indeed feasable.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Eugenia Emantayev
ddae0349fd net/mlx4: Change QP allocation scheme
When using BF (Blue-Flame), the QPN overrides the VLAN, CV, and SV fields
in the WQE. Thus, BF may only be used for QPNs with bits 6,7 unset.

The current Ethernet driver code reserves a Tx QP range with 256b alignment.

This is wrong because if there are more than 64 Tx QPs in use,
QPNs >= base + 65 will have bits 6/7 set.

This problem is not specific for the Ethernet driver, any entity that
tries to reserve more than 64 BF-enabled QPs should fail. Also, using
ranges is not necessary here and is wasteful.

The new mechanism introduced here will support reservation for
"Eth QPs eligible for BF" for all drivers: bare-metal, multi-PF, and VFs
(when hypervisors support WC in VMs). The flow we use is:

1. In mlx4_en, allocate Tx QPs one by one instead of a range allocation,
   and request "BF enabled QPs" if BF is supported for the function

2. In the ALLOC_RES FW command, change param1 to:
a. param1[23:0]  - number of QPs
b. param1[31-24] - flags controlling QPs reservation

Bit 31 refers to Eth blueflame supported QPs. Those QPs must have
bits 6 and 7 unset in order to be used in Ethernet.

Bits 24-30 of the flags are currently reserved.

When a function tries to allocate a QP, it states the required attributes
for this QP. Those attributes are considered "best-effort". If an attribute,
such as Ethernet BF enabled QP, is a must-have attribute, the function has
to check that attribute is supported before trying to do the allocation.

In a lower layer of the code, mlx4_qp_reserve_range masks out the bits
which are unsupported. If SRIOV is used, the PF validates those attributes
and masks out unsupported attributes as well. In order to notify VFs which
attributes are supported, the VF uses QUERY_FUNC_CAP command. This command's
mailbox is filled by the PF, which notifies which QP allocation attributes
it supports.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:35 -05:00
Matan Barak
3dca0f42c7 net/mlx4_core: Use tasklet for user-space CQ completion events
Previously, we've fired all our completion callbacks straight from our ISR.

Some of those callbacks were lightweight (for example, mlx4_en's and
IPoIB napi callbacks), but some of them did more work (for example,
the user-space RDMA stack uverbs' completion handler). Besides that,
doing more than the minimal work in ISR is generally considered wrong,
it could even lead to a hard lockup of the system. Since when a lot
of completion events are generated by the hardware, the loop over those
events could be so long, that we'll get into a hard lockup by the system
watchdog.

In order to avoid that, add a new way of invoking completion events
callbacks. In the interrupt itself, we add the CQs which receive completion
event to a per-EQ list and schedule a tasklet. In the tasklet context
we loop over all the CQs in the list and invoke the user callback.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:34 -05:00
Or Gerlitz
383677da43 net/mlx4_core: Mask out host side virtualization features for guests
When VFs (guests in this context) issue the QUERY_DEV_CAP command, they
need not be told that host side virtualization features such as VST, FSM
(MAC anti-spoofing) and running > 80 VFs are supported by the device.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:34 -05:00
Or Gerlitz
c58942f252 net/mlx4_en: Set csum level for encapsulated packets
This was dropped by mistake for the napi_gro_frags flow, fix that.

Fixes: dd65beac48 ('net/mlx4_en: Extend usage of napi_gro_frags')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:47:34 -05:00
Sriharsha Basavapatna
630f4b7056 be2net: Export tunnel offloads only when a VxLAN tunnel is created
The encapsulated offload flags shouldn't be unconditionally exported
to the stack. The stack expects offloading to work across all tunnel
types when those flags are set. This would break other tunnels (like
GRE) since be2net currently supports tunnel offload for VxLAN only.

Also, with VxLANs Skyhawk-R can offload only 1 UDP dport. If more
than 1 UDP port is added, we should disable offloads in that case too.

Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@emulex.com>
Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:37:24 -05:00
Kevin Hao
0a4b5a2488 gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
We need to use dma_mapping_error() to check the dma address returned
by dma_map_single/page(). Otherwise we would get warning like this:
  WARNING: at lib/dma-debug.c:1140
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.18.0-rc2-next-20141029 #196
  task: c0834300 ti: effe6000 task.ti: c0874000
  NIP: c02b2c98 LR: c02b2c98 CTR: c030abc4
  REGS: effe7d70 TRAP: 0700   Not tainted  (3.18.0-rc2-next-20141029)
  MSR: 00021000 <CE,ME>  CR: 22044022  XER: 20000000

  GPR00: c02b2c98 effe7e20 c0834300 00000098 00021000 00000000 c030b898 00000003
  GPR08: 00000001 00000000 00000001 749eec9d 22044022 1001abe0 00000020 ef278678
  GPR16: ef278670 ef278668 ef278660 070a8040 c087f99c c08cdc60 00029000 c0840d44
  GPR24: c08be6e8 c0840000 effe7e78 ef041340 00000600 ef114e10 00000000 c08be6e0
  NIP [c02b2c98] check_unmap+0x51c/0x9e4
  LR [c02b2c98] check_unmap+0x51c/0x9e4
  Call Trace:
  [effe7e20] [c02b2c98] check_unmap+0x51c/0x9e4 (unreliable)
  [effe7e70] [c02b31d8] debug_dma_unmap_page+0x78/0x8c
  [effe7ed0] [c03d1640] gfar_clean_rx_ring+0x208/0x488
  [effe7f40] [c03d1a9c] gfar_poll_rx_sq+0x3c/0xa8
  [effe7f60] [c04f8714] net_rx_action+0xc0/0x178
  [effe7f90] [c00435a0] __do_softirq+0x100/0x1fc
  [effe7fe0] [c0043958] irq_exit+0xa4/0xc8
  [effe7ff0] [c000d14c] call_do_irq+0x24/0x3c
  [c0875e90] [c00048a0] do_IRQ+0x8c/0xf8
  [c0875eb0] [c000ed10] ret_from_except+0x0/0x18

For TX, we need to unmap the pages which has already been mapped and
free the skb before return.

For RX, move the dma mapping and error check to gfar_new_skb(). We
would reuse the original skb in the rx ring when either allocating
skb failure or dma mapping error.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Reviewed-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:27:14 -05:00
Hariprasad Shenai
666224d4d5 cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
Remove use of calls into t4_fw_hello() with MASTER_MUST, which results in
FW_HELLO_CMD_MASTERFORCE being set. The firmware doesn't support this and of
course any existing PF Drivers will totally go for a toss.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-11 14:25:17 -05:00
David S. Miller
52c9b12d38 Merge branch 'fec-next'
Fugang Duan says:

====================
net: fec: driver code clean and bug fix

The patch serial include code clean and bug fix:
Patch#1: avoid dummy operation during suspend/resume test.
Patch#2: bug fix for i.MX6SX SOC that clean all interrupt events during MAC initial process.
Patch#3: before phy device link status is up, only enable MDIO bus interrupt.

V2:
- Modify the comment form from David's suggestion.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 23:37:06 -05:00
Nimrod Andy
0c5a3aef9f net: fec: only enable mdio interrupt before phy device link up
Before phy device link up, we only enable FEC mdio interrupt, which
is more reasonable.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 23:37:01 -05:00
Nimrod Andy
e17f7fecdd net: fec: clear all interrupt events to support i.MX6SX
For i.MX6SX FEC controller, there have interrupt mask and event
field extension. To support all SOCs FEC, we clear all interrupt
events during MAVC initial process.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 23:37:01 -05:00
Nimrod Andy
858eeb7d9c net: fec: reset fep link status in suspend function
On some i.MX6 serial boards, phy power and refrence clock are supplied
or controlled by SOC. When do suspend/resume test, the power and clock
are disabled, so phy device link down.

For current driver, fep->link is still up status, which cause extra operation
like below code. To avoid the dumy operation, we set fep->link to down when
phy device is real down.
...
if (fep->link) {
	napi_disable(&fep->napi);
	netif_tx_lock_bh(ndev);
	fec_stop(ndev);
	netif_tx_unlock_bh(ndev);
	napi_enable(&fep->napi);
	fep->link = phy_dev->link;
	status_change = 1;
}
...

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 23:37:01 -05:00
Alexei Starovoitov
198bf1b046 net: sock: fix access via invalid file descriptor
0day robot reported the following crash:
[   21.233581] BUG: unable to handle kernel NULL pointer dereference at 0000000000000007
[   21.234709] IP: [<ffffffff8156ebda>] sk_attach_bpf+0x39/0xc2

It's due to bpf_prog_get() returning ERR_PTR.
Check it properly.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Fixes: 89aa075832 ("net: sock: allow eBPF programs to be attached to sockets")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 23:34:27 -05:00
Gu Zheng
f95b414edb net: introduce helper macro for_each_cmsghdr
Introduce helper macro for_each_cmsghdr as a wrapper of the enumerating
cmsghdr from msghdr, just cleanup.

Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 22:41:55 -05:00
Stephen Rothwell
dd0bcc0bc8 cxgb4/cxgb4vf: global named must be unique
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:52:39 -05:00
David S. Miller
22f10923dd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/amd/xgbe/xgbe-desc.c
	drivers/net/ethernet/renesas/sh_eth.c

Overlapping changes in both conflict cases.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:48:20 -05:00
Joe Perches
785c20a08b irda: Convert function pointer arrays and uses to const
Making things const is a good thing.

(x86-64 defconfig with all irda)
$ size net/irda/built-in.o*
   text	   data	    bss	    dec	    hex	filename
 109276	   1868	    244	 111388	  1b31c	net/irda/built-in.o.new
 108828	   2316	    244	 111388	  1b31c	net/irda/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:33:16 -05:00
Joe Perches
22bbf5f3e4 llc: Make llc_sap_action_t function pointer arrays const
It's better when function pointer arrays aren't modifiable.

Net change:

$ size net/llc/built-in.o.*
   text	   data	    bss	    dec	    hex	filename
  61193	  12758	   1344	  75295	  1261f	net/llc/built-in.o.new
  47113	  27030	   1344	  75487	  126df	net/llc/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:21:24 -05:00
Joe Perches
9b37306935 llc: Make llc_conn_ev_qfyr_t function pointer arrays const
It's better when function pointer arrays aren't modifiable.

Net change from original:

$ size net/llc/built-in.o.*
   text	   data	    bss	    dec	    hex	filename
  61065	  12886	   1344	  75295	  1261f	net/llc/built-in.o.new
  47113	  27030	   1344	  75487	  126df	net/llc/built-in.o.old

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:21:24 -05:00
Joe Perches
14b7d95fd2 llc: Make function pointer arrays const
It's better when function pointer arrays aren't modifiable.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:21:24 -05:00
David S. Miller
b95bf1e0ea Merge branch 'kill_arch_fast_hash'
Daniel Borkmann says:

====================
Kill arch_fast_hash

Due to the size of changes I have based this against net-next,
also given 3.18 is already out. I've split this into 3 parts,
the first two to remove existing users (so they can optionally
go to stable) and the last one to kill the remaining library bits.

Let me know if there are any issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:17:52 -05:00
Daniel Borkmann
0cb6c969ed net, lib: kill arch_fast_hash library bits
As there are now no remaining users of arch_fast_hash(), lets kill
it entirely.

This basically reverts commit 71ae8aac3e ("lib: introduce arch
optimized hash library") and follow-up work, that is f.e., commit
237217546d ("lib: hash: follow-up fixups for arch hash"),
commit e3fec2f74f ("lib: Add missing arch generic-y entries for
asm-generic/hash.h") and last but not least commit 6a02652df5
("perf tools: Fix include for non x86 architectures").

Cc: Francesco Fusco <fusco@ntop.org>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:17:46 -05:00
Daniel Borkmann
87545899b5 net: replace remaining users of arch_fast_hash with jhash
This patch effectively reverts commit 500f808726 ("net: ovs: use CRC32
accelerated flow hash if available"), and other remaining arch_fast_hash()
users such as from nfsd via commit 6282cd5655 ("NFSD: Don't hand out
delegations for 30 seconds after recalling them.") where it has been used
as a hash function for bloom filtering.

While we think that these users are actually not much of concern, it has
been requested to remove the arch_fast_hash() library bits that arose
from [1] entirely as per recent discussion [2]. The main argument is that
using it as a hash may introduce bias due to its linearity (see avalanche
criterion) and thus makes it less clear (though we tried to document that)
when this security/performance trade-off is actually acceptable for a
general purpose library function.

Lets therefore avoid any further confusion on this matter and remove it to
prevent any future accidental misuse of it. For the time being, this is
going to make hashing of flow keys a bit more expensive in the ovs case,
but future work could reevaluate a different hashing discipline.

  [1] https://patchwork.ozlabs.org/patch/299369/
  [2] https://patchwork.ozlabs.org/patch/418756/

Cc: Neil Brown <neilb@suse.de>
Cc: Francesco Fusco <fusco@ntop.org>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:17:45 -05:00
Daniel Borkmann
7f19fc5e0b netlink: use jhash as hashfn for rhashtable
For netlink, we shouldn't be using arch_fast_hash() as a hashing
discipline, but rather jhash() instead.

Since netlink sockets can be opened by any user, a local attacker
would be able to easily create collisions with the DPDK-derived
arch_fast_hash(), which trades off performance for security by
using crc32 CPU instructions on x86_64.

While it might have a legimite use case in other places, it should
be avoided in netlink context, though. As rhashtable's API is very
flexible, we could later on still decide on other hashing disciplines,
if legitimate.

Reference: http://thread.gmane.org/gmane.linux.kernel/1844123
Fixes: e341694e3e ("netlink: Convert netlink_lookup() to use RCU protected hash table")
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:17:45 -05:00
David S. Miller
7d339890e7 Merge branch 'isdn-next'
Tilman Schmidt says:

====================
ISDN patches for net-next

Here's a series of patches for the Gigaset ISDN driver and one for
the ISDN CAPI subsystem.  Please merge as appropriate.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:06:14 -05:00
Tilman Schmidt
330078abd8 isdn/capi: correct argument types of command_2_index
Utility function command_2_index is always called with arguments of
type u8. Adapt its declaration accordingly.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:06:10 -05:00
Tilman Schmidt
f6a68e691f isdn/gigaset: enable Kernel CAPI support by default
Kernel CAPI has been the recommended ISDN subsystem for the Gigaset
driver since kernel release 2.6.34.2. It provides full backwards
compatibility to the old I4L subsystem thanks to the capidrv module.
I4L has been marked as deprecated for more than seven years.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:06:10 -05:00
Tilman Schmidt
f99a6fde9a isdn/gigaset: elliminate unnecessary argument from send_cb()
No need to pass a member of the cardstate structure as a separate
argument if the entire structure is already passed.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:06:09 -05:00
Tilman Schmidt
f650dd2805 isdn/gigaset: clarify gigaset_modem_fill control structure
Replace the flag-controlled retry loop by explicit goto statements
in the error branches to make the control structure easier to
understand.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:06:09 -05:00
Tilman Schmidt
ff66fc3ab0 isdn/gigaset: drop duplicate declaration
Function gigaset_skb_sent was declared twice, identically, in gigaset.h.

Signed-off-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 15:06:09 -05:00
Richard Alpe
340b6e59fb tipc: fix broadcast wakeup contention after congestion
commit 908344cdda ("tipc: fix bug in multicast congestion handling")
introduced a race in the broadcast link wakeup functionality.

This patch eliminates this broadcast link wakeup race caused by
operation on the wakeup list without proper locking. If this race
hit and corrupted the list all subsequent wakeup messages would be
lost, resulting in a considerable memory leak.

Signed-off-by: Richard Alpe <richard.alpe@ericsson.com>
Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 14:45:33 -05:00
Govindarajulu Varadarajan
4f675eb2a7 enic: add support for set/get rss hash key
This patch adds support for setting/getting rss hash key using ethtool.

v2:
respin patch to support RSS hash function changes.

Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 14:41:48 -05:00
David S. Miller
7dbea3e866 Merge branch 'napi_page_frags'
Alexander Duyck says:

====================
net: Alloc NAPI page frags from their own pool

This patch series implements a means of allocating page fragments without
the need for the local_irq_save/restore in __netdev_alloc_frag.  By doing
this I am able to decrease packet processing time by 11ns per packet in my
test environment.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 13:32:02 -05:00
Alexander Duyck
45abfb1069 ethernet/broadcom: Use napi_alloc_skb instead of netdev_alloc_skb_ip_align
This patch replaces the calls to netdev_alloc_skb_ip_align in the
copybreak paths.

Cc: Gary Zambrano <zambrano@broadcom.com>
Cc: Florian Fainelli <f.fainelli@gmail.com>
Cc: Ariel Elior <ariel.elior@qlogic.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 13:31:58 -05:00
Alexander Duyck
e2338f86b3 ethernet/realtek: use napi_alloc_skb instead of netdev_alloc_skb_ip_align
This replaces most of the calls to netdev_alloc_skb_ip_align in the Realtek
drivers.  The one instance I didn't replace in 8139cp.c is because it was
called as a part of init and as such is not always accessed from the
softirq context.

Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 13:31:57 -05:00
Alexander Duyck
e0e31216ef cxgb: Use napi_alloc_skb instead of netdev_alloc_skb_ip_align
In order to use napi_alloc_skb I needed to pass a pointer to struct adapter
instead of struct pci_dev.  This allowed me to access &adapter->napi.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 13:31:57 -05:00
Alexander Duyck
67fd893ee0 ethernet/intel: Use napi_alloc_skb
This change replaces calls to netdev_alloc_skb_ip_align with
napi_alloc_skb.  The advantage of napi_alloc_skb is currently the fact that
the page allocation doesn't make use of any irq disable calls.

There are few spots where I couldn't replace the calls as the buffer
allocation routine is called as a part of init which is outside of the
softirq context.

Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 13:31:57 -05:00
Alexander Duyck
fd11a83dd3 net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb
This change pulls the core functionality out of __netdev_alloc_skb and
places them in a new function named __alloc_rx_skb.  The reason for doing
this is to make these bits accessible to a new function __napi_alloc_skb.
In addition __alloc_rx_skb now has a new flags value that is used to
determine which page frag pool to allocate from.  If the SKB_ALLOC_NAPI
flag is set then the NAPI pool is used.  The advantage of this is that we
do not have to use local_irq_save/restore when accessing the NAPI pool from
NAPI context.

In my test setup I saw at least 11ns of savings using the napi_alloc_skb
function versus the netdev_alloc_skb function, most of this being due to
the fact that we didn't have to call local_irq_save/restore.

The main use case for napi_alloc_skb would be for things such as copybreak
or page fragment based receive paths where an skb is allocated after the
data has been received instead of before.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 13:31:57 -05:00
Alexander Duyck
ffde7328a3 net: Split netdev_alloc_frag into __alloc_page_frag and add __napi_alloc_frag
This patch splits the netdev_alloc_frag function up so that it can be used
on one of two page frag pools instead of being fixed on the
netdev_alloc_cache.  By doing this we can add a NAPI specific function
__napi_alloc_frag that accesses a pool that is only used from softirq
context.  The advantage to this is that we do not need to call
local_irq_save/restore which can be a significant savings.

I also took the opportunity to refactor the core bits that were placed in
__alloc_page_frag.  First I updated the allocation to do either a 32K
allocation or an order 0 page.  This is based on the changes in commmit
d9b2938aa where it was found that latencies could be reduced in case of
failures.  Then I also rewrote the logic to work from the end of the page to
the start.  By doing this the size value doesn't have to be used unless we
have run out of space for page fragments.  Finally I cleaned up the atomic
bits so that we just do an atomic_sub_and_test and if that returns true then
we set the page->_count via an atomic_set.  This way we can remove the extra
conditional for the atomic_read since it would have led to an atomic_inc in
the case of success anyway.

Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 13:31:57 -05:00
David S. Miller
6e5f59aacb Merge branch 'for-davem-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
More iov_iter work for the networking from Al Viro.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-10 13:17:23 -05:00
Flavio Leitner
6c702fab62 dummy: use MODULE_VERSION
Use MODULE_VERSION() now that dummy driver has a version.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 21:51:06 -05:00
Jiri Pirko
6ea3b446b9 net: sched: cls: use nla_nest_cancel instead of nlmsg_trim
To cancel nesting, this function is more convenient.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-12-09 21:49:58 -05:00