Commit graph

501243 commits

Author SHA1 Message Date
Mitch Williams
8b011ebb5c i40evf: ignore bogus messages from FW
Occasionally on shutdown, the FW will hand us a bunch of messages filled
with zeros, which can cause us to spin trying to handle them. Just
ignore these and get on with shutting down.

Change-ID: I347e9648f7153ad5a7b7e0847b87f7aad5f3e0da
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 20:50:19 -08:00
Mitch Williams
f4a718810c i40evf: reset on module unload
When the module is being unloaded, don't wait for the PF to politely
handle all of our admin queue requests, as that might take forever with
a lot of VFs enabled. Instead, just stop everything and request a VF
reset.

When the original shutdown code was written, VF resets were unreliable,
so we avoided them. But with production hardware and firmware, and the
1.x PF driver, this is no longer the case.

This fixes a potential multi-minute delay on driver unload, VF disable,
or system shutdown.

Change-ID: Ib43d6d860ef6b9b8f26e8dce0615a0302608c7d9
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 20:32:57 -08:00
Mitch Williams
3ba9bcb4b6 i40e: add locking around VF reset
During VF deallocation, we need to lock out the VF reset code. However,
we cannot depend on simply masking the interrupt, as this does not lock
out the service task, which can still call the reset routine. Instead,
leave the interrupt enabled, but add locking around the VF disable and
reset routines.

For the disable code, we wait to get the lock, as the reset code will
take a finite amount of time to run. For the reset code, we just return
if we fail to get the lock. Since we know that the VFs are being
disabled, we don't need to handle the reset.
This fixes a panic when disabling SR-IOV.

Change-ID: Iea0a6cdef35c331f48c6d5b2f8e6f0e86322e7d8
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 20:11:44 -08:00
Mitch Williams
07574897d3 i40e: Use even more ARQ descriptors
When enabling 64 VFs and loading the VF driver in the host kernel, we
can easily overrun the PF's admin receive queue. Double the size of this
queue, and increase the work limit to allow the PF to handle more
requests in a single pass through the service task.

Change-ID: I0efbbdc61954bffad422a2f33c4b948a59370bf5
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 19:53:32 -08:00
Mitch Williams
1750a22fa9 i40e: delay after VF reset
Delay a minimum of 10ms after VF reset, to allow the hardware's internal
FIFOs to flush.

Change-ID: I8a02ddb28c9f0d7303a1eb21d0b2443e5b4c1cda
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 19:30:29 -08:00
John W Linville
83840e4bd5 i40e: avoid use of uninitialized v_budget in i40e_init_msix
This I40E_FCOE block increments v_budget before it has been initialized,
then v_budget gets overwritten a few lines later.  This patch just
reorders the code hunks in what I believe was the intended sequence.

Coverity: CID 1260099

Signed-off-by: John W Linville <linville@tuxdriver.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 19:01:06 -08:00
Linus Torvalds
bfa76d4957 Linux 3.19 2015-02-08 18:54:22 -08:00
Trond Myklebust
4efdd92c92 SUNRPC: Remove TCP client connection reset hack
Instead we rely on SO_REUSEPORT to provide the reconnection semantics
that we need for NFSv2/v3.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 21:47:30 -05:00
Trond Myklebust
de84d89030 SUNRPC: TCP/UDP always close the old socket before reconnecting
It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket()
or xs_tcp_setup_socket(), since they do not own the correct locks. Instead,
do it in xs_connect().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 21:47:30 -05:00
Trond Myklebust
718ba5b873 SUNRPC: Add helpers to prevent socket create from racing
The socket lock is currently held by the task that is requesting the
connection be established. While that is efficient in the case where
the connection happens quickly, it is racy in the case where it doesn't.
What we really want is for the connect helper to be able to block access
to the socket while it is being set up.

This patch does so by arranging to transfer the socket lock from the
task that is requesting the connect attempt, and then releasing that
lock once everything is done.
This scheme also gives us automatic protection against collisions with
the RPC close code, so we can kill the cancel_delayed_work_sync()
call in xs_close().

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 21:47:29 -05:00
Trond Myklebust
6cc7e90836 SUNRPC: Ensure xs_reset_transport() resets the close connection flags
Otherwise, we may end up looping.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 21:47:28 -05:00
Trond Myklebust
76698b2358 SUNRPC: Do not clear the source port in xs_reset_transport
Now that we can reuse bound ports after a close, we never really want to
clear the transport's source port after it has been set. Doing so really
messes up the NFSv3 DRC on the server.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 21:47:28 -05:00
Trond Myklebust
3913c78c3a SUNRPC: Handle EADDRINUSE on connect
Now that we're setting SO_REUSEPORT, we still need to handle the
case where a connect() is attempted, but the old socket is still
lingering.
Essentially, all we want to do here is handle the error by waiting
a few seconds and then retrying.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 21:47:27 -05:00
Rickard Strandqvist
cf86da489d i40e: i40e_fcoe.c: Remove unused function
Remove the function i40e_rx_is_fip() that is not used anywhere.

This was partially found by using a static code analysis program
called cppcheck.

Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2015-02-08 18:45:17 -08:00
Linus Torvalds
da2d96d3aa nios2 fixes for v3.19-final
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.5 (GNU/Linux)
 
 iQIVAwUAVNgSUlWoEK+e3syCAQJ2aQ/+I7jK5bmY1zpO0zmGs+3NsTduVCQ1eKfi
 M+WImsl7A6gtTqqeoLbKCG56fp/YbeeD3GZWBrDcnaZupE/ksjROX7bVtTVrkhA+
 FoI1kjle4nqTSDpuiONmok5Alt73L7si24E7c8//4+MhDQiskOF/s7W8vsohPjWr
 QTjEmhlJMvluy3d6ubPbBunhJMY70bJDH/RwldvwIzjyfOZxmQAxnccRiB8wMOjT
 vetWhwEdwoI6MtvlVuN7VjF+6jRIhyocjuhGNC0zd86XXt8KzQiOkSyaYIAEa/Xq
 NK+hmcxVn0OQLerMyqAnJNkX4WfgYXOGPDP3vmyvqlyFLesN6rsmca4bwfFG2/cG
 +OZpM15u7IuhGoaq27aVAmfI7pHtdUwGVeu5RbH3Vv11t0fc7ejrq+SdesIZDJ6u
 C8COlq1bMilALjrANHPugK+sMrPLyG3gO6g70la+k9tFw+hBrBSPz21PC43K/HNv
 /6YNEwcJoSFOC8qufQlA+xDe5OffojhipaPJw/YJEt8GdPhQke8YRqKSAx6sskYw
 37nLQ8oSANHvfirT57qGKrFV5tA1vTDQbzx010ptFv4uGm72zfv0NSvi6CV2oSac
 /VUizq9FoxSfFRv+UWGd3VxepDU+rQj26xcybmip9ykLzj8KNAMynG0mb0LAgw7D
 W55orHribaY=
 =bIjd
 -----END PGP SIGNATURE-----

Merge tag 'nios2-fixes-v3.19-final' of git://git.rocketboards.org/linux-socfpga-next

Pull nios2 fix from Ley Foon Tan:
 "This fixes incorrect behavior of some user programs"

* tag 'nios2-fixes-v3.19-final' of git://git.rocketboards.org/linux-socfpga-next:
  nios2: fix unhandled signals
2015-02-08 18:45:16 -08:00
Linus Torvalds
cdecbb336e Merge git://git.kvack.org/~bcrl/aio-fixes
Pull aio nested sleep annotation from Ben LaHaise,

* git://git.kvack.org/~bcrl/aio-fixes:
  aio: annotate aio_read_event_ring for sleep patterns
2015-02-08 18:27:58 -08:00
Linus Torvalds
4e02370f64 During testing Sedat Dilek hit a "suspicious RCU usage" splat that pointed
out a real bug. During suspend and resume the tlb_flush tracepoint is
 called when the CPU is going offline. As the CPU has been noted as offline,
 RCU is ignoring that CPU, which means that it can not use RCU protected
 locks. When tracepoints are activated, they require RCU locking, and
 if RCU is ignoring a CPU that runs a tracepoint, there is a chance that
 the tracepoint could cause corruption.
 
 The solution was to change the tracepoint into a TRACE_EVENT_CONDITION()
 which allows us to check a condition to determine if the tracepoint
 should be called or not. If the condition is not met, the rcu protected
 code will not be executed. By adding the condition
 "cpu_online(smp_processor_id())", this will prevent the RCU protected
 code from being executed if the CPU is marked offline.
 
 After adding this, another bug was discovered. As RCU checks rcu callers,
 if a rcu call is not done, there is no check (obviously). We found that
 tracepoints could be added in RCU ignored locations and not have lockdep
 complain until the tracepoint is activated. This missed places where
 tracepoints were added in places they should not have been. To fix this,
 code was added in 3.18 that if lockdep is enabled, any tracepoint will
 still call the rcu checks even if the tracepoint is not enabled. The bug
 here, is that the check does not take the CONDITION into account. As the
 condition may prevent tracepoints from being activated in RCU ignored
 areas (as the one patch does), we get false positives when we enable
 lockdep and hit a tracepoint that the condition prevents it from being
 called in a RCU ignored location. The fix for this is to add the
 CONDITION to the rcu checks, even if the tracepoint is not enabled.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJU1rQfAAoJEEjnJuOKh9ld19UH/juFLZFjpYBgtRmbCZa/54Zk
 i2Fa3U8jQe8MHHEYOjCLT9MQTYo/42btmJhr7kWKtIoUgDEli4lkOpbs+H0qar5y
 Vv9+1cLeNFQzgIE3nwV7cjAw7Jufoyzd1lstDqIQvcmzZnQ5sNyyVeigMcxGv8Ls
 4FyqzG6zCVgiDL4LyYNHdNcMr6qLs3KTFDEqp+kQreeO7R1r3ZEpq3JoWaEUgoPP
 qrYv/rqVosLBUGA0pd7RmiGOxhjeKm15qz1GkiPeeus6DDWC6bvPC8cAc/FfkXH0
 hYpoQghSZVnXGy0LzVsd44gj7tYx1FHEpYy1s8G6d5WJcNOGZ6OoZOdOZMyjPVw=
 =PeL2
 -----END PGP SIGNATURE-----

Merge tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

Pull ftrace fixes from Steven Rostedt:
 "During testing Sedat Dilek hit a "suspicious RCU usage" splat that
  pointed out a real bug.  During suspend and resume the tlb_flush
  tracepoint is called when the CPU is going offline.  As the CPU has
  been noted as offline, RCU is ignoring that CPU, which means that it
  can not use RCU protected locks.  When tracepoints are activated, they
  require RCU locking, and if RCU is ignoring a CPU that runs a
  tracepoint, there is a chance that the tracepoint could cause
  corruption.

  The solution was to change the tracepoint into a
  TRACE_EVENT_CONDITION() which allows us to check a condition to
  determine if the tracepoint should be called or not.  If the condition
  is not met, the rcu protected code will not be executed.  By adding
  the condition "cpu_online(smp_processor_id())", this will prevent the
  RCU protected code from being executed if the CPU is marked offline.

  After adding this, another bug was discovered.  As RCU checks rcu
  callers, if a rcu call is not done, there is no check (obviously).  We
  found that tracepoints could be added in RCU ignored locations and not
  have lockdep complain until the tracepoint is activated.  This missed
  places where tracepoints were added in places they should not have
  been.  To fix this, code was added in 3.18 that if lockdep is enabled,
  any tracepoint will still call the rcu checks even if the tracepoint
  is not enabled.  The bug here, is that the check does not take the
  CONDITION into account.  As the condition may prevent tracepoints from
  being activated in RCU ignored areas (as the one patch does), we get
  false positives when we enable lockdep and hit a tracepoint that the
  condition prevents it from being called in a RCU ignored location.

  The fix for this is to add the CONDITION to the rcu checks, even if
  the tracepoint is not enabled"

* tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  x86/tlb/trace: Do not trace on CPU that is offline
  tracing: Add condition check to RCU lockdep checks
2015-02-08 18:08:14 -08:00
Chung-Ling Tang
a3248d609b nios2: fix unhandled signals
Follow other architectures for user fault handling.

Signed-off-by: Chung-Ling Tang <cltang@codesourcery.com>
Acked-by: Ley Foon Tan <lftan@altera.com>
2015-02-09 09:47:05 +08:00
Rasmus Villemoes
a4870f79c2 vxlan: Wrong type passed to %pIS
src_ip is a pointer to a union vxlan_addr, one member of which is a
struct sockaddr. Passing a pointer to src_ip is wrong; one should pass
the value of src_ip itself. Since %pIS formally expects something of
type struct sockaddr*, let's pass a pointer to the appropriate union
member, though this of course doesn't change the generated code.

Fixes: e4c7ed4153 ("vxlan: add ipv6 support")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 17:14:40 -08:00
Shrikrishna Khare
dd83829ed9 Driver: Vmxnet3: Change the hex constant to its decimal equivalent
The hex constant chosen for VMXNET3_REV1_MAGIC is offensive,
replace it with its decimal equivalent.

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Reviewed-by: Shreyas Bhatewara <sbhatewara@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 16:55:01 -08:00
Eric Dumazet
567e4b7973 net: rfs: add hash collision detection
Receive Flow Steering is a nice solution but suffers from
hash collisions when a mix of connected and unconnected traffic
is received on the host, when flow hash table is populated.

Also, clearing flow in inet_release() makes RFS not very good
for short lived flows, as many packets can follow close().
(FIN , ACK packets, ...)

This patch extends the information stored into global hash table
to not only include cpu number, but upper part of the hash value.

I use a 32bit value, and dynamically split it in two parts.

For host with less than 64 possible cpus, this gives 6 bits for the
cpu number, and 26 (32-6) bits for the upper part of the hash.

Since hash bucket selection use low order bits of the hash, we have
a full hash match, if /proc/sys/net/core/rps_sock_flow_entries is big
enough.

If the hash found in flow table does not match, we fallback to RPS (if
it is enabled for the rxqueue).

This means that a packet for an non connected flow can avoid the
IPI through a unrelated/victim CPU.

This also means we no longer have to clear the table at socket
close time, and this helps short lived flows performance.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 16:53:57 -08:00
Sabrina Dubroca
096a4cfa58 net: fix a typo in skb_checksum_validate_zero_check
Remove trailing underscore.

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 16:28:26 -08:00
Sabrina Dubroca
3e97fa7059 gre/ipip: use be16 variants of netlink functions
encap.sport and encap.dport are __be16, use nla_{get,put}_be16 instead
of nla_{get,put}_u16.

Fixes the sparse warnings:

warning: incorrect type in assignment (different base types)
   expected restricted __be32 [addressable] [usertype] o_key
   got restricted __be16 [addressable] [usertype] i_flags
warning: incorrect type in assignment (different base types)
   expected restricted __be16 [usertype] sport
   got unsigned short
warning: incorrect type in assignment (different base types)
   expected restricted __be16 [usertype] dport
   got unsigned short
warning: incorrect type in argument 3 (different base types)
   expected unsigned short [unsigned] [usertype] value
   got restricted __be16 [usertype] sport
warning: incorrect type in argument 3 (different base types)
   expected unsigned short [unsigned] [usertype] value
   got restricted __be16 [usertype] dport

Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 16:28:06 -08:00
Trond Myklebust
4dda9c8a5e SUNRPC: Set SO_REUSEPORT socket option for TCP connections
When using TCP, we need the ability to reuse port numbers after
a disconnection, so that the NFSv3 server knows that we're the same
client. Currently we use a hack to work around the TCP socket's
TIME_WAIT: we send an RST instead of closing, which doesn't
always work...
The SO_REUSEPORT option added in Linux 3.9 allows us to bind multiple
TCP connections to the same source address+port combination, and thus
to use ordinary TCP close() instead of the current hack.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2015-02-08 18:52:11 -05:00
Hans de Goede
e77a16355a ACPI / video: Add disable_native_backlight quirk for Samsung 510R
Backlight control through the native intel interface does not work properly
on the Samsung 510R, where as using the acpi_video interface does work, add
a quirk for this.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1186097
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-08 23:49:32 +01:00
Andreas Ruprecht
8dcb52cbca ACPI / PM: Remove unneeded nested #ifdef
In commit 5de21bb998 ("ACPI / PM: Drop CONFIG_PM_RUNTIME from the
ACPI core"), all occurrences of CONFIG_PM_RUNTIME were replaced with
CONFIG_PM. This created the following structure of #ifdef blocks in
the code:

 [...]
 #ifdef CONFIG_PM
 #ifdef CONFIG_PM
 /* always on / undead */
 #ifdef CONFIG_PM_SLEEP
 [...]
 #endif
 #endif
 [...]
 #endif

This patch removes the inner "#ifdef CONFIG_PM" block as it will
always be enabled when the outer block is enabled. This inconsistency
was found using the undertaker-checkpatch tool.

Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-08 23:45:58 +01:00
Andreas Ruprecht
f6a55884d7 USB / PM: Remove unneeded #ifdef and associated dead code
In commit ceb6c9c862 ("USB / PM: Drop CONFIG_PM_RUNTIME from the
USB core"), all occurrences of CONFIG_PM_RUNTIME in the USB core
code were replaced by CONFIG_PM. This created the following structure
of #ifdef blocks in drivers/usb/core/hub.c:

 [...]
 #ifdef CONFIG_PM
 #ifdef CONFIG_PM
 /* always on / undead */
 #else
 /* dead */
 #endif
 [...]

This patch removes unnecessary inner "#ifdef CONFIG_PM" as well as
the corresponding dead #else block. This inconsistency was found using
the undertaker-checkpatch tool.

Signed-off-by: Andreas Ruprecht <rupran@einserver.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2015-02-08 23:42:25 +01:00
Jon Paul Maloy
51a00daf73 tipc: fix bug in socket reception function
In commit c637c10355 ("tipc: resolve race
problem at unicast message reception") we introduced a time limit
for how long the function tipc_sk_eneque() would be allowed to execute
its loop. Unfortunately, the test for when this limit is passed was put
in the wrong place, resulting in a lost message when the test is true.

We fix this by moving the test to before we dequeue the next buffer
from the input queue.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 13:09:25 -08:00
Michael Büsch
662f5533c4 rt6_probe_deferred: Do not depend on struct ordering
rt6_probe allocates a struct __rt6_probe_work and schedules a work handler rt6_probe_deferred.
But rt6_probe_deferred kfree's the struct work_struct instead of struct __rt6_probe_work.
This works, because struct work_struct is the first element of struct __rt6_probe_work.

Change it to kfree struct __rt6_probe_work to not implicitly depend on
struct work_struct being the first element.

This does not affect the generated code.

Signed-off-by: Michael Buesch <m@bues.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 13:00:43 -08:00
Trond Myklebust
bc3203cdca NFS: RDMA Client Sparse Fixes
This patch fixes a sparse warning in the initial submission.
 
 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJU09WdAAoJENfLVL+wpUDr2HMP/2rvjkncrlbumSUlkMMUPenZ
 PUM6dDAnvvhkRj/YrhmJ94v9SPb4BsR0ay4apn5aqmoEHqEy/LfAsoy1TlcspRO6
 DwIKD8wHQ9RkpIVaEt1IGhApWlWK3R5FqoDTSMW9VLq72QuCsnbX3EXWN9B+hgT7
 UVmcsGe0/5zd9v8RXzWJjyabOsx7rx7gNuuAyULO3RudNMzEpldMf7oHq0RwndOt
 akxSSvRSfMFyfJscfI3F2IJsbXBST+ZzzJ0oRXXWn0RfiWR+Z438Nk9HDuOfGBFv
 LACoELeJxtBEFAjBv6GnBILdSxRi6dTZPpAEzvNR45SJAk4c1gGDqgUbyIYXqeYM
 k8alEMCS+eKvDe7DL6N1Nlxq6oc5pAt7BYYpRzCz+2dd7gfHIOwurBnZwFgxN/rK
 GhtXCaBqqVr4lmu/lbgWjVFN/Jt6TGZ6+FqQhDM70CmJGBWbOEJNE6H50R7SiFpv
 7UapXg9iknCpjexkZS3cyAB0Qu31qvIK64ZzsPhVafmSYh9ed5hlT2WSwphfDki5
 ZQOktuPFtVkOI55Rz+2EpNNYe9nRuL5wBqD4Na3uUiEWgbnDLRmx+sDyzcfPujTN
 trFTwUO082SFK0yZVusZ04vlqjfzxSd4aYSeC5Zzgt0CYJCZd7z0tFrdLpyO/fDn
 PLfvcJyFRBBbaPgiwBHg
 =3fRY
 -----END PGP SIGNATURE-----

Merge tag 'nfs-rdma-for-3.20-part-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma

NFS: RDMA Client Sparse Fixes

This patch fixes a sparse warning in the initial submission.

Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

* tag 'nfs-rdma-for-3.20-part-2' of git://git.linux-nfs.org/projects/anna/nfs-rdma:
  xprtrdma: Address sparse complaint in rpcr_to_rdmar()
2015-02-08 10:37:34 -05:00
Takashi Sakamoto
d34890cf41 ALSA: control: fix failure to return numerical ID in 'add' event
Currently when adding a new control, the assigned numerical ID is not
set for event data, thus userspace applications cannot realize it just
by event data.

Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-02-08 15:31:09 +01:00
David S. Miller
f06535c599 Merge branch 'tcp_ack_loops'
Neal Cardwellsays:

====================
tcp: mitigate TCP ACK loops due to out-of-window validation dupacks

This patch series mitigates "ack loop" DoS scenarios by rate-limiting
outgoing duplicate ACKs sent in response to incoming "out of window"
segments.

Background
-----------

There are several cases in which the TCP RFCs specify that a TCP
endpoint should send a pure duplicate ACK in response to a pure
duplicate ACK that appears to be invalid due to being "out of window":

(1) RFC 793 (section 3.9, page 69) specifies that endpoints should
    send a duplicate ACK in response to an ACK when the incoming
    sequence number is invalid due to being outside the receive
    window: "If an incoming segment is not acceptable, an
    acknowledgment should be sent in reply".

(2) RFC 793 (section 3.9, page 72) says: "If the ACK acknowledges
    something not yet sent (SEG.ACK > SND.NXT) then send an ACK".

(3) RFC 1323 (section 4.2.1, page 18) specifies that endpoints should
    send a duplicate ACK in response to an ACK when the PAWS check for
    the incoming timestamp value fails: "If .... SEG.TSval < TS.Recent
    and if TS.Recent is valid ... Send an acknowledgement in reply"

The problem
------------

Normally, this is not a problem. However, a buggy middlebox or
malicious man-in-the-middle can inject a few packets into the
conversation that advance each endpoint's notion of the current window
(sequence, ACK, or timestamp), without either side noticing. In this
case, from then on each side can think the other is sending invalid
segments. Thus an infinite feedback loop of duplicate ACKs can ensue,
as each endpoint receives a duplicate ACK, decides that it is invalid
(due to sequence number, ACK number, or timestamp), and then sends a
dupack in reply, which the other side decides is invalid, responding
with a dupack... ad infinitum. This ping-pong feedback loop can happen
at a very high rate.

This phenomenon can and does happen in practice. It has been seen in
datacenter and Internet contexts at Google, and has been documented by
Anil Agarwal in the Nov 2013 tcpm thread "TCP mismatched sequence
numbers issue", and Avery Fay in the Feb 2015 Linux netdev thread
"Invalid timestamp? causing tight ack loop (hundreds of thousands of
packets / sec)".

This patch series
------------------

This patch series mitigates such ack loops by rate-limiting outgoing
duplicate ACKs sent in response to incoming TCP packets that are for
an existing connection but that are invalid due to any of the reasons
mentioned above: sequence number (1), ACK field (2), or timestamp
value (3). The rate limit for such duplicate ACKs is specified by a
new sysctl, tcp_invalid_ratelimit, which specifies the minimal space
between such outbound duplicate ACKs, in milliseconds. The default is
500 (500ms), and 0 disables the mechanism.

We rate-limit these duplicate ACK responses rather than blocking them
entirely or resetting the connection, because legitimate connections
can rely on dupacks in response to some out-of-window segments. For
example, zero window probes are typically sent with a sequence number
that is below the current window, and ZWPs thus expect to thus elicit
a dupack in response.

Testing: this approach has been in use at Google for a while.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 01:03:26 -08:00
Neal Cardwell
4fb17a6091 tcp: mitigate ACK loops for connections as tcp_timewait_sock
Ensure that in state FIN_WAIT2 or TIME_WAIT, where the connection is
represented by a tcp_timewait_sock, we rate limit dupacks in response
to incoming packets (a) with TCP timestamps that fail PAWS checks, or
(b) with sequence numbers that are out of the acceptable window.

We do not send a dupack in response to out-of-window packets if it has
been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we
last sent a dupack in response to an out-of-window packet.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 01:03:13 -08:00
Neal Cardwell
f2b2c582e8 tcp: mitigate ACK loops for connections as tcp_sock
Ensure that in state ESTABLISHED, where the connection is represented
by a tcp_sock, we rate limit dupacks in response to incoming packets
(a) with TCP timestamps that fail PAWS checks, or (b) with sequence
numbers or ACK numbers that are out of the acceptable window.

We do not send a dupack in response to out-of-window packets if it has
been less than sysctl_tcp_invalid_ratelimit (default 500ms) since we
last sent a dupack in response to an out-of-window packet.

There is already a similar (although global) rate-limiting mechanism
for "challenge ACKs". When deciding whether to send a challence ACK,
we first consult the new per-connection rate limit, and then the
global rate limit.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 01:03:12 -08:00
Neal Cardwell
a9b2c06dbe tcp: mitigate ACK loops for connections as tcp_request_sock
In the SYN_RECV state, where the TCP connection is represented by
tcp_request_sock, we now rate-limit SYNACKs in response to a client's
retransmitted SYNs: we do not send a SYNACK in response to client SYN
if it has been less than sysctl_tcp_invalid_ratelimit (default 500ms)
since we last sent a SYNACK in response to a client's retransmitted
SYN.

This allows the vast majority of legitimate client connections to
proceed unimpeded, even for the most aggressive platforms, iOS and
MacOS, which actually retransmit SYNs 1-second intervals for several
times in a row. They use SYN RTO timeouts following the progression:
1,1,1,1,1,2,4,8,16,32.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 01:03:12 -08:00
Neal Cardwell
032ee42369 tcp: helpers to mitigate ACK loops by rate-limiting out-of-window dupacks
Helpers for mitigating ACK loops by rate-limiting dupacks sent in
response to incoming out-of-window packets.

This patch includes:

- rate-limiting logic
- sysctl to control how often we allow dupacks to out-of-window packets
- SNMP counter for cases where we rate-limited our dupack sending

The rate-limiting logic in this patch decides to not send dupacks in
response to out-of-window segments if (a) they are SYNs or pure ACKs
and (b) the remote endpoint is sending them faster than the configured
rate limit.

We rate-limit our responses rather than blocking them entirely or
resetting the connection, because legitimate connections can rely on
dupacks in response to some out-of-window segments. For example, zero
window probes are typically sent with a sequence number that is below
the current window, and ZWPs thus expect to thus elicit a dupack in
response.

We allow dupacks in response to TCP segments with data, because these
may be spurious retransmissions for which the remote endpoint wants to
receive DSACKs. This is safe because segments with data can't
realistically be part of ACK loops, which by their nature consist of
each side sending pure/data-less ACKs to each other.

The dupack interval is controlled by a new sysctl knob,
tcp_invalid_ratelimit, given in milliseconds, in case an administrator
needs to dial this upward in the face of a high-rate DoS attack. The
name and units are chosen to be analogous to the existing analogous
knob for ICMP, icmp_ratelimit.

The default value for tcp_invalid_ratelimit is 500ms, which allows at
most one such dupack per 500ms. This is chosen to be 2x faster than
the 1-second minimum RTO interval allowed by RFC 6298 (section 2, rule
2.4). We allow the extra 2x factor because network delay variations
can cause packets sent at 1 second intervals to be compressed and
arrive much closer.

Reported-by: Avery Fay <avery@mixpanel.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 01:03:12 -08:00
Pravin B Shelar
ca539345f8 openvswitch: Initialize unmasked key and uid len
Flow alloc needs to initialize unmasked key pointer. Otherwise
it can crash kernel trying to free random unmasked-key pointer.

general protection fault: 0000 [#1] SMP
3.19.0-rc6-net-next+ #457
Hardware name: Supermicro X7DWU/X7DWU, BIOS  1.1 04/30/2008
RIP: 0010:[<ffffffff8111df0e>] [<ffffffff8111df0e>] kfree+0xac/0x196
Call Trace:
 [<ffffffffa060bd87>] flow_free+0x21/0x59 [openvswitch]
 [<ffffffffa060bde0>] ovs_flow_free+0x21/0x23 [openvswitch]
 [<ffffffffa0605b4a>] ovs_packet_cmd_execute+0x2f3/0x35f [openvswitch]
 [<ffffffffa0605995>] ? ovs_packet_cmd_execute+0x13e/0x35f [openvswitch]
 [<ffffffff811fe6fb>] ? nla_parse+0x4f/0xec
 [<ffffffff8139a2fc>] genl_family_rcv_msg+0x26d/0x2c9
 [<ffffffff8107620f>] ? __lock_acquire+0x90e/0x9aa
 [<ffffffff8139a3be>] genl_rcv_msg+0x66/0x89
 [<ffffffff8139a358>] ? genl_family_rcv_msg+0x2c9/0x2c9
 [<ffffffff81399591>] netlink_rcv_skb+0x3e/0x95
 [<ffffffff81399898>] ? genl_rcv+0x18/0x37
 [<ffffffff813998a7>] genl_rcv+0x27/0x37
 [<ffffffff81399033>] netlink_unicast+0x103/0x191
 [<ffffffff81399382>] netlink_sendmsg+0x2c1/0x310
 [<ffffffff811007ad>] ? might_fault+0x50/0xa0
 [<ffffffff8135c773>] do_sock_sendmsg+0x5f/0x7a
 [<ffffffff8135c799>] sock_sendmsg+0xb/0xd
 [<ffffffff8135cacf>] ___sys_sendmsg+0x1a3/0x218
 [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86
 [<ffffffff8115a9d0>] ? fsnotify+0x32c/0x348
 [<ffffffff8115a720>] ? fsnotify+0x7c/0x348
 [<ffffffff8113e5f5>] ? __fget+0xaa/0xbf
 [<ffffffff8113e54b>] ? get_close_on_exec+0x86/0x86
 [<ffffffff8135cccd>] __sys_sendmsg+0x3d/0x5e
 [<ffffffff8135cd02>] SyS_sendmsg+0x14/0x16
 [<ffffffff81411852>] system_call_fastpath+0x12/0x17

Fixes: 74ed7ab9264("openvswitch: Add support for unique flow IDs.")
CC: Joe Stringer <joestringer@nicira.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-08 00:51:14 -08:00
Chris Rorvick
12865cac38 ALSA: line6: Pass driver name to line6_probe()
Provide a unique name for each driver instead of using "line6usb" for
all of them.  This will allow for different configurations based on the
driver type.

Signed-off-by: Chris Rorvick <chris@rorvick.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-02-08 09:07:07 +01:00
Chris Rorvick
f2bd242fa1 ALSA: line6: Pass toneport pointer to toneport_has_led()
It is unlikely this function would ever be used in a context without a
pointer to a `struct usb_line6_toneport', so grab the device type from
it rather than having the caller do it.

Signed-off-by: Chris Rorvick <chris@rorvick.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-02-08 09:06:08 +01:00
Chris Rorvick
89444601e5 ALSA: line6: Add toneport_has_source_select()
Add a predicate for testing if the device supports source selection to
make the conditional logic around this a bit cleaner.

Signed-off-by: Chris Rorvick <chris@rorvick.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2015-02-08 09:05:56 +01:00
David S. Miller
34afb4eb03 Merge branch 'cxgb4'
Hariprasad Shenai says:

====================
Add support to dump some hw debug info

This patch series adds support to dump sensor info, dump Transport Processor
event trace, dump Upper Layer Protocol RX module command trace, dump mailbox
contents and dump Transport Processor congestion control configuration.

Will send a separate patch series for all the hw stats patches, by moving them
to ethtool.

The patches series is created against 'net-next' tree.
And includes patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review the
change and let us know in case of any review comments.

V2: Dopped all hw stats related patches. Added a new patch which adds support to
dump congestion control table.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:53:03 -08:00
Hariprasad Shenai
bad4379263 cxgb4: Add support in debugfs to dump the congestion control table
Dump Transport Processor modules congestion control configuration

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:52:39 -08:00
Hariprasad Shenai
bf7c781d57 cxgb4: Add support to dump mailbox content in debugfs
Adds support to dump the current contents of mailbox and the driver which owns
it.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:52:39 -08:00
Hariprasad Shenai
797ff0f573 cxgb4: Add support for ULP RX logic analyzer output in debugfs
Dump Upper Layer Protocol RX module command trace

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:52:39 -08:00
Hariprasad Shenai
2d277b3b44 cxgb4: Added support in debugfs to display TP logic analyzer output
Dump Transport Processor event trace.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:52:39 -08:00
Hariprasad Shenai
70a5f3bb5f cxgb4: Add support in debugfs to display sensor information
Dump out various chip sensor information. Currently Chip Temperature
and Core Voltage.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:52:39 -08:00
David S. Miller
bdb2748202 Merge branch 'be2net'
Sathya Perla says:

====================
be2net: patch set

Hi Dave, pls consider applying the following patch-set to the
net-next tree. It has 5 code/style cleanup patches and 4 patches that
add functionality to the driver.

Patch 1 moves routines that were not needed to be in be.h to the respective
src files, to avoid unnecessary compilation.

Patch 2 replaces (1 << x) with BIT(x) macro

Patch 3 refactors code that checks if a FW flash file is compatible
with the adapter. The code is now refactored into 2 routines, the first one
gets the file type from the image file and the 2nd routine checks if the
file type is compatible with the adapter.

Patch 4 adds compatibility checks for flashing a FW image on the new
Skyhawk P2 HW revision.

Patch 5 adds support for a new "offset based" flashing scheme, wherein
the driver informs the FW of the offset at which each component in the flash
file is to be flashed at. This helps flashing components that were
previously not recognized by the running FW.

Patch 6 simplifies the be_cmd_rx_filter() routine, by passing to it the
filter flags already used in the FW cmd, instead of the netdev flags that
were converted to the FW-cmd flags.

Patch 7 introduces helper routines in be_set_rx_mode() and be_vid_config()
to improve code readability.

Patch 8 adds processing of port-misconfig async event sent by the FW.

Patch 9 removes unnecessary swapping of a field in the TX desc.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:51:02 -08:00
Sathya Perla
f986afcbe0 be2net: avoid unncessary swapping of fields in eth_tx_wrb
The 32-bit fields of a tx-wrb are little endian. The driver is currently
using be_dws_le_to_cpu() routine to swap (cpu to le) all the fields of
a tx-wrb. So, the rsvd field is also unnecessarily swapped.

This patch fixes this by individually swapping the required fields.
Also, the type of the fields in eth_tx_wrb{} is now changed to __le32
from u32 to avoid sparse warnings.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:50:59 -08:00
Vasundhara Volam
21252377bb be2net: process port misconfig async event
This patch adds support for processing the port misconfigure async
event generated by the FW. This event is generated typically when an
optical module is incorrectly installed or is faulty.

This patch also moves the port_name field to the adapter struct for
logging the event. As the be_cmd_query_port_name() call is now moved
to be_get_config(), it is modified to use the mailbox instead of MCCQ

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:50:59 -08:00
Sathya Perla
f66b7cfd95 be2net: refactor be_set_rx_mode() and be_vid_config() for readability
This patch re-factors the filter setting (uc-list, mc-list, promisc, vlan)
code in be_set_rx_mode() and be_vid_config() to make it more readable
and reduce code duplication.
This patch adds a separate field to track the state/mode of filtering,
along with moving all the filtering related fields to one place in be
be_adapter structure.

Signed-off-by: Sathya Perla <sathya.perla@emulex.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-02-07 22:50:59 -08:00