Commit graph

470952 commits

Author SHA1 Message Date
David S. Miller
fb5690d245 Merge branch 'fou-next'
Tom Herbert says:

====================
net: foo-over-udp (fou)

This patch series implements foo-over-udp. The idea is that we can
encapsulate different IP protocols in UDP packets. The rationale for
this is that networking devices such as NICs and switches are usually
implemented with UDP (and TCP) specific mechanims for processing. For
instance, many switches and routers will implement a 5-tuple hash
for UDP packets to perform Equal Cost Multipath Routing (ECMP) or
RSS (on NICs). Many NICs also only provide rudimentary checksum
offload (basic TCP and UDP packet), with foo-over-udp we may be
able to leverage these NICs to offload checksums of tunneled packets
(using checksum unnecessary conversion and eventually remote checksum
offload)

An example encapsulation of IPIP over FOU is diagrammed below. As
illustrated, the packet overhead for FOU is the 8 byte UDP header.

+------------------+
|    IPv4 hdr      |
+------------------+
|     UDP hdr      |
+------------------+
|    IPv4 hdr      |
+------------------+
|     TCP hdr      |
+------------------+
|   TCP payload    |
+------------------+

Conceptually, FOU should be able to encapsulate any IP protocol.
The FOU header (UDP hdr.) is essentially an inserted header between the
IP header and transport, so in the case of TCP or UDP encapsulation
the pseudo header would be based on the outer IP header and its length
field must not include the UDP header.

* Receive

In this patch set the RX path for FOU is implemented in a new fou
module. To enable FOU for a particular protocol, a UDP-FOU socket is
opened to the port to receive FOU packets. The socket is mapped to the
IP protocol for the packets. The XFRM mechanism used to receive
encapsulated packets (udp_encap_rcv) for the port. Upon reception, the
UDP is removed and packet is reinjected in the stack for the
corresponding protocol associated with the socket (return -protocol
from udp_encap_rcv function).

GRO is provided with the appropriate fou_gro_receive and
fou_gro_complete. These routines need to know the encapsulation
protocol so we save that in udp_offloads structure with the port
and pass it in the napi_gro_cb structure.

* TX

This patch series implements FOU transmit encapsulation for IPIP, GRE, and
SIT. This done by some common infrastructure in ip_tunnel including an
ip_tunnel_encap to perform FOU encapsulation and common configuration
to enable FOU on IP tunnels. FOU is configured on existing tunnels and
does not create any new interfaces. The transmit and receive paths are
independent, so use of FOU may be assymetric between tunnel endpoints.

* Configuration

The fou module using netlink to configure FOU receive ports. The ip
command can be augmented with a fou subcommand to support this. e.g. to
configure FOU for IPIP on port 5555:

  ip fou add port 5555 ipproto 4

GRE, IPIP, and SIT have been modified with netlink commands to
configure use of FOU on transmit. The "ip link" command will be
augmented with an encap subcommand (for supporting various forms of
secondary encapsulation). For instance, to configure an ipip tunnel
with FOU on port 5555:

  ip link add name tun1 type ipip \
    remote 192.168.1.1 local 192.168.1.2 ttl 225 \
    encap fou encap-sport auto encap-dport 5555

* Notes
  - This patch set does not implement GSO for FOU. The UDP encapsulation
    code assumes TEB, so that will need to be reimplemented.
  - When a packet is received through FOU, the UDP header is not
    actually removed for the skbuf, pointers to transport header
    and length in the IP header are updated (like in ESP/UDP RX). A
    side effect is the IP header will now appear to have an incorrect
    checksum by an external observer (e.g. tcpdump), it will be off
    by sizeof UDP header. If necessary we could adjust the checksum
    to compensate.
  - Performance results are below. My expectation is that FOU should
    entail little overhead (clearly there is some work to do :-) ).
    Optimizing UDP socket lookup for encapsulation ports should help
    significantly.
  - I really don't expect/want devices to have special support for any
    of this. Generic checksum offload mechanisms (NETIF_HW_CSUM
    and use of CHECKSUM_COMPLETE) should be sufficient. RSS and flow
    steering is provided by commonly implemented UDP hashing. GRO/GSO
    seem fairly comparable with LRO/TSO already.

* Performance

Ran netperf TCP_RR and TCP_STREAM tests across various configurations.
This was performed on bnx2x and I disabled TSO/GSO on sender to get
fair comparison for FOU versus non-FOU. CPU utilization is reported
for receive in TCP_STREAM.

  GRE
    IPv4, FOU, UDP checksum enabled
      TCP_STREAM
        24.85% CPU utilization
        9310.6 Mbps
      TCP_RR
        94.2% CPU utilization
        155/249/460 90/95/99% latencies
        1.17018e+06 tps
    IPv4, FOU, UDP checksum disabled
      TCP_STREAM
        31.04% CPU utilization
        9302.22 Mbps
      TCP_RR
        94.13% CPU utilization
        154/239/419 90/95/99% latencies
        1.17555e+06 tps
    IPv4, no FOU
      TCP_STREAM
        23.13% CPU utilization
        9354.58 Mbps
      TCP_RR
        90.24% CPU utilization
        156/228/360 90/95/99% latencies
        1.18169e+06 tps

  IPIP
    FOU, UDP checksum enabled
      TCP_STREAM
        24.13% CPU utilization
        9328 Mbps
      TCP_RR
        94.23
        149/237/429 90/95/99% latencies
        1.19553e+06 tps
    FOU, UDP checksum disabled
      TCP_STREAM
        29.13% CPU utilization
        9370.25 Mbps
      TCP_RR
        94.13% CPU utilization
        149/232/398 90/95/99% latencies
        1.19225e+06 tps
    No FOU
      TCP_STREAM
        10.43% CPU utilization
        5302.03 Mbps
      TCP_RR
        51.53% CPU utilization
        215/324/475 90/95/99% latencies
        864998 tps

  SIT
    FOU, UDP checksum enabled
      TCP_STREAM
        30.38% CPU utilization
        9176.76 Mbps
      TCP_RR
        96.9% CPU utilization
        170/281/581 90/95/99% latencies
        1.03372e+06 tps
    FOU, UDP checksum disabled
      TCP_STREAM
        39.6% CPU utilization
        9176.57 Mbps
      TCP_RR
        97.14% CPU utilization
        167/272/548 90/95/99% latencies
        1.03203e+06 tps
    No FOU
      TCP_STREAM
        11.2% CPU utilization
        4636.05 Mbps
      TCP_RR
        59.51% CPU utilization
        232/346/489 90/95/99% latencies
        813199 tps

v2:
  - Removed encap IP tunnel ioctls, configuration is done by netlink
    only.
  - Don't export fou_create and fou_destroy, they are currently
    intended to be called within fou module only.
  - Filled on tunnel netlink structures and functions for new values.

v3:
  - Fixed change logs for some of the patches.
  - Remove inline from fou_gro_receive and fou_gro_complete, let
    compiler decide on these.

v4:
  - Don't need to cast void in fou_from_sock
  - Removed incorrest htons for port in fou_destroy
  - Some minor cleanup for readability
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:40 -04:00
Tom Herbert
4565e9919c gre: Setup and TX path for gre/UDP foo-over-udp encapsulation
Added netlink attrs to configure FOU encapsulation for GRE, netlink
handling of these flags, and properly adjust MTU for encapsulation.
ip_tunnel_encap is called from ip_tunnel_xmit to actually perform FOU
encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00
Tom Herbert
473ab820dd ipip: Setup and TX path for ipip/UDP foo-over-udp encapsulation
Add netlink handling for IP tunnel encapsulation parameters and
and adjustment of MTU for encapsulation.  ip_tunnel_encap is called
from ip_tunnel_xmit to actually perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00
Tom Herbert
14909664e4 sit: Setup and TX path for sit/UDP foo-over-udp encapsulation
Added netlink handling of IP tunnel encapulation paramters, properly
adjust MTU for encapsulation. Added ip_tunnel_encap call to
ipip6_tunnel_xmit to actually perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00
Tom Herbert
5632848653 net: Changes to ip_tunnel to support foo-over-udp encapsulation
This patch changes IP tunnel to support (secondary) encapsulation,
Foo-over-UDP. Changes include:

1) Adding tun_hlen as the tunnel header length, encap_hlen as the
   encapsulation header length, and hlen becomes the grand total
   of these.
2) Added common netlink define to support FOU encapsulation.
3) Routines to perform FOU encapsulation.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:32 -04:00
Tom Herbert
afe93325bc fou: Add GRO support
Implement fou_gro_receive and fou_gro_complete, and populate these
in the correponsing udp_offloads for the socket. Added ipproto to
udp_offloads and pass this from UDP to the fou GRO routine in proto
field of napi_gro_cb structure.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:31 -04:00
Tom Herbert
23461551c0 fou: Support for foo-over-udp RX path
This patch provides a receive path for foo-over-udp. This allows
direct encapsulation of IP protocols over UDP. The bound destination
port is used to map to an IP protocol, and the XFRM framework
(udp_encap_rcv) is used to receive encapsulated packets. Upon
reception, the encapsulation header is logically removed (pointer
to transport header is advanced) and the packet is reinjected into
the receive path with the IP protocol indicated by the mapping.

Netlink is used to configure FOU ports. The configuration information
includes the port number to bind to and the IP protocol corresponding
to that port.

This should support GRE/UDP
(http://tools.ietf.org/html/draft-yong-tsvwg-gre-in-udp-encap-02),
as will as the other IP tunneling protocols (IPIP, SIT).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:31 -04:00
Tom Herbert
ce3e02867e net: Export inet_offloads and inet6_offloads
Want to be able to use these in foo-over-udp offloads, etc.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:15:31 -04:00
Fabian Frederick
b3f2512ecd lib: rhashtable: remove second linux/log2.h inclusion
linux/log2.h was included twice.

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:08:52 -04:00
Francesco Ruggeri
0d0162e7a3 net: allow macvlans to move to net namespace
I cannot move a macvlan interface created on top of a bonding interface
to a different namespace:

% ip netns add dummy0
% ip link add link bond0 mac0 type macvlan
% ip link set mac0 netns dummy0
RTNETLINK answers: Invalid argument
%

The problem seems to be that commit f939981492 ("bonding: Don't allow
bond devices to change network namespaces.") sets NETIF_F_NETNS_LOCAL
on bonding interfaces, and commit 797f87f83b ("macvlan: fix netdev
feature propagation from lower device") causes macvlan interfaces
to inherit its features from the lower device.

NETIF_F_NETNS_LOCAL should not be inherited from the lower device
by a macvlan.
Patch tested on 3.16.

Signed-off-by: Francesco Ruggeri <fruggeri@arista.com>
Acked-by: Cong Wang <cwang@twopensource.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:07:20 -04:00
John Fastabend
4e2840eee6 net: sched: cls_u32: rcu can not be last node
tc_u32_sel 'sel' in tc_u_knode expects to be the last element in the
structure and pads the structure with tc_u32_key fields for each key.

 kzalloc(sizeof(*n) + s->nkeys*sizeof(struct tc_u32_key), GFP_KERNEL)

CC: Eric Dumazet <edumazet@google.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 17:05:45 -04:00
David S. Miller
8f665f6cb7 Merge tag 'master-2014-09-16' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-09-17

Please pull this batch of fixes intended for the 3.17 stream...

Arend van Spriel sends a trio of minor brcmfmac fixes, including a
fix for a Kconfig/build issue, a fix for a crash (null reference),
and a regression fix related to event handling on a P2P interface.

Hante Meuleman follows-up with a brcmfmac fix for a memory leak.

Johannes Stezenbach brings an ath9k_htc fix for a regression related
to hardware decryption offload.

Marcel Holtmann delivers a one-liner to properly mark a device ID
table in rfkill-gpio.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:33:15 -04:00
Eric Dumazet
ab34f64808 net: sched: use __skb_queue_head_init() where applicable
pfifo_fast and htb use skb lists, without needing their spinlocks.
(They instead use the standard qdisc lock)

We can use __skb_queue_head_init() instead of skb_queue_head_init()
to be consistent.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:32:10 -04:00
David S. Miller
0ce77802f3 Merge branch 'bnx2x-next'
Yuval Mintz says:

====================
bnx2x: Support new Multi-function modes

This patch series adds support for 2 new Multi-function modes -
Unified Fabric Port [UFP] as well as nic partitioning 1.5 [NPAR1.5].

With the addition of the new multi-function modes, the series also
revises some of the storage-related multi-function macros.

[Do notice this series has several small issues with checkpatch]
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:31:13 -04:00
Yuval Mintz
83bad206f7 bnx2x: Add a fallback multi-function mode NPAR1.5
When using new Multi-function modes it's possible that due to incompatible
configuration management FW will fallback into an existing mode.

Notice that at the moment this fallback is exactly the same as the already
existing switch-independent multi-function mode, but we still use existing
infrastructure to hold this information [in case some small differences will
arise in the future].

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:31:08 -04:00
Yuval Mintz
7609647e25 bnx2x: New multi-function mode: UFP
Add support for a new multi-function mode based on the Unified Fabric Port
system specifications.
Support includes configuration of:
  1. Outer vlan tags.
  2. Bandwidth settings.
  3. Virtual link enable/disable.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:31:08 -04:00
Dmitry Kravkov
2e98ffc21c bnx2x: Changes with storage & MAC macros
Rearrange macros to query for storage-only modes in different MF environment.
Improves the readibility and maintainability of the code. E.g.:
	-	if (IS_MF_STORAGE_SD(bp) || IS_MF_FCOE_AFEX(bp))
	+	if (IS_MF_STORAGE_ONLY(bp))

In addition, this removes the need for bnx2x_is_valid_ether_addr().

Signed-off-by: Dmitry Kravkov <Dmitry.Kravkov@qlogic.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: Ariel Elior <Ariel.Elior@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:31:08 -04:00
Neil Horman
8400dd029e 3c59x: Fix bad offset spec in skb_frag_dma_map
Recently aded the use of skb_frag_dma_map to 3c59x, but didn't realize it
automatically included the frag_offset internally, as well as provided an option
to specify an extra offset in the parameter list.  We need to specify an offset
of 0 in the parameter list to avoid skb corruption that results in lost
connections.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Linux Kernel list <linux-kernel@vger.kernel.org>
CC: "David S. Miller" <davem@davemloft.net>
CC: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
2014-09-19 16:28:57 -04:00
Neil Horman
6f2b6a3005 3c59x: Add dma error checking and recovery
Noted that 3c59x has no checks on transmit for failed DMA mappings, and no
ability to unmap fragments when a single map fails in the middle of a transmit.
This patch provides error checking to ensure that dma mappings work properly,
and unrolls an skb mapping if a fragmented skb transmission has a mapping
failure to prevent leaks.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Linux Kernel list <linux-kernel@vger.kernel.org>
CC: "David S. Miller" <davem@davemloft.net>
CC: Meelis Roos <mroos@linux.ee>
Tested-by: Meelis Roos <mroos@linux.ee>
2014-09-19 16:28:57 -04:00
David S. Miller
77f4f6220a Merge branch 'fec-next'
Florian Fainelli says:

====================
net: phy: Broadcom BCM7xxx PHY workaround update

This patch sets the change to of_phy_connect() that you have seen before,
this time with the full context of why it is useful and applicable here.

Due to some design decision, the internal PHY on Broadcom BCM7xxx chips
is not entirely self contained and does not report its internal revision
through MII_PHYSID2, that is left to external PHY designs.

This forces us to get the PHY revision from the GENET and SF2 switch drivers
because those two peripherals integrate such a PHY and do contain the PHY
revision in their registers.

The approach taken here is hopefully easy to extend to similar needs for
other chips/ as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:13 -04:00
Florian Fainelli
d8ebfed3f1 net: phy: bcm7xxx: utilize PHY revision in config_init
Now that the GENET and SF2 drivers have been updated to communicate us
what is the revision of the BCM7xxx integrated PHY, utilize that
information in the config_init() callback to call into the appropriate
workaround function based on our revision.

While at it, we also print the revision and patch level to help debug
new chips.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Florian Fainelli
aa9aef77c7 net: dsa: bcm_sf2: communicate integrated PHY revision to PHY driver
The integrated BCM7xxx PHY contains no useful revision information
in its MII_PHYSID2 bits 3:0, that information is instead contained in
the SWITCH_REG_PHY_REVISION register.

Read this register, store its value, and return it by implementing the
dsa_switch::get_phy_flags() callback accordingly. The register layout is
already matching what the BCM7xxx PHY driver is expecting to find.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Florian Fainelli
6819563e64 net: dsa: allow switch drivers to specify phy_device::dev_flags
Some switch drivers (e.g: bcm_sf2) may have to communicate specific
workarounds or flags towards the PHY device driver. Allow switches
driver to be delegated that task by introducing a get_phy_flags()
callback which will do just that.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Florian Fainelli
487320c541 net: bcmgenet: communicate integrated PHY revision to PHY driver
The integrated BCM7xxx PHY contains no useful revision information in
its MII_PHYSID2 bits 3:0, that information is instead contained in the
GENET hardware block.

We already read the GENET 32-bit revision register, so store the
integrated PHY revision in the driver private structure, and then
communicate this revision value to the PHY driver by overriding the
phy_flags value.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Florian Fainelli
80780a54ec net: bcmgenet: remove PHY_BRCM_100MBPS_WAR
Now that we have removed the need for the PHY_BRCM_100MBPS_WAR flag, we
can remove it from the GENET driver and the broadcom shared header file.
The PHY driver checks the PHY supported bitmask instead.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Florian Fainelli
e18556ee3b net: phy: bcm7xxx: do not use PHY_BRCM_100MBPS_WAR
There is no need for the PHY driver to check PHY_BRCM_100MBPS_WAR since
that is redundant with checking the PHY device supported features. Get
rid of that workaround flag.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Florian Fainelli
bb7d93496f net: phy: broadcom: add helper for PHY revision and patch level
The Broadcom BCM7xxx internal PHYs do not contain any useful revision
information in the low 4-bits of their MII_PHYSID2 (MII register 3)
which could allow us to properly identify them.

As a result, we need the actual hardware block integrating these PHYs:
GENET or the SF2 switch to tell us what revision they are built with. To
assist with that, add two helper macros for fetching the the PHY
revision and patch level from the struct phy_device::dev_flags.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Florian Fainelli
2f63715138 of: mdio: honor flags passed to of_phy_connect
Commit f9a8f83b04 ("net: phy: remove flags argument from phy_{attach,
connect, connect_direct}") removed the flags argument to the PHY library
calls to: phy_{attach,connect,connect_direct}.

Most Device Tree aware drivers call of_phy_connect() with the flag
argument set to 0, but some of them might want to set a different value
there in order for the PHY driver to key a specific behavior based on
the phy_device::phy_flags value.

Allow such drivers to set custom phy_flags as part of the
of_phy_connect() call since of_phy_connect() does start the PHY state
machine, it will call into the PHY driver config_init() callback which
is usually where a specific phy_flags value is important.

Fixes: f9a8f83b04 ("net: phy: remove flags argument from phy_{attach, connect, connect_direct}")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:27:07 -04:00
Eric Dumazet
2e4e441071 net: add alloc_skb_with_frags() helper
Extract from sock_alloc_send_pskb() code building skb with frags,
so that we can reuse this in other contexts.

Intent is to use it from tcp_send_rcvq(), tcp_collapse(), ...

We also want to replace some skb_linearize() calls to a more reliable
strategy in pathological cases where we need to reduce number of frags.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:25:23 -04:00
Linus Torvalds
46be7b73e8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
Pull btrfs fixes from Chris Mason:
 "I've got a revert to fix a regression with btrfs device registration,
  and Filipe has part two of his fsync fix from last week"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Revert "Btrfs: device_list_add() should not update list when mounted"
  Btrfs: set inode's logged_trans/last_log_commit after ranged fsync
2014-09-19 13:10:53 -07:00
Linus Torvalds
81770f4144 NFS client fixes for 3.17
Highligts:
 - Fix an Oops in nfs4_open_and_get_state
 - Fix an Oops in the nfs4_state_manager
 - Fix another bug in the close/open_downgrade code
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUHIRLAAoJEGcL54qWCgDykZkP/jHDs/0HcK3x8jW+zbxKP6tf
 xfyhJGySwnTo2v0UPD1pETtQke9bWnm38RVl04wf2H4Gb7jR/BoDZ5J1C7956vuN
 FFXt9lcnTj2Cijn/8wz2S9GneY/mjsWf9OP7NUM3O6DgxORhdoviOnYOAzqzEXjG
 ylqTP/3FVglDbawKaLy3ubI0dteNxOu9U4gLveP617Ysd8h4s5XsYHPYKOOltybS
 HhVNf/3EdoD3lms67Zj7yPl7PtdDhNKFrS32nhnfdLLgsMiwTyb9ZYaFpK2XcD9v
 KDKblibH/wpQCsnReB66dKBR8P4ktTvXM1ovkb7LFUZD5tsOcb1Bp5ROzHXUSmiI
 sXh5Ueue0FPKExU5WFKROE43+G5KOJG5pB2RwgugsqVlZjFhGotZrIle17Zuqxz0
 kVR+vGZ50O/nLQ+EoRhDRRbDBrUMT7/xxHDSPQ6d4HK2hNTbrXuSXcoe8/BvbSTt
 JXQCdbWDPZ5oR6z8RoBN1xHhJvXC3Y2w/d7ZzOpl3yLzsKpJ7K4tys4Z29iv3ut6
 ziRS1AvJvedwSK73fWTs+zEHKm+pFMqq2U+DncvWWOWOVpIv6eKRlY9O8enP7IeW
 qNHj4UVYnr9w4oAhvk2WJt1TZrhzBX9NhMjHSxUCSOs5v/YeiBjPTTy40N4O0Y6Z
 DwKwDNxZq49ILEznntsd
 =EOW6
 -----END PGP SIGNATURE-----

Merge tag 'nfs-for-3.17-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client fixes from Trond Myklebust:
 "Highligts:
   - fix an Oops in nfs4_open_and_get_state
   - fix an Oops in the nfs4_state_manager
   - fix another bug in the close/open_downgrade code"

* tag 'nfs-for-3.17-5' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFSv4: Fix another bug in the close/open_downgrade code
  NFSv4: nfs4_state_manager() vs. nfs_server_remove_lists()
  NFS: remove BUG possibility in nfs4_open_and_get_state
2014-09-19 13:07:49 -07:00
Eric Dumazet
cb93471acc tcp: do not fake tcp headers in tcp_send_rcvq()
Now we no longer rely on having tcp headers for skbs in receive queue,
tcp repair do not need to build fake ones.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:04:13 -04:00
Alexei Starovoitov
f6f2332dce sparc: bpf_jit: fix support for ldx/stx mem and SKF_AD_VLAN_TAG
fix several issues in sparc BPF JIT compiler.

ldx/stx related:
. classic BPF instructions that access mem[] slots were not setting
  SEEN_MEM flag, so stack wasn't allocated. Fix that by advertising
  correct flags

. LDX/STX instructions were missing SEEN_XREG, so register value
  could have leaked to user space. Fix it.

. since stack for mem[] slots is allocated with 'sub %sp' instead
  of 'save %sp', use %sp as base register instead of %fp.

. ldx mem[0] means first slot in classic BPF which should have
  -4 offset instead of 0.

. sparc64 needs 2047 stack bias as per ABI to access stack

. emit_stmem() was using LD32I macro instead of ST32I

SKF_AD_VLAN_TAG* related:
. SKF_AD_VLAN_TAG_PRESENT must return 1 or 0 instead of '> 0' or 0
  as per classic BPF de facto standard

. SKF_AD_VLAN_TAG needs to mask the field correctly

Fixes: 2809a2087c ("net: filter: Just In Time compiler for sparc")
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 16:01:18 -04:00
David S. Miller
3ff6425961 Merge branch 'udp-tunnel-common'
Andy Zhou says:

====================
Refactor vxlan and l2tp to use new common UDP tunnel APIs

This patch series add a few more UDP tunnel APIs and refactoring current
UDP tunnel based protocols, vxlan and l2tp to make use of the new APIs.

The added APIs are setup_udp_tunnel_sock(), udp_tunnel_xmit_skb() and
udp_tunnel_sock_release(). Those implementation logics already exist in
current vxlan and l2tp implementation. Move them to common APIs to reduce
code duplications.

Also split udp_tunnel.c into net/ipv4/udp_tunnel.c and
net/ipv6/ip6_udp_tunnel.c to maintain proper IP protocol separation.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:46 -04:00
Andy Zhou
c8fffcea0a l2tp: Refactor l2tp core driver to make use of the common UDP tunnel functions
Simplify l2tp implementation using common UDP tunnel APIs.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Andy Zhou
acbf74a763 vxlan: Refactor vxlan driver to make use of the common UDP tunnel functions.
Simplify vxlan implementation using common UDP tunnel APIs.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Andy Zhou
6a93cc9052 udp-tunnel: Add a few more UDP tunnel APIs
Added a few more UDP tunnel APIs that can be shared by UDP based
tunnel protocol implementation. The main ones are highlighted below.

setup_udp_tunnel_sock() configures UDP listener socket for
receiving UDP encapsulated packets.

udp_tunnel_xmit_skb() and upd_tunnel6_xmit_skb() transmit skb
using UDP encapsulation.

udp_tunnel_sock_release() closes the UDP tunnel listener socket.

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
Andy Zhou
fd384412e1 udp_tunnel: Seperate ipv6 functions into its own file.
Add ip6_udp_tunnel.c for ipv6 UDP tunnel functions to avoid ifdefs
in udp_tunnel.c

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:57:15 -04:00
David S. Miller
79ba2b4c5d Merge branch 'fec-next'
Frank Li says:

====================
net: fec: add interrupt coalescence

improve error handle when parse queue number.
add interrupt coalescence feature.

Change from v2 to v3
 - add error check in fec_enet_set_coalesce
 - fix a run time warning to get clock rate in interrupt
 - fix commit message use TKT number

Change from v1 to v2
 - fix indention
 - use errata number instead of TKT
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:36:54 -04:00
Fugang Duan
37d6017b84 net: fec: Workaround for imx6sx enet tx hang when enable three queues
When enable three queues on imx6sx enet, and then do tx performance
test with iperf tool, after some time running, tx hang.

Found that:
	If uDMA is running, software set TDAR may cause tx hang.
	If uDMA is in idle, software set TDAR don't cause tx hang.

There is a TDAR race condition for mutliQ when the software sets TDAR
and the UDMA clears TDAR simultaneously or in a small window (2-4 cycles).
This will cause the udma_tx and udma_tx_arbiter state machines to hang.
The issue exist at i.MX6SX enet IP.

So, the Workaround is checking TDAR status four time, if TDAR cleared by
hardware and then write TDAR, otherwise don't set TDAR.

The patch is only one Workaround for the issue ERR007885.

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:36:50 -04:00
Fugang Duan
73e7228941 net:fec: increase DMA queue number
when enable interrupt coalesce, 8 BD is not enough.

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:36:50 -04:00
Fugang Duan
d851b47b22 net: fec: add interrupt coalescence feature support
i.MX6 SX support interrupt coalescence feature
By default, init the interrupt coalescing frame count threshold and
timer threshold.

Supply the ethtool interfaces as below for user tuning to improve
enet performance:
	rx_max_coalesced_frames
	rx_coalesce_usecs
	tx_max_coalesced_frames
	tx_coalesce_usecs

Signed-off-by: Fugang Duan <B38611@freescale.com>
Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:36:50 -04:00
Frank Li
b7bd75cf53 net: fec: refine error handle of parser queue number from DT
check tx and rx queue seperately.
fix typo, "Invalidate" and "fail".
change pr_err to pr_warn.

Signed-off-by: Frank Li <Frank.Li@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:36:49 -04:00
Alexei Starovoitov
709f6c58d4 sparc: bpf_jit: add SKF_AD_PKTTYPE support to JIT
commit 233577a220 ("net: filter: constify detection of pkt_type_offset")
allows us to implement simple PKTTYPE support in sparc JIT

Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-19 15:34:40 -04:00
Linus Torvalds
b29f83aa8b PCI updates for v3.17:
Enumeration
     - Don't default exclusively to first video device (Bruno Prémont)
 
   PCI device hotplug
     - Remove "no hotplug settings from platform" warning (Bjorn Helgaas)
     - Add pci_ignore_hotplug() for VGA switcheroo (Bjorn Helgaas)
 
   Freescale i.MX6
     - Put LTSSM in "Detect" state before disabling (Lucas Stach)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJUG7o9AAoJEFmIoMA60/r8hbYP/3gR3xHd2QKpkmBcM1lf1yiR
 osQQnAfRqEO4fzrpmOYrYbLIAOPwanK6Y36rmIYB+wHU2SUaffV7ZI9uW32shTud
 09+1N+OrSS6fwzVUWOuKsf1kv/jxpS+ic2fb+Qe1OXwJh5G+z1D9Kvd2EPLJdlgK
 ySyX4zSTrLni8CoclzREO7u82VVO5rTdvbujBxuvpOQTOdD5TFqV/uhb/y3gQz+u
 sG6IxUbdXsy4r24C6OnPrmmZ1Rk/lgCMyA+QSozc5Eu5PdGzcY9a6gcKlTnsbwBs
 qYLAb+/KCa3KgQh07NYmFfYdpoMZUXgSsEtD8gyvfJQHwUYwW8rsEMKxlSCQrzYr
 0OrpBSVTO6ta1r8SKOWtSYETQgPE3GUiJR1DuCyV+55RLZYp6Q8zH6dbgfWQbA/g
 R/kWHihR/tcD9YIlT99QrBppZtvG5nZ3y7aLSqdYYxEJqHE0tlbuxAu8hgwDf3Qp
 lKZJMyadLB1MS9lnrMj8DYqIOKbe62LOwcEYzhMJzaq8vCy+JWtjxOOgwBkT7P5v
 bhhYh3eqi5/MBONtw52V6RDUQId9vOLGHoiM5/akG4FFmWdhO9S0SbMBhAJyazKT
 n3IP5yj657XAi/fK939PUCQ3YuT5GqlCNcXDCqUzBZhGnt/Ln7LPmQI323519Lp4
 3vI3irFT0Fu2IFEBhc6l
 =xIYe
 -----END PGP SIGNATURE-----

Merge tag 'pci-v3.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci

Pull PCI fixes from Bjorn Helgaas:
 "These fix:

   - Boot video device detection on dual-GPU Apple systems
   - Hotplug fiascos on VGA switcheroo with radeon & nouveau drivers
   - Boot hang on Freescale i.MX6 systems
   - Excessive "no hotplug settings from platform" warnings

  In particular:

  Enumeration
    - Don't default exclusively to first video device (Bruno Prémont)

  PCI device hotplug
    - Remove "no hotplug settings from platform" warning (Bjorn Helgaas)
    - Add pci_ignore_hotplug() for VGA switcheroo (Bjorn Helgaas)

  Freescale i.MX6
    - Put LTSSM in "Detect" state before disabling (Lucas Stach)"

* tag 'pci-v3.17-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
  vgaarb: Drop obsolete #ifndef
  vgaarb: Don't default exclusively to first video device with mem+io
  ACPIPHP / radeon / nouveau: Remove acpi_bus_no_hotplug()
  PCI: Remove "no hotplug settings from platform" warning
  PCI: Add pci_ignore_hotplug() to ignore hotplug events for a device
  PCI: imx6: Put LTSSM in "Detect" state before disabling it
  MAINTAINERS: Add Lucas Stach as co-maintainer for i.MX6 PCI driver
2014-09-19 10:50:30 -07:00
Linus Torvalds
73030efad5 SCSI fixes on 20140919
This is a set of three fixes.  One represents a nasty shared tag map
 regression (another inverted condition) caused by recent SCSI MQ patches, one
 is a longstanding potential buffer overrun in the iscsi data buffer and the
 final one is a use after free for the rare bidirectional commands.
 
 Signed-off-by: James Bottomley <JBottomley@Parallels.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUHDQjAAoJEDeqqVYsXL0MAksIAMW7JMBW2BRSixKhqtZOsItd
 DrYN1wfvOuoKgyKnXBfhLbgKih9UgMEnc4zWnB96gBnNJCwFpW9vIICdf6kCWwXh
 c7Bh24SjcHMvYU9Of08TcAdUxyql6md/d28jzjoIi9zFj4C3XvLgevR8n4x2Lk2b
 3SQnX/PYUVMd4sKZaTQDLLAy5+CYGf2tbLPHlMZQHlIN+Cgjorm8zW7tIWJ+rhrv
 Sc3f9O9LxROsJ5wJNJpD0/9adlpsFxSD6cJWELaXBFsgw7jh0ZUjAXSYpyHASZqF
 Mln4kNCEcoa3IIiLVxBbBoQCVF01+kYpGuZM4o4nU6b+wluXupcuhTh5o37G7Q8=
 =FPm9
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a set of three fixes.

  One represents a nasty shared tag map regression (another inverted
  condition) caused by recent SCSI MQ patches, one is a longstanding
  potential buffer overrun in the iscsi data buffer and the final one is
  a use after free for the rare bidirectional commands"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  [SCSI] fix for bidi use after free
  [SCSI] fix regression that accidentally disabled block-based tcq
  [SCSI] libiscsi: fix potential buffer overrun in __iscsi_conn_send_pdu
2014-09-19 10:46:48 -07:00
Linus Torvalds
598a0c7d09 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "Two kernel side fixes: a kprobes fix and a perf_remove_from_context()
  fix (which does not yet fix the migration bug which is WIP)"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf: Fix a race condition in perf_remove_from_context()
  kprobes/x86: Free 'optinsn' cache when range check fails
2014-09-19 10:31:36 -07:00
Mike Marciniszyn
85cbb7c728 IB/qib: Correct reference counting in debugfs qp_stats
This particular reference count is not needed with the rcu protection,
and the current code leaks a reference count, causing a hang in
qib_qp_destroy().

Cc: <stable@vger.kernel.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-19 10:18:32 -07:00
Alex Estrin
68f9d83c77 IPoIB: Remove unnecessary port query
There are two queries for port attributes one after another. A second
call is not needed since port_attr structure already holds the data.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-19 10:17:40 -07:00
Bjorn Helgaas
12d8706963 Revert "PCI: Make sure bus number resources stay within their parents bounds"
This reverts commit 1820ffdccb ("PCI: Make sure bus number resources stay
within their parents bounds") because it breaks some systems with LSI Logic
FC949ES Fibre Channel Adapters, apparently by exposing a defect in those
adapters.

Dirk tested a Tyan VX50 (B4985) with this device that worked like this
prior to 1820ffdccb:

    bus: [bus 00-7f] on node 0 link 1
    ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-07])
    pci 0000:00:0e.0: PCI bridge to [bus 0a]
    pci_bus 0000:0a: busn_res: can not insert [bus 0a] under [bus 00-07] (conflicts with (null) [bus 00-07])
    pci 0000:0a:00.0: [1000:0646] type 00 class 0x0c0400 (FC adapter)

Note that the root bridge [bus 00-07] aperture is wrong; this is a BIOS
defect in the PCI0 _CRS method.  But prior to 1820ffdccb, we didn't
enforce that aperture, and the FC adapter worked fine at 0a:00.0.

After 1820ffdccb, we notice that 00:0e.0's aperture is not contained in
the root bridge's aperture, so we reconfigure it so it *is* contained:

    pci 0000:00:0e.0: bridge configuration invalid ([bus 0a-0a]), reconfiguring
    pci 0000:00:0e.0: PCI bridge to [bus 06-07]

This effectively moves the FC device from 0a:00.0 to 07:00.0, which should
be legal.  But when we enumerate bus 06, the FC device doesn't respond, so
we don't find anything.  This is probably a defect in the FC device.

Possible fixes (due to Yinghai):

    1) Add a quirk to fix the _CRS information based on what amd_bus.c read
       from the hardware

    2) Reset the FC device after we change its bus number

    3) Revert 1820ffdccb

Fix 1 would be relatively easy, but it does sweep the LSI FC issue under
the rug.  We might want to reconfigure bus numbers in the future for some
other reason, e.g., hotplug, and then we could trip over this again.

For that reason, I like fix 2, but we don't know whether it actually works,
and we don't have a patch for it yet.

This revert is fix 3, which also sweeps the LSI FC issue under the rug.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=84281
Reported-by: Dirk Gouders <dirk@gouders.net>
Tested-by: Dirk Gouders <dirk@gouders.net>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
CC: stable@vger.kernel.org	# v3.15+
CC: Yinghai Lu <yinghai@kernel.org>
2014-09-19 11:08:40 -06:00