Commit graph

33976 commits

Author SHA1 Message Date
Marcel Holtmann
65cc2b49db Bluetooth: Use struct delayed_work for HCI command timeout
Since the whole HCI command, event and data packet processing has been
migrated to use workqueues instead of tasklets, it makes sense to use
struct delayed_work instead of struct timer_list for the timeout
handling. This patch converts the hdev->cmd_timer to use workqueue
as well.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:42 +02:00
Marcel Holtmann
d9fbd02be5 Bluetooth: Use explicit header and body length for L2CAP SKB allocation
When allocating the L2CAP SKB for transmission, provide the upper layers
with a clear distinction on what is the header and what is the body
portion of the SKB.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:42 +02:00
Marcel Holtmann
8d46321c4f Bluetooth: Assign L2CAP socket priority when allocating SKB
The SKB for L2CAP sockets are all allocated in a central callback
in the socket support. Instead of having to pass around the socket
priority all the time, assign it to skb->priority when actually
allocating the SKB.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:41 +02:00
Marcel Holtmann
67f86a45bb Bluetooth: Use const for struct l2cap_ops field
The struct l2cap_ops field should not allow any modifications and thus
it is better declared as const.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2014-07-03 17:42:41 +02:00
Marcel Holtmann
53730107b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth 2014-07-03 16:39:39 +02:00
Trond Myklebust
2fc193cf92 SUNRPC: Handle EPIPE in xprt_connect_status
The callback handler xs_error_report() can end up propagating an EPIPE
error by means of the call to xprt_wake_pending_tasks(). Ensure that
xprt_connect_status() does not automatically convert this into an
EIO error.

Reported-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-07-03 00:11:35 -04:00
Daniel Borkmann
eaea2da728 net: sctp: only warn in proc_sctp_do_alpha_beta if write
Only warn if the value is written to alpha or beta. We don't care
emitting a one-time warning when only reading it.

Reported-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 18:44:07 -07:00
Daniel Borkmann
8f61059a96 net: sctp: improve timer slack calculation for transport HBs
RFC4960, section 8.3 says:

  On an idle destination address that is allowed to heartbeat,
  it is recommended that a HEARTBEAT chunk is sent once per RTO
  of that destination address plus the protocol parameter
  'HB.interval', with jittering of +/- 50% of the RTO value,
  and exponential backoff of the RTO if the previous HEARTBEAT
  is unanswered.

Currently, we calculate jitter via sctp_jitter() function first,
and then add its result to the current RTO for the new timeout:

  TMO = RTO + (RAND() % RTO) - (RTO / 2)
              `------------------------^-=> sctp_jitter()

Instead, we can just simplify all this by directly calculating:

  TMO = (RTO / 2) + (RAND() % RTO)

With the help of prandom_u32_max(), we don't need to open code
our own global PRNG, but can instead just make use of the per
CPU implementation of prandom with better quality numbers. Also,
we can now spare us the conditional for divide by zero check
since no div or mod operation needs to be used. Note that
prandom_u32_max() won't emit the same result as a mod operation,
but we really don't care here as we only want to have a random
number scaled into RTO interval.

Note, exponential RTO backoff is handeled elsewhere, namely in
sctp_do_8_2_transport_strike().

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 18:44:07 -07:00
Christoph Paasch
5924f17a8a tcp: Fix divide by zero when pushing during tcp-repair
When in repair-mode and TCP_RECV_QUEUE is set, we end up calling
tcp_push with mss_now being 0. If data is in the send-queue and
tcp_set_skb_tso_segs gets called, we crash because it will divide by
mss_now:

[  347.151939] divide error: 0000 [#1] SMP
[  347.152907] Modules linked in:
[  347.152907] CPU: 1 PID: 1123 Comm: packetdrill Not tainted 3.16.0-rc2 #4
[  347.152907] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[  347.152907] task: f5b88540 ti: f3c82000 task.ti: f3c82000
[  347.152907] EIP: 0060:[<c1601359>] EFLAGS: 00210246 CPU: 1
[  347.152907] EIP is at tcp_set_skb_tso_segs+0x49/0xa0
[  347.152907] EAX: 00000b67 EBX: f5acd080 ECX: 00000000 EDX: 00000000
[  347.152907] ESI: f5a28f40 EDI: f3c88f00 EBP: f3c83d10 ESP: f3c83d00
[  347.152907]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  347.152907] CR0: 80050033 CR2: 083158b0 CR3: 35146000 CR4: 000006b0
[  347.152907] Stack:
[  347.152907]  c167f9d9 f5acd080 000005b4 00000002 f3c83d20 c16013e6 f3c88f00 f5acd080
[  347.152907]  f3c83da0 c1603b5a f3c83d38 c10a0188 00000000 00000000 f3c83d84 c10acc85
[  347.152907]  c1ad5ec0 00000000 00000000 c1ad679c 010003e0 00000000 00000000 f3c88fc8
[  347.152907] Call Trace:
[  347.152907]  [<c167f9d9>] ? apic_timer_interrupt+0x2d/0x34
[  347.152907]  [<c16013e6>] tcp_init_tso_segs+0x36/0x50
[  347.152907]  [<c1603b5a>] tcp_write_xmit+0x7a/0xbf0
[  347.152907]  [<c10a0188>] ? up+0x28/0x40
[  347.152907]  [<c10acc85>] ? console_unlock+0x295/0x480
[  347.152907]  [<c10ad24f>] ? vprintk_emit+0x1ef/0x4b0
[  347.152907]  [<c1605716>] __tcp_push_pending_frames+0x36/0xd0
[  347.152907]  [<c15f4860>] tcp_push+0xf0/0x120
[  347.152907]  [<c15f7641>] tcp_sendmsg+0xf1/0xbf0
[  347.152907]  [<c116d920>] ? kmem_cache_free+0xf0/0x120
[  347.152907]  [<c106a682>] ? __sigqueue_free+0x32/0x40
[  347.152907]  [<c106a682>] ? __sigqueue_free+0x32/0x40
[  347.152907]  [<c114f0f0>] ? do_wp_page+0x3e0/0x850
[  347.152907]  [<c161c36a>] inet_sendmsg+0x4a/0xb0
[  347.152907]  [<c1150269>] ? handle_mm_fault+0x709/0xfb0
[  347.152907]  [<c15a006b>] sock_aio_write+0xbb/0xd0
[  347.152907]  [<c1180b79>] do_sync_write+0x69/0xa0
[  347.152907]  [<c1181023>] vfs_write+0x123/0x160
[  347.152907]  [<c1181d55>] SyS_write+0x55/0xb0
[  347.152907]  [<c167f0d8>] sysenter_do_call+0x12/0x28

This can easily be reproduced with the following packetdrill-script (the
"magic" with netem, sk_pacing and limit_output_bytes is done to prevent
the kernel from pushing all segments, because hitting the limit without
doing this is not so easy with packetdrill):

0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

+0  bind(3, ..., ...) = 0
+0  listen(3, 1) = 0

+0  < S 0:0(0) win 32792 <mss 1460>
+0  > S. 0:0(0) ack 1 <mss 1460>
+0.1  < . 1:1(0) ack 1 win 65000

+0  accept(3, ..., ...) = 4

// This forces that not all segments of the snd-queue will be pushed
+0 `tc qdisc add dev tun0 root netem delay 10ms`
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=2`
+0 setsockopt(4, SOL_SOCKET, 47, [2], 4) = 0

+0 write(4,...,10000) = 10000
+0 write(4,...,10000) = 10000

// Set tcp-repair stuff, particularly TCP_RECV_QUEUE
+0 setsockopt(4, SOL_TCP, 19, [1], 4) = 0
+0 setsockopt(4, SOL_TCP, 20, [1], 4) = 0

// This now will make the write push the remaining segments
+0 setsockopt(4, SOL_SOCKET, 47, [20000], 4) = 0
+0 `sysctl -w net.ipv4.tcp_limit_output_bytes=130000`

// Now we will crash
+0 write(4,...,1000) = 1000

This happens since ec34232575 (tcp: fix retransmission in repair
mode). Prior to that, the call to tcp_push was prevented by a check for
tp->repair.

The patch fixes it, by adding the new goto-label out_nopush. When exiting
tcp_sendmsg and a push is not required, which is the case for tp->repair,
we go to this label.

When repairing and calling send() with TCP_RECV_QUEUE, the data is
actually put in the receive-queue. So, no push is required because no
data has been added to the send-queue.

Cc: Andrew Vagin <avagin@openvz.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Fixes: ec34232575 (tcp: fix retransmission in repair mode)
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Acked-by: Andrew Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 18:21:03 -07:00
Fabian Frederick
fb0d164cc1 net/caif/caif_socket.c: remove unnecessary null test before debugfs_remove_recursive
based on checkpatch:
"debugfs_remove_recursive(NULL) is safe this check is probably not required"

Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 17:05:29 -07:00
Eric Dumazet
a48e5fafec vlan: free percpu stats in device destructor
Madalin-Cristian reported crashs happening after a recent commit
(5a4ae5f6e7 "vlan: unnecessary to check if vlan_pcpu_stats is NULL")

-----------------------------------------------------------------------
root@p5040ds:~# vconfig add eth8 1
root@p5040ds:~# vconfig rem eth8.1
Unable to handle kernel paging request for data at address 0x2bc88028
Faulting instruction address: 0xc058e950
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=8 CoreNet Generic
Modules linked in:
CPU: 3 PID: 2167 Comm: vconfig Tainted: G        W     3.16.0-rc3-00346-g65e85bf #2
task: e7264d90 ti: e2c2c000 task.ti: e2c2c000
NIP: c058e950 LR: c058ea30 CTR: c058e900
REGS: e2c2db20 TRAP: 0300   Tainted: G        W      (3.16.0-rc3-00346-g65e85bf)
MSR: 00029002 <CE,EE,ME>  CR: 48000428  XER: 20000000
DEAR: 2bc88028 ESR: 00000000
GPR00: c047299c e2c2dbd0 e7264d90 00000000 2bc88000 00000000 ffffffff 00000000
GPR08: 0000000f 00000000 000000ff 00000000 28000422 10121928 10100000 10100000
GPR16: 10100000 00000000 c07c5968 00000000 00000000 00000000 e2c2dc48 e7838000
GPR24: c07c5bac c07c58a8 e77290cc c07b0000 00000000 c05de6c0 e7838000 e2c2dc48
NIP [c058e950] vlan_dev_get_stats64+0x50/0x170
LR [c058ea30] vlan_dev_get_stats64+0x130/0x170
Call Trace:
[e2c2dbd0] [ffffffea] 0xffffffea (unreliable)
[e2c2dc20] [c047299c] dev_get_stats+0x4c/0x140
[e2c2dc40] [c0488ca8] rtnl_fill_ifinfo+0x3d8/0x960
[e2c2dd70] [c0489f4c] rtmsg_ifinfo+0x6c/0x110
[e2c2dd90] [c04731d4] rollback_registered_many+0x344/0x3b0
[e2c2ddd0] [c047332c] rollback_registered+0x2c/0x50
[e2c2ddf0] [c0476058] unregister_netdevice_queue+0x78/0xf0
[e2c2de00] [c058d800] unregister_vlan_dev+0xc0/0x160
[e2c2de20] [c058e360] vlan_ioctl_handler+0x1c0/0x550
[e2c2de90] [c045d11c] sock_ioctl+0x28c/0x2f0
[e2c2deb0] [c010d070] do_vfs_ioctl+0x90/0x7b0
[e2c2df20] [c010d7d0] SyS_ioctl+0x40/0x80
[e2c2df40] [c000f924] ret_from_syscall+0x0/0x3c

Fix this problem by freeing percpu stats from dev->destructor() instead
of ndo_uninit()

Reported-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Madalin-Cristian Bucur <madalin.bucur@freescale.com>
Fixes: 5a4ae5f6e7 ("vlan: unnecessary to check if vlan_pcpu_stats is NULL")
Cc: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-02 17:02:05 -07:00
David S. Miller
eb608d2b99 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
pull request: wireless 2014-06-27

Please pull the following batch of fixes for the 3.16 stream...

For the mac80211 bits, Johannes says:

"We have a fix from Eliad for a time calculation, a fix from Max for
head/tailroom when sending authentication packets, a revert that Felix
requested since the patch in question broke regulatory and a fix from
myself for an issue with a new command that we advertised in the wrong
place."

For the bluetooth bits, Gustavo says:

"A few fixes for 3.16. This pull request contains a NULL dereference fix,
and some security/pairing fixes."

For the iwlwifi bits, Emmanuel says:

"I have here a fix from Eliad for scheduled scan: it fixes a firmware
assertion. Arik reverts a patch I made that didn't take into account
that 3160 doesn't have UAPSD and hence, we can't assume that all
newer firmwares support the feature. Here too, the visible effect
is a firmware assertion. Along with that, we have a few fixes and
additions to the device list."

For the ath10k bits, Kalle says:

"Bartosz fixed an issue where we were not able to create 8 vdevs when
using DFS. Michal removed a false warning which was just confusing
people."

On top of that...

Arend van Spriel fixes a 'divide by zero' regression in brcmfmac.

Amitkumar Karwar corrects a transmit timeout in mwifiex.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 23:47:33 -07:00
Eric Dumazet
9fe516ba3f inet: move ipv6only in sock_common
When an UDP application switches from AF_INET to AF_INET6 sockets, we
have a small performance degradation for IPv4 communications because of
extra cache line misses to access ipv6only information.

This can also be noticed for TCP listeners, as ipv6_only_sock() is also
used from __inet_lookup_listener()->compute_score()

This is magnified when SO_REUSEPORT is used.

Move ipv6only into struct sock_common so that it is available at
no extra cost in lookups.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 23:46:21 -07:00
Jesper Dangaard Brouer
8788370a1d pktgen: RCU-ify "if_list" to remove lock in next_to_run()
The if_lock()/if_unlock() in next_to_run() adds a significant
overhead, because its called for every packet in busy loop of
pktgen_thread_worker().  (Thomas Graf originally pointed me
at this lock problem).

Removing these two "LOCK" operations should in theory save us approx
16ns (8ns x 2), as illustrated below we do save 16ns when removing
the locks and introducing RCU protection.

Performance data with CLONE_SKB==100000, TX-size=512, rx-usecs=30:
 (single CPU performance, ixgbe 10Gbit/s, E5-2630)
 * Prev   : 5684009 pps --> 175.93ns (1/5684009*10^9)
 * RCU-fix: 6272204 pps --> 159.43ns (1/6272204*10^9)
 * Diff   : +588195 pps --> -16.50ns

To understand this RCU patch, I describe the pktgen thread model
below.

In pktgen there is several kernel threads, but there is only one CPU
running each kernel thread.  Communication with the kernel threads are
done through some thread control flags.  This allow the thread to
change data structures at a know synchronization point, see main
thread func pktgen_thread_worker().

Userspace changes are communicated through proc-file writes.  There
are three types of changes, general control changes "pgctrl"
(func:pgctrl_write), thread changes "kpktgend_X"
(func:pktgen_thread_write), and interface config changes "etcX@N"
(func:pktgen_if_write).

Userspace "pgctrl" and "thread" changes are synchronized via the mutex
pktgen_thread_lock, thus only a single userspace instance can run.
The mutex is taken while the packet generator is running, by pgctrl
"start".  Thus e.g. "add_device" cannot be invoked when pktgen is
running/started.

All "pgctrl" and all "thread" changes, except thread "add_device",
communicate via the thread control flags.  The main problem is the
exception "add_device", that modifies threads "if_list" directly.

Fortunately "add_device" cannot be invoked while pktgen is running.
But there exists a race between "rem_device_all" and "add_device"
(which normally don't occur, because "rem_device_all" waits 125ms
before returning). Background'ing "rem_device_all" and running
"add_device" immediately allow the race to occur.

The race affects the threads (list of devices) "if_list".  The if_lock
is used for protecting this "if_list".  Other readers are given
lock-free access to the list under RCU read sections.

Note, interface config changes (via proc) can occur while pktgen is
running, which worries me a bit.  I'm assuming proc_remove() takes
appropriate locks, to assure no writers exists after proc_remove()
finish.

I've been running a script exercising the race condition (leading me
to fix the proc_remove order), without any issues.  The script also
exercises concurrent proc writes, while the interface config is
getting removed.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 15:50:23 -07:00
Jesper Dangaard Brouer
baac167b70 pktgen: avoid expensive set_current_state() call in loop
Avoid calling set_current_state() inside the busy-loop in
pktgen_thread_worker().  In case of pkt_dev->delay, then it is still
used/enabled in pktgen_xmit() via the spin() call.

The set_current_state(TASK_INTERRUPTIBLE) uses a xchg, which implicit
is LOCK prefixed.  I've measured the asm LOCK operation to take approx
8ns on this E5-2630 CPU.  Performance increase corrolate with this
measurement.

Performance data with CLONE_SKB==100000, rx-usecs=30:
 (single CPU performance, ixgbe 10Gbit/s, E5-2630)
 * Prev:  5454050 pps --> 183.35ns (1/5454050*10^9)
 * Now:   5684009 pps --> 175.93ns (1/5684009*10^9)
 * Diff:  +229959 pps -->  -7.42ns

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 15:50:23 -07:00
Jiri Pirko
5b9e7e1607 openvswitch: introduce rtnl ops stub
This stub now allows userspace to see IFLA_INFO_KIND for ovs master and
IFLA_INFO_SLAVE_KIND for slave.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 14:40:17 -07:00
Jiri Pirko
b0ab2fabb5 rtnetlink: allow to register ops without ops->setup set
So far, it is assumed that ops->setup is filled up. But there might be
case that ops might make sense even without ->setup. In that case,
forbid to newlink and dellink.

This allows to register simple rtnl link ops containing only ->kind.
That allows consistent way of passing device kind (either device-kind or
slave-kind) to userspace.

Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 14:40:17 -07:00
Ying Xue
9bf2b8c280 net: fix some typos in comment
In commit 371121057607e3127e19b3fa094330181b5b031e("net:
QDISC_STATE_RUNNING dont need atomic bit ops") the
__QDISC_STATE_RUNNING is renamed to __QDISC___STATE_RUNNING,
but the old names existing in comment are not replaced with
the new name completely.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 14:20:32 -07:00
Ben Greear
d933319657 ipv6: Allow accepting RA from local IP addresses.
This can be used in virtual networking applications, and
may have other uses as well.  The option is disabled by
default.

A specific use case is setting up virtual routers, bridges, and
hosts on a single OS without the use of network namespaces or
virtual machines.  With proper use of ip rules, routing tables,
veth interface pairs and/or other virtual interfaces,
and applications that can bind to interfaces and/or IP addresses,
it is possibly to create one or more virtual routers with multiple
hosts attached.  The host interfaces can act as IPv6 systems,
with radvd running on the ports in the virtual routers.  With the
option provided in this patch enabled, those hosts can now properly
obtain IPv6 addresses from the radvd.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 12:16:24 -07:00
Ben Greear
f2a762d8a9 ipv6: Add more debugging around accept-ra logic.
This is disabled by default, just like similar debug info
already in this module.  But, makes it easier to find out
why RA is not being accepted when debugging strange behaviour.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-01 12:16:24 -07:00
Eric Dumazet
7f50236153 ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix
We have two different ways to handle changes to sk->sk_dst

First way (used by TCP) assumes socket lock is owned by caller, and use
no extra lock : __sk_dst_set() & __sk_dst_reset()

Another way (used by UDP) uses sk_dst_lock because socket lock is not
always taken. Note that sk_dst_lock is not softirq safe.

These ways are not inter changeable for a given socket type.

ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used
the socket lock as synchronization, but users might be UDP sockets.

Instead of converting sk_dst_lock to a softirq safe version, use xchg()
as we did for sk_rx_dst in commit e47eb5dfb2 ("udp: ipv4: do not use
sk_dst_lock from softirq context")

In a follow up patch, we probably can remove sk_dst_lock, as it is
only used in IPv6.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: 9cb3a50c5f ("ipv4: Invalidate the socket cached route on pmtu events if possible")
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-30 23:40:58 -07:00
Alex Wang
4a46b24e14 openvswitch: Use exact lookup for flow_get and flow_del.
Due to the race condition in userspace, there is chance that two
overlapping megaflows could be installed in datapath.  And this
causes userspace unable to delete the less inclusive megaflow flow
even after it timeout, since the flow_del logic will stop at the
first match of masked flow.

This commit fixes the bug by making the kernel flow_del and flow_get
logic check all masks in that case.

Introduced by 03f0d916a (openvswitch: Mega flow implementation).

Signed-off-by: Alex Wang <alexw@nicira.com>
Acked-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-30 20:47:15 -07:00
Trond Myklebust
3601c4a91e SUNRPC: Ensure that we handle ENOBUFS errors correctly.
Currently, an ENOBUFS error will result in a fatal error for the RPC
call. Normally, we will just want to wait and then retry.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
2014-06-30 13:42:19 -04:00
Pablo Neira Ayuso
63283dd21e netfilter: nf_tables: skip transaction if no update flags in tables
Skip transaction handling for table updates with no changes in
the flags. This fixes a crash when passing the table flag with all
bits unset.

Reported-by: Ana Rey <anarey@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-30 11:44:24 +02:00
Duan Jiong
24de3d3775 netfilter: use IS_ENABLED() macro
replace:
 #if defined(CONFIG_NF_CT_NETLINK) || defined(CONFIG_NF_CT_NETLINK_MODULE)
with
 #if IS_ENABLED(CONFIG_NF_CT_NETLINK)

replace:
 #if !defined(CONFIG_NF_NAT) && !defined(CONFIG_NF_NAT_MODULE)
with
 #if !IS_ENABLED(CONFIG_NF_NAT)

replace:
 #if !defined(CONFIG_NF_CONNTRACK) && !defined(CONFIG_NF_CONNTRACK_MODULE)
with
 #if !IS_ENABLED(CONFIG_NF_CONNTRACK)

And add missing:
 IS_ENABLED(CONFIG_NF_CT_NETLINK)

in net/ipv{4,6}/netfilter/nf_nat_l3proto_ipv{4,6}.c

Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-30 11:38:03 +02:00
Octavian Purdila
4135ab8208 tcp: tcp_conn_request: fix build error when IPv6 is disabled
Fixes build error introduced by commit 1fb6f159fd (tcp: add
tcp_conn_request):

net/ipv4/tcp_input.c: In function 'pr_drop_req':
net/ipv4/tcp_input.c:5889:130: error: 'struct sock_common' has no member named 'skc_v6_daddr'

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-29 23:46:38 -07:00
Christoph Paasch
1759389e8a xfrm4: Remove duplicate semicolon
3328715e6c (xfrm4: Add IPsec protocol multiplexer) adds a
duplicate semicolon after the return-statement.

Although it has no negative impact, the second semicolon should be
removed.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-06-30 07:49:47 +02:00
Tobias Brunner
a0e5ef53aa xfrm: Fix installation of AH IPsec SAs
The SPI check introduced in ea9884b3ac
was intended for IPComp SAs but actually prevented AH SAs from getting
installed (depending on the SPI).

Fixes: ea9884b3ac ("xfrm: check user specified spi for IPComp")
Cc: Fan Du <fan.du@windriver.com>
Signed-off-by: Tobias Brunner <tobias@strongswan.org>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
2014-06-30 07:42:12 +02:00
Ben Pfaff
ad55200734 openvswitch: Fix tracking of flags seen in TCP flows.
Flow statistics need to take into account the TCP flags from the packet
currently being processed (in 'key'), not the TCP flags matched by the
flow found in the kernel flow table (in 'flow').

This bug made the Open vSwitch userspace fin_timeout action have no effect
in many cases.
This bug is introduced by commit 88d73f6c41 (openvswitch: Use
TCP flags in the flow key for stats.)

Reported-by: Len Gao <leng@vmware.com>
Signed-off-by: Ben Pfaff <blp@nicira.com>
Acked-by: Jarno Rajahalme <jrajahalme@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-29 14:10:51 -07:00
Wei Zhang
e0bb8c44ed openvswitch: supply a dummy err_handler of gre_cisco_protocol to prevent kernel crash
When use gre vport, openvswitch register a gre_cisco_protocol but
does not supply a err_handler with it. The gre_cisco_err() in
net/ipv4/gre_demux.c expect err_handler be provided with the
gre_cisco_protocol implementation, and call ->err_handler() without
existence check, cause the kernel crash.

This patch provide a err_handler to fix this bug.
This bug introduced by commit aa310701e7 (openvswitch: Add gre
tunnel support.)

Signed-off-by: Wei Zhang <asuka.com@163.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-29 14:10:48 -07:00
Andy Zhou
fe984c08e2 openvswitch: Fix a double free bug for the sample action
When sample action returns with an error, the skb has already been
freed. This patch fix a bug to make sure we don't free it again.
This bug introduced by commit ccb1352e76 (net: Add Open vSwitch
kernel components.)

Signed-off-by: Andy Zhou <azhou@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
2014-06-29 14:10:43 -07:00
Fengguang Wu
5cbfda2043 netfilter: nft_log: fix coccinelle warnings
net/netfilter/nft_log.c:79:44-45: Unneeded semicolon

 Removes unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci

CC: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-29 13:55:08 +02:00
Pablo Neira Ayuso
ca1aa54f27 netfilter: xt_LOG: add missing string format in nf_log_packet()
net/netfilter/xt_LOG.c: In function 'log_tg':
>> net/netfilter/xt_LOG.c:43: error: format not a string literal and no format arguments

Fixes: fab4085 ("netfilter: log: nf_log_packet() as real unified interface")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-28 18:50:35 +02:00
Pablo Neira Ayuso
c1878869c0 netfilter: fix several Kconfig problems in NF_LOG_*
warning: (NETFILTER_XT_TARGET_LOG) selects NF_LOG_IPV6 which has unmet direct dependencies (NET && INET && IPV6 && NETFILTER && IP6_NF_IPTABLES && NETFILTER_ADVANCED)
warning: (NF_LOG_IPV4 && NF_LOG_IPV6) selects NF_LOG_COMMON which has unmet direct dependencies (NET && INET && NETFILTER && NF_CONNTRACK)

Fixes: 83e96d4 ("netfilter: log: split family specific code to nf_log_{ip,ip6,common}.c files")
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2014-06-28 18:49:49 +02:00
Linus Torvalds
eb477e03fe Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Mostly minor fixes this time around.  The highlights include:

   - iscsi-target CHAP authentication fixes to enforce explicit key
     values (Tejas Vaykole + rahul.rane)
   - fix a long-standing OOPs in target-core when a alua configfs
     attribute is accessed after port symlink has been removed.
     (Sebastian Herbszt)
   - fix a v3.10.y iscsi-target regression causing the login reject
     status class/detail to be ignored (Christoph Vu-Brugier)
   - fix a v3.10.y iscsi-target regression to avoid rejecting an
     existing ITT during Data-Out when data-direction is wrong (Santosh
     Kulkarni + Arshad Hussain)
   - fix a iscsi-target related shutdown deadlock on UP kernels (Mikulas
     Patocka)
   - fix a v3.16-rc1 build issue with vhost-scsi + !CONFIG_NET (MST)"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  iscsi-target: fix iscsit_del_np deadlock on unload
  iovec: move memcpy_from/toiovecend to lib/iovec.c
  iscsi-target: Avoid rejecting incorrect ITT for Data-Out
  tcm_loop: Fix memory leak in tcm_loop_submission_work error path
  iscsi-target: Explicily clear login response PDU in exception path
  target: Fix left-over se_lun->lun_sep pointer OOPs
  iscsi-target; Enforce 1024 byte maximum for CHAP_C key value
  iscsi-target: Convert chap_server_compute_md5 to use kstrtoul
2014-06-28 09:43:58 -07:00
Octavian Purdila
1fb6f159fd tcp: add tcp_conn_request
Create tcp_conn_request and remove most of the code from
tcp_v4_conn_request and tcp_v6_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:37 -07:00
Octavian Purdila
695da14eb0 tcp: add queue_add_hash to tcp_request_sock_ops
Add queue_add_hash member to tcp_request_sock_ops so that we can later
unify tcp_v4_conn_request and tcp_v6_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
2aec4a297b tcp: add mss_clamp to tcp_request_sock_ops
Add mss_clamp member to tcp_request_sock_ops so that we can later
unify tcp_v4_conn_request and tcp_v6_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
5db92c9949 tcp: unify tcp_v4_rtx_synack and tcp_v6_rtx_synack
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
d6274bd8d6 tcp: add send_synack method to tcp_request_sock_ops
Create a new tcp_request_sock_ops method to unify the IPv4/IPv6
signature for tcp_v[46]_send_synack. This allows us to later unify
tcp_v4_rtx_synack with tcp_v6_rtx_synack and tcp_v4_conn_request with
tcp_v4_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
936b8bdb53 tcp: add init_seq method to tcp_request_sock_ops
More work in preparation of unifying tcp_v4_conn_request and
tcp_v6_conn_request: indirect the init sequence calls via the
tcp_request_sock_ops.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
9403715977 tcp: move around a few calls in tcp_v6_conn_request
Make the tcp_v6_conn_request calls flow similar with that of
tcp_v4_conn_request.

Note that want_cookie can be true only if isn is zero and that is why
we can move the if (want_cookie) block out of the if (!isn) block.

Moving security_inet_conn_request() has a couple of side effects:
missing inet_rsk(req)->ecn_ok update and the req->cookie_ts
update. However, neither SELinux nor Smack security hooks seems to
check them. This change should also avoid future different behaviour
for IPv4 and IPv6 in the security hooks.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
d94e0417ad tcp: add route_req method to tcp_request_sock_ops
Create wrappers with same signature for the IPv4/IPv6 request routing
calls and use these wrappers (via route_req method from
tcp_request_sock_ops) in tcp_v4_conn_request and tcp_v6_conn_request
with the purpose of unifying the two functions in a later patch.

We can later drop the wrapper functions and modify inet_csk_route_req
and inet6_cks_route_req to use the same signature.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:36 -07:00
Octavian Purdila
fb7b37a7f3 tcp: add init_cookie_seq method to tcp_request_sock_ops
Move the specific IPv4/IPv6 cookie sequence initialization to a new
method in tcp_request_sock_ops in preparation for unifying
tcp_v4_conn_request and tcp_v6_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:35 -07:00
Octavian Purdila
16bea70aa7 tcp: add init_req method to tcp_request_sock_ops
Move the specific IPv4/IPv6 intializations to a new method in
tcp_request_sock_ops in preparation for unifying tcp_v4_conn_request
and tcp_v6_conn_request.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:35 -07:00
Octavian Purdila
476eab8251 net: remove inet6_reqsk_alloc
Since pktops is only used for IPv6 only and opts is used for IPv4
only, we can move these fields into a union and this allows us to drop
the inet6_reqsk_alloc function as after this change it becomes
equivalent with inet_reqsk_alloc.

This patch also fixes a kmemcheck issue in the IPv6 stack: the flags
field was not annotated after a request_sock was allocated.

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:35 -07:00
Octavian Purdila
aa27fc5018 tcp: tcp_v[46]_conn_request: fix snt_synack initialization
Commit 016818d07 (tcp: TCP Fast Open Server - take SYNACK RTT after
completing 3WHS) changes the code to only take a snt_synack timestamp
when a SYNACK transmit or retransmit succeeds. This behaviour is later
broken by commit 843f4a55e (tcp: use tcp_v4_send_synack on first
SYN-ACK), as snt_synack is now updated even if tcp_v4_send_synack
fails.

Also, commit 3a19ce0ee (tcp: IPv6 support for fastopen server) misses
the required IPv6 updates for 016818d07.

This patch makes sure that snt_synack is updated only when the SYNACK
trasnmit/retransmit succeeds, for both IPv4 and IPv6.

Cc: Cardwell <ncardwell@google.com>
Cc: Daniel Lee <longinus00@gmail.com>
Cc: Yuchung Cheng <ycheng@google.com>

Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:35 -07:00
Octavian Purdila
57b47553f6 tcp: cookie_v4_init_sequence: skb should be const
Signed-off-by: Octavian Purdila <octavian.purdila@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 15:53:35 -07:00
Jon Paul Maloy
60120526c2 tipc: simplify connection congestion handling
As a consequence of the recently introduced serialized access
to the socket in commit 8d94168a761819d10252bab1f8de6d7b202c3baa
("tipc: same receive code path for connection protocol and data
messages") we can make a number of simplifications in the
detection and handling of connection congestion situations.

- We don't need to keep two counters, one for sent messages and one
  for acked messages. There is no longer any risk for races between
  acknowledge messages arriving in BH and data message sending
  running in user context. So we merge this into one counter,
  'sent_unacked', which is incremented at sending and subtracted
  from at acknowledge reception.

- We don't need to set the 'congested' field in tipc_port to
  true before we sent the message, and clear it when sending
  is successful. (As a matter of fact, it was never necessary;
  the field was set in link_schedule_port() before any wakeup
  could arrive anyway.)

- We keep the conditions for link congestion and connection connection
  congestion separated. There would otherwise be a risk that an arriving
  acknowledge message may wake up a user sleeping because of link
  congestion.

- We can simplify reception of acknowledge messages.

We also make some cosmetic/structural changes:

- We rename the 'congested' field to the more correct 'link_cong´.

- We rename 'conn_unacked' to 'rcv_unacked'

- We move the above mentioned fields from struct tipc_port to
  struct tipc_sock.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 12:50:56 -07:00
Jon Paul Maloy
ac0074ee70 tipc: clean up connection protocol reception function
We simplify the code for receiving connection probes, leveraging the
recently introduced tipc_msg_reverse() function. We also stick to
the principle of sending a possible response message directly from
the calling (tipc_sk_rcv or backlog_rcv) functions, hence making
the call chain shallower and easier to follow.

We make one small protocol change here, allowed according to
the spec. If a protocol message arrives from a remote socket that
is not the one we are connected to, we are currently generating a
connection abort message and send it to the source. This behavior
is unnecessary, and might even be a security risk, so instead we
now choose to only ignore the message. The consequnce for the sender
is that he will need longer time to discover his mistake (until the
next timeout), but this is an extreme corner case, and may happen
anyway under other circumstances, so we deem this change acceptable.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-06-27 12:50:56 -07:00