Commit graph

550127 commits

Author SHA1 Message Date
David S. Miller
721daebbdb Merge branch 'bpf-perf'
Alexei Starovoitov says:

====================
bpf_perf_event_output helper

Over the last year there were multiple attempts to let eBPF programs
output data into perf events by He Kuang and Wangnan.
The last one was:
https://lkml.org/lkml/2015/7/20/736
It was almost perfect with exception that all bpf programs would sent
data into one global perf_event.
This patch set takes different approach by letting user space
open independent PERF_COUNT_SW_BPF_OUTPUT events, so that program
output won't collide.

Wangnan is working on corresponding perf patches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 06:42:23 -07:00
Alexei Starovoitov
39111695b1 samples: bpf: add bpf_perf_event_output example
Performance test and example of bpf_perf_event_output().
kprobe is attached to sys_write() and trivial bpf program streams
pid+cookie into userspace via PERF_COUNT_SW_BPF_OUTPUT event.

Usage:
$ sudo ./bld_x64/samples/bpf/trace_output
recv 2968913 events per sec

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 06:42:15 -07:00
Alexei Starovoitov
a43eec3042 bpf: introduce bpf_perf_event_output() helper
This helper is used to send raw data from eBPF program into
special PERF_TYPE_SOFTWARE/PERF_COUNT_SW_BPF_OUTPUT perf_event.
User space needs to perf_event_open() it (either for one or all cpus) and
store FD into perf_event_array (similar to bpf_perf_event_read() helper)
before eBPF program can send data into it.

Today the programs triggered by kprobe collect the data and either store
it into the maps or print it via bpf_trace_printk() where latter is the debug
facility and not suitable to stream the data. This new helper replaces
such bpf_trace_printk() usage and allows programs to have dedicated
channel into user space for post-processing of the raw data collected.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 06:42:15 -07:00
Alexei Starovoitov
fa128e6a14 perf: pad raw data samples automatically
Instead of WARN_ON in perf_event_output() on unpaded raw samples,
pad them automatically.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 06:42:13 -07:00
Brenden Blanco
63b11e757d ipvlan: read direct ifindex instead of iflink
In the ipv4 outbound path of an ipvlan device in l3 mode, the ifindex is
being grabbed from dev_get_iflink. This works for the physical device
case, since as the documentation of that function notes: "Physical
interfaces have the same 'ifindex' and 'iflink' values.".  However, if
the master device is a veth, and the pairs are in separate net
namespaces, the route lookup will fail with -ENODEV due to outer veth
pair being in a separate namespace from the ipvlan master/routing
namespace.

  ns0    |   ns1    |   ns2
 veth0a--|--veth0b--|--ipvl0

In ipvlan_process_v4_outbound(), a packet sent from ipvl0 in the above
configuration will pass fl.flowi4_oif == veth0a to
ip_route_output_flow(), but *net == ns1.

Notice also that ipv6 processing is not using iflink. Since there is a
discrepancy in usage, fixup both v4 and v6 case to use local dev
variable.

Tested this with l3 ipvlan on top of veth, as well as with single
physical interface in the top namespace.

Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 06:39:08 -07:00
Andrew F. Davis
34e45ad937 net: phy: dp83848: Add TI DP83848 Ethernet PHY
Add support for the TI DP83848 Ethernet PHY device.

The DP83848 is a highly reliable, feature rich, IEEE 802.3 compliant
single port 10/100 Mb/s Ethernet Physical Layer Transceiver supporting
the MII and RMII interfaces.

Signed-off-by: Andrew F. Davis <afd@ti.com>
Signed-off-by: Dan Murphy <dmurphy@ti.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 06:37:19 -07:00
Eric Dumazet
dbf650b67b tcp: fastopen: limit max_qlen
Allowing an application to set whatever limit for
the list of recently RST fastopen sessions [1] is not wise,
as it open ways to deplete kernel memory.

Cap the user provided limit by somaxconn sysctl,
like listen() backlog.

[1] https://tools.ietf.org/html/rfc7413#section-5.1

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-22 06:22:13 -07:00
Hezi Shahmoon
0729a04977 i2c: mv64xxx: really allow I2C offloading
Commit 00d8689b85 ("i2c: mv64xxx: rework offload support to fix
several problems") completely reworked the offload support, but left a
debugging-related "return false" at the beginning of the
mv64xxx_i2c_can_offload() function. This has the unfortunate consequence
that offloading is in fact never used, which wasn't really the
intention.

This commit fixes that problem by removing the bogus "return false".

Fixes: 00d8689b85 ("i2c: mv64xxx: rework offload support to fix several problems")
Signed-off-by: Hezi Shahmoon <hezi@marvell.com>
[Thomas: reworked commit log and title.]
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
Cc: <stable@vger.kernel.org>
2015-10-22 14:47:45 +02:00
Jiada Wang
4eb0f7abce ASoC: wm8962: mark cache_dirty flag after software reset in pm_resume
By doing software reset of wm8962 in pm_resume, all registers which
have already been set will be reset to default value without regmap
interface be involved, thus driver need to mark cache_dirty flag,
to let regcache can be updated by regcache_sync().

Signed-off-by: Jiada Wang <jiada_wang@mentor.com>
Acked-by: Charles Keepax <ckeepax@opensource.wolfsonmicro.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
2015-10-22 13:30:51 +01:00
Marcel Holtmann
13972adc32 Bluetooth: Increase minor version of core module
With the addition of support for diagnostic feature, it makes sense to
increase the minor version of the Bluetooth core module.

The module version is not used anywhere, but it gives a nice extra
hint for debugging purposes.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
2015-10-22 13:37:26 +03:00
Alexander Aring
aeedebff69 ieee802154: 6lowpan: fix memory leak
Looking at current situation of memory management in 6lowpan receive
function I detected some invalid handling. After calling
lowpan_invoke_rx_handlers we will do a kfree_skb and then NET_RX_DROP on
error handling. We don't do this before, also on
skb_share_check/skb_unshare which might manipulate the reference
counters.

After running some 'grep -r "dev_add_pack" net/' to look how others
packet-layer receive callbacks works I detected that every subsystem do
a kfree_skb, then NET_RX_DROP without calling skb functions which
might manipulate the skb reference counters. This is the reason why we
should do the same here like all others subsystems. I didn't find any
documentation how the packet-layer receive callbacks handle NET_RX_DROP
return values either.

This patch will add a kfree_skb, then NET_RX_DROP handling for the
"trivial checks", in case of skb_share_check/skb_unshare the kfree_skb
call will be done inside these functions.

Signed-off-by: Alexander Aring <alex.aring@gmail.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-22 12:24:42 +02:00
Johan Hedberg
88d07feb09 Bluetooth: Make hci_disconnect() behave correctly for all states
There are a few places that don't explicitly check the connection
state before calling hci_disconnect(). To make this API do the right
thing take advantage of the new hci_abort_conn() API and also make
sure to only read the clock offset if we're really connected.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-22 11:37:22 +02:00
Johan Hedberg
89e0ccc882 Bluetooth: Take advantage of connection abort helpers
Convert the various places mapping connection state to
disconnect/cancel HCI command to use the new hci_abort_conn helper
API.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-22 11:37:22 +02:00
Johan Hedberg
dcc0f0d9ce Bluetooth: Introduce hci_req helper to abort a connection
There are several different places needing to make sure that a
connection gets disconnected or canceled. The exact action needed
depends on the connection state, so centralizing this logic can save
quite a lot of code duplication.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-22 11:37:22 +02:00
Dan Carpenter
a1857390e2 Bluetooth: hci_bcm: checking for ERR_PTR instead of NULL
bt_skb_alloc() returns NULL on error, it never returns an ERR_PTR.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-22 11:32:47 +02:00
Johan Hedberg
c81d555a26 Bluetooth: Fix crash in SMP when unpairing
When unpairing the keys stored in hci_dev are removed. If SMP is
ongoing the SMP context will also have references to these keys, so
removing them from the hci_dev lists will make the pointers invalid.
This can result in the following type of crashes:

 BUG: unable to handle kernel paging request at 6b6b6b6b
 IP: [<c11f26be>] __list_del_entry+0x44/0x71
 *pde = 00000000
 Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
 Modules linked in: hci_uart btqca btusb btintel btbcm btrtl hci_vhci rfcomm bluetooth_6lowpan bluetooth
 CPU: 0 PID: 723 Comm: kworker/u5:0 Not tainted 4.3.0-rc3+ #1379
 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014
 Workqueue: hci0 hci_rx_work [bluetooth]
 task: f19da940 ti: f1a94000 task.ti: f1a94000
 EIP: 0060:[<c11f26be>] EFLAGS: 00010202 CPU: 0
 EIP is at __list_del_entry+0x44/0x71
 EAX: c0088d20 EBX: f30fcac0 ECX: 6b6b6b6b EDX: 6b6b6b6b
 ESI: f4b60000 EDI: c0088d20 EBP: f1a95d90 ESP: f1a95d8c
  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
 CR0: 8005003b CR2: 6b6b6b6b CR3: 319e5000 CR4: 00000690
 Stack:
  f30fcac0 f1a95db0 f82dc3e1 f1bfc000 00000000 c106524f f1bfc000 f30fd020
  f1a95dc0 f1a95dd0 f82dcbdb f1a95de0 f82dcbdb 00000067 f1bfc000 f30fd020
  f1a95de0 f1a95df0 f82d1126 00000067 f82d1126 00000006 f30fd020 f1bfc000
 Call Trace:
  [<f82dc3e1>] smp_chan_destroy+0x192/0x240 [bluetooth]
  [<c106524f>] ? trace_hardirqs_on_caller+0x14e/0x169
  [<f82dcbdb>] smp_teardown_cb+0x47/0x64 [bluetooth]
  [<f82dcbdb>] ? smp_teardown_cb+0x47/0x64 [bluetooth]
  [<f82d1126>] l2cap_chan_del+0x5d/0x14d [bluetooth]
  [<f82d1126>] ? l2cap_chan_del+0x5d/0x14d [bluetooth]
  [<f82d40ef>] l2cap_conn_del+0x109/0x17b [bluetooth]
  [<f82d40ef>] ? l2cap_conn_del+0x109/0x17b [bluetooth]
  [<f82c0205>] ? hci_event_packet+0x5b1/0x2092 [bluetooth]
  [<f82d41aa>] l2cap_disconn_cfm+0x49/0x50 [bluetooth]
  [<f82d41aa>] ? l2cap_disconn_cfm+0x49/0x50 [bluetooth]
  [<f82c0228>] hci_event_packet+0x5d4/0x2092 [bluetooth]
  [<c1332c16>] ? skb_release_data+0x6a/0x95
  [<f82ce5d4>] ? hci_send_to_monitor+0xe7/0xf4 [bluetooth]
  [<c1409708>] ? _raw_spin_unlock_irqrestore+0x44/0x57
  [<f82b3bb0>] hci_rx_work+0xf1/0x28b [bluetooth]
  [<f82b3bb0>] ? hci_rx_work+0xf1/0x28b [bluetooth]
  [<c10635a0>] ? __lock_is_held+0x2e/0x44
  [<c104772e>] process_one_work+0x232/0x432
  [<c1071ddc>] ? rcu_read_lock_sched_held+0x50/0x5a
  [<c104772e>] ? process_one_work+0x232/0x432
  [<c1047d48>] worker_thread+0x1b8/0x255
  [<c1047b90>] ? rescuer_thread+0x23c/0x23c
  [<c104bb71>] kthread+0x91/0x96
  [<c14096a7>] ? _raw_spin_unlock_irq+0x27/0x44
  [<c1409d61>] ret_from_kernel_thread+0x21/0x30
  [<c104bae0>] ? kthread_parkme+0x1e/0x1e

To solve the issue, introduce a new smp_cancel_pairing() API that can
be used to clean up the SMP state before touching the hci_dev lists.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-22 09:02:03 +02:00
Johan Hedberg
fc64361ac1 Bluetooth: Disable auto-connection parameters when unpairing
For connection parameters that are left around until a disconnection
we should at least clear any auto-connection properties. This way a
new Add Device call is required to re-set them after calling Unpair
Device.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-22 09:02:03 +02:00
Vivien Didelot
e2aacd963a net: mdio-gpio: move platform data header
This header file only contains the platform data structure definition,
so move it to the include/linux/platform_data/ directory.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:50:44 -07:00
Vivien Didelot
844338e5a4 ARM: gemini: remove unnecessary mdio-gpio includes
Remove the inclusion of linux/mdio-gpio.h in nas4220b, wbd111 and wbd222
boards since mdio-gpio is not used.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:50:43 -07:00
Hans de Goede
104eb270e6 net: sun4i-emac: Properly free resources on probe failure and remove
Fix sun4i-emac not releasing the following resources:
-iomapped memory not released on probe-failure nor on remove
-clock not getting disabled on probe-failure nor on remove
-sram not being released on remove

And while at it also add error checking to the clk_prepare_enable call
done on probe.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:47:45 -07:00
Wu Fengguang
c6aa74d546 net: hisilicon: fix ptr_ret.cocci warnings
drivers/net/ethernet/hisilicon/hns/hnae.c:442:1-3: WARNING: PTR_ERR_OR_ZERO can be used

 Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

CC: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:38:26 -07:00
Eric Dumazet
feec0cb3f2 ipv6: gro: support sit protocol
Tom Herbert added SIT support to GRO with commit
19424e052f ("sit: Add gro callbacks to sit_offload"),
later reverted by Herbert Xu.

The problem came because Tom patch was building GRO
packets without proper meta data : If packets were locally
delivered, we would not care.

But if packets needed to be forwarded, GSO engine was not
able to segment individual segments.

With the following patch, we correctly set skb->encapsulation
and inner network header. We also update gso_type.

Tested:

Server :
netserver
modprobe dummy
ifconfig dummy0 8.0.0.1 netmask 255.255.255.0 up
arp -s 8.0.0.100 4e:32:51:04:47:e5
iptables -I INPUT -s 10.246.7.151 -j TEE --gateway 8.0.0.100
ifconfig sixtofour0
sixtofour0 Link encap:IPv6-in-IPv4
          inet6 addr: 2002:af6:798::1/128 Scope:Global
          inet6 addr: 2002:af6:798::/128 Scope:Global
          UP RUNNING NOARP  MTU:1480  Metric:1
          RX packets:411169 errors:0 dropped:0 overruns:0 frame:0
          TX packets:409414 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:20319631739 (20.3 GB)  TX bytes:29529556 (29.5 MB)

Client :
netperf -H 2002:af6:798::1 -l 1000 &

Checked on server traffic copied on dummy0 and verify segments were
properly rebuilt, with proper IP headers, TCP checksums...

tcpdump on eth0 shows proper GRO aggregation takes place.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:36:11 -07:00
Eric Dumazet
8f3af27786 net: dummy: add more features
While testing my SIT/GRO patch using netfilter TEE module and a dummy
device, I found some features were missing :

TSO IPv6, UFO, and encapsulated traffic.

ethtool -k dummy0 now gives :
...
tcp-segmentation-offload: on
	tx-tcp-segmentation: on
	tx-tcp-ecn-segmentation: on
	tx-tcp6-segmentation: on
udp-fragmentation-offload: on
...
tx-gre-segmentation: on
tx-ipip-segmentation: on
tx-sit-segmentation: on
tx-udp_tnl-segmentation: on

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:36:10 -07:00
Joe Stringer
e754ec69ab openvswitch: Serialize nested ct actions if provided
If userspace provides a ct action with no nested mark or label, then the
storage for these fields is zeroed. Later when actions are requested,
such zeroed fields are serialized even though userspace didn't
originally specify them. Fix the behaviour by ensuring that no action is
serialized in this case, and reject actions where userspace attempts to
set these fields with mask=0. This should make netlink marshalling
consistent across deserialization/reserialization.

Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:33:43 -07:00
Joe Stringer
4f0909ee3d openvswitch: Mark connections new when not confirmed.
New, related connections are marked as such as part of ovs_ct_lookup(),
but they are not marked as "new" if the commit flag is used. Make this
consistent by setting the "new" flag whenever !nf_ct_is_confirmed(ct).

Reported-by: Jarno Rajahalme <jrajahalme@nicira.com>
Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:33:40 -07:00
Joe Stringer
1d008a1df9 openvswitch: Clarify conntrack COMMIT behaviour
The presence of this attribute does not modify the ct_state for the
current packet, only future packets. Make this more clear in the header
definition.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:33:38 -07:00
Joe Stringer
9e384715e9 openvswitch: Reject ct_state masks for unknown bits
Currently, 0-bits are generated in ct_state where the bit position is
undefined, and matches are accepted on these bit-positions. If userspace
requests to match the 0-value for this bit then it may expect only a
subset of traffic to match this value, whereas currently all packets
will have this bit set to 0. Fix this by rejecting such masks.

Signed-off-by: Joe Stringer <joestringer@nicira.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:33:36 -07:00
Renato Westphal
e2e8009ff7 tcp: remove improper preemption check in tcp_xmit_probe_skb()
Commit e520af48c7 introduced the following bug when setting the
TCP_REPAIR sockoption:

[ 2860.657036] BUG: using __this_cpu_add() in preemptible [00000000] code: daemon/12164
[ 2860.657045] caller is __this_cpu_preempt_check+0x13/0x20
[ 2860.657049] CPU: 1 PID: 12164 Comm: daemon Not tainted 4.2.3 #1
[ 2860.657051] Hardware name: Dell Inc. PowerEdge R210 II/0JP7TR, BIOS 2.0.5 03/13/2012
[ 2860.657054]  ffffffff81c7f071 ffff880231e9fdf8 ffffffff8185d765 0000000000000002
[ 2860.657058]  0000000000000001 ffff880231e9fe28 ffffffff8146ed91 ffff880231e9fe18
[ 2860.657062]  ffffffff81cd1a5d ffff88023534f200 ffff8800b9811000 ffff880231e9fe38
[ 2860.657065] Call Trace:
[ 2860.657072]  [<ffffffff8185d765>] dump_stack+0x4f/0x7b
[ 2860.657075]  [<ffffffff8146ed91>] check_preemption_disabled+0xe1/0xf0
[ 2860.657078]  [<ffffffff8146edd3>] __this_cpu_preempt_check+0x13/0x20
[ 2860.657082]  [<ffffffff817e0bc7>] tcp_xmit_probe_skb+0xc7/0x100
[ 2860.657085]  [<ffffffff817e1e2d>] tcp_send_window_probe+0x2d/0x30
[ 2860.657089]  [<ffffffff817d1d8c>] do_tcp_setsockopt.isra.29+0x74c/0x830
[ 2860.657093]  [<ffffffff817d1e9c>] tcp_setsockopt+0x2c/0x30
[ 2860.657097]  [<ffffffff81767b74>] sock_common_setsockopt+0x14/0x20
[ 2860.657100]  [<ffffffff817669e1>] SyS_setsockopt+0x71/0xc0
[ 2860.657104]  [<ffffffff81865172>] entry_SYSCALL_64_fastpath+0x16/0x75

Since tcp_xmit_probe_skb() can be called from process context, use
NET_INC_STATS() instead of NET_INC_STATS_BH().

Fixes: e520af48c7 ("tcp: add TCPWinProbe and TCPKeepAlive SNMP counters")
Signed-off-by: Renato Westphal <renatow@taghos.com.br>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:29:26 -07:00
David S. Miller
36a28b2116 Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf
Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains four Netfilter fixes for net, they are:

1) Fix Kconfig dependencies of new nf_dup_ipv4 and nf_dup_ipv6.

2) Remove bogus test nh_scope in IPv4 rpfilter match that is breaking
   --accept-local, from Xin Long.

3) Wait for RCU grace period after dropping the pending packets in the
   nfqueue, from Florian Westphal.

4) Fix sleeping allocation while holding spin_lock_bh, from Nikolay Borisov.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:26:17 -07:00
Arad, Ronen
b1974ed05e netlink: Rightsize IFLA_AF_SPEC size calculation
if_nlmsg_size() overestimates the minimum allocation size of netlink
dump request (when called from rtnl_calcit()) or the size of the
message (when called from rtnl_getlink()). This is because
ext_filter_mask is not supported by rtnl_link_get_af_size() and
rtnl_link_get_size().

The over-estimation is significant when at least one netdev has many
VLANs configured (8 bytes for each configured VLAN).

This patch-set "rightsizes" the protocol specific attribute size
calculation by propagating ext_filter_mask to rtnl_link_get_af_size()
and adding this a argument to get_link_af_size op in rtnl_af_ops.

Bridge module already used filtering aware sizing for notifications.
br_get_link_af_size_filtered() is consistent with the modified
get_link_af_size op so it replaces br_get_link_af_size() in br_af_ops.
br_get_link_af_size() becomes unused and thus removed.

Signed-off-by: Ronen Arad <ronen.arad@intel.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:15:20 -07:00
Jon Paul Maloy
e53567948f tipc: conditionally expand buffer headroom over udp tunnel
In commit d999297c3d ("tipc: reduce locking scope during packet reception")
we altered the packet retransmission function. Since then, when
restransmitting packets, we create a clone of the original buffer
using __pskb_copy(skb, MIN_H_SIZE), where MIN_H_SIZE is the size of
the area we want to have copied, but also the smallest possible TIPC
packet size. The value of MIN_H_SIZE is 24.

Unfortunately, __pskb_copy() also has the effect that the headroom
of the cloned buffer takes the size MIN_H_SIZE. This is too small
for carrying the packet over the UDP tunnel bearer, which requires
a minimum headroom of 28 bytes. A change to just use pskb_copy()
lets the clone inherit the original headroom of 80 bytes, but also
assumes that the copied data area is of at least that size, something
that is not always the case. So that is not a viable solution.

We now fix this by adding a check for sufficient headroom in the
transmit function of udp_media.c, and expanding it when necessary.

Fixes: commit d999297c3d ("tipc: reduce locking scope during packet reception")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:13:48 -07:00
Andreas Schwab
7a4264a925 net: cavium: change NET_VENDOR_CAVIUM to bool
CONFIG_NET_VENDOR_CAVIUM is only used to hide/show config options and to
include subdirectories in the build, so it doesn't make sense to make it
tristate.

Signed-off-by: Andreas Schwab <schwab@suse.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:12:16 -07:00
Jon Paul Maloy
45c8b7b175 tipc: allow non-linear first fragment buffer
The current code for message reassembly is erroneously assuming that
the the first arriving fragment buffer always is linear, and then goes
ahead resetting the fragment list of that buffer in anticipation of
more arriving fragments.

However, if the buffer already happens to be non-linear, we will
inadvertently drop the already attached fragment list, and later
on trig a BUG() in __pskb_pull_tail().

We see this happen when running fragmented TIPC multicast across UDP,
something made possible since
commit d0f91938be ("tipc: add ip/udp media type")

We fix this by not resetting the fragment list when the buffer is non-
linear, and by initiatlizing our private fragment list tail pointer to
the tail of the existing fragment list.

Fixes: commit d0f91938be ("tipc: add ip/udp media type")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:08:24 -07:00
James Morse
1241365f1a openvswitch: Allocate memory for ovs internal device stats.
"openvswitch: Remove vport stats" removed the per-vport statistics, in
order to use the netdev's statistics fields.
"openvswitch: Fix ovs_vport_get_stats()" fixed the export of these stats
to user-space, by using the provided netdev_ops to collate them - but ovs
internal devices still use an unallocated dev->tstats field to count
packets, which are no longer exported by this api.

Allocate the dev->tstats field for ovs internal devices, and wire up
ndo_get_stats64 with the original implementation of
ovs_vport_get_stats().

On its own, "openvswitch: Fix ovs_vport_get_stats()" fixes the OOPs,
unmasking a full-on panic on arm64:

=============%<==============
[<ffffffbffc00ce4c>] internal_dev_recv+0xa8/0x170 [openvswitch]
[<ffffffbffc0008b4>] do_output.isra.31+0x60/0x19c [openvswitch]
[<ffffffbffc000bf8>] do_execute_actions+0x208/0x11c0 [openvswitch]
[<ffffffbffc001c78>] ovs_execute_actions+0xc8/0x238 [openvswitch]
[<ffffffbffc003dfc>] ovs_packet_cmd_execute+0x21c/0x288 [openvswitch]
[<ffffffc0005e8c5c>] genl_family_rcv_msg+0x1b0/0x310
[<ffffffc0005e8e60>] genl_rcv_msg+0xa4/0xe4
[<ffffffc0005e7ddc>] netlink_rcv_skb+0xb0/0xdc
[<ffffffc0005e8a94>] genl_rcv+0x38/0x50
[<ffffffc0005e76c0>] netlink_unicast+0x164/0x210
[<ffffffc0005e7b70>] netlink_sendmsg+0x304/0x368
[<ffffffc0005a21c0>] sock_sendmsg+0x30/0x4c
[SNIP]
Kernel panic - not syncing: Fatal exception in interrupt
=============%<==============

Fixes: 8c876639c9 ("openvswitch: Remove vport stats.")
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:06:36 -07:00
David Ahern
f1900fb5ec net: Really fix vti6 with oif in dst lookups
6e28b00082 ("net: Fix vti use case with oif in dst lookups for IPv6")
is missing the checks on FLOWI_FLAG_SKIP_NH_OIF. Add them.

Fixes: 42a7b32b73 ("xfrm: Add oif to dst lookups")
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:04:54 -07:00
Jon Paul Maloy
53387c4e22 tipc: extend broadcast link window size
The default fix broadcast window size is currently set to 20 packets.
This is a very low value, set at a time when we were still testing on
10 Mb/s hubs, and a change to it is long overdue.

Commit 7845989cb4 ("net: tipc: fix stall during bclink wakeup procedure")
revealed a problem with this low value. For messages of importance LOW,
the backlog queue limit will be  calculated to 30 packets, while a
single, maximum sized message of 66000 bytes, carried across a 1500 MTU
network consists of 46 packets.

This leads to the following scenario (among others leading to the same
situation):

1: Msg 1 of 46 packets is sent. 20 packets go to the transmit queue, 26
   packets to the backlog queue.
2: Msg 2 of 46 packets is attempted sent, but rejected because there is
   no more space in the backlog queue at this level. The sender is added
   to the wakeup queue with a "pending packets chain size" number of 46.
3: Some packets in the transmit queue are acked and released. We try to
   wake up the sender, but the pending size of 46 is bigger than the LOW
   wakeup limit of 30, so this doesn't happen.
5: Subsequent acks releases all the remaining buffers. Each time we test
   for the wakeup criteria and find that 46 still is larger than 30,
   even after both the transmit and the backlog queues are empty.
6: The sender is never woken up and given a chance to send its message.
   He is stuck.

We could now loosen the wakeup criteria (used by link_prepare_wakeup())
to become equal to the send criteria (used by tipc_link_xmit()), i.e.,
by ignoring the "pending packets chain size" value altogether, or we can
just increase the queue limits so that the criteria can be satisfied
anyway. There are good reasons (potentially multiple waiting senders) to
not opt for the former solution, so we choose the latter one.

This commit fixes the problem by giving the broadcast link window a
default value of 50 packets. We also introduce a new minimum link
window size BCLINK_MIN_WIN of 32, which is enough to always avoid the
described situation. Finally, in order to not break any existing users
which may set the window explicitly, we enforce that the window is set
to the new minimum value in case the user is trying to set it to
anything lower.

Fixes: 7845989cb4 ("net: tipc: fix stall during bclink wakeup procedure")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Reviewed-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-10-21 19:02:17 -07:00
Christian Engelmayer
0f89abf56a btrfs: fix possible leak in btrfs_ioctl_balance()
Commit 8eb934591f ("btrfs: check unsupported filters in balance
arguments") adds a jump to exit label out_bargs in case the argument
check fails. At this point in addition to the bargs memory, the
memory for struct btrfs_balance_control has already been allocated.
Ownership of bctl is passed to btrfs_balance() in the good case,
thus the memory is not freed due to the introduced jump. Make sure
that the memory gets freed in any case as necessary. Detected by
Coverity CID 1328378.

Signed-off-by: Christian Engelmayer <cengelma@gmx.at>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
2015-10-21 18:10:02 -07:00
Dave Airlie
c50f13f911 Merge branch 'drm-fixes-4.3' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just a crash fix for radeon and amdgpu if the user has forcibly disabled
dpm and tries to access the pwm sysfs controls.

* 'drm-fixes-4.3' of git://people.freedesktop.org/~agd5f/linux:
  drm/amdgpu: add missing dpm check for KV dpm late init
  drm/amdgpu/dpm: don't add pwm attributes if DPM is disabled
  drm/radeon/dpm: don't add pwm attributes if DPM is disabled
2015-10-22 10:24:55 +10:00
Dave Airlie
c2a75586ff Merge tag 'drm-intel-fixes-2015-10-16' of git://anongit.freedesktop.org/drm-intel into drm-fixes
The revert dance could use some explanation: we had stuff fixed in
-next, and initially backported one commit to v4.3. Now, turns out we
need more fixes, and we could cherry-pick them all without conflicts if
we reverted the backported one first. So did that to not have to edit
and backport them all.

* tag 'drm-intel-fixes-2015-10-16' of git://anongit.freedesktop.org/drm-intel:
  drm/i915: Add primary plane to mask if it's visible
  drm/i915: Move sprite/cursor plane disable to intel_sanitize_crtc()
  drm/i915: Assign hwmode after encoder state readout
  Revert "drm/i915: Add primary plane to mask if it's visible"
  drm/i915: Deny wrapping an userptr into a framebuffer
  drm/i915: Enable DPLL VGA mode before P1/P2 divider write
  drm/i915: Restore lost DPLL register write on gen2-4
  drm/i915: Flush pipecontrol post-sync writes
  drm/i915: Fix kerneldoc for i915_gem_shrink_all
2015-10-22 10:24:21 +10:00
Vasant Hegde
8832317f66 powerpc/rtas: Validate rtas.entry before calling enter_rtas()
Currently we do not validate rtas.entry before calling enter_rtas(). This
leads to a kernel oops when user space calls rtas system call on a powernv
platform (see below). This patch adds code to validate rtas.entry before
making enter_rtas() call.

  Oops: Exception in kernel mode, sig: 4 [#1]
  SMP NR_CPUS=1024 NUMA PowerNV
  task: c000000004294b80 ti: c0000007e1a78000 task.ti: c0000007e1a78000
  NIP: 0000000000000000 LR: 0000000000009c14 CTR: c000000000423140
  REGS: c0000007e1a7b920 TRAP: 0e40   Not tainted  (3.18.17-340.el7_1.pkvm3_1_0.2400.1.ppc64le)
  MSR: 1000000000081000 <HV,ME>  CR: 00000000  XER: 00000000
  CFAR: c000000000009c0c SOFTE: 0
  NIP [0000000000000000]           (null)
  LR [0000000000009c14] 0x9c14
  Call Trace:
  [c0000007e1a7bba0] [c00000000041a7f4] avc_has_perm_noaudit+0x54/0x110 (unreliable)
  [c0000007e1a7bd80] [c00000000002ddc0] ppc_rtas+0x150/0x2d0
  [c0000007e1a7be30] [c000000000009358] syscall_exit+0x0/0x98

Cc: stable@vger.kernel.org # v3.2+
Fixes: 55190f8878 ("powerpc: Add skeleton PowerNV platform")
Reported-by: NAGESWARA R. SASTRY <nasastry@in.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
[mpe: Reword change log, trim oops, and add stable + fixes]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-10-22 11:03:25 +11:00
Dave Airlie
37363bc03e Merge branch 'linux-4.3' of git://anongit.freedesktop.org/nouveau/linux-2.6 into drm-fixes
Just one fix from Ilia to resolve various issues that have resulted from
buffer eviction.

* 'linux-4.3' of git://anongit.freedesktop.org/nouveau/linux-2.6:
  drm/nouveau/gem: return only valid domain when there's only one
2015-10-22 09:15:10 +10:00
Ilia Mirkin
2a6c521bb4 drm/nouveau/gem: return only valid domain when there's only one
On nv50+, we restrict the valid domains to just the one where the buffer
was originally created. However after the buffer is evicted to system
memory, we might move it back to a different domain that was not
originally valid. When sharing the buffer and retrieving its GEM_INFO
data, we still want the domain that will be valid for this buffer in a
pushbuf, not the one where it currently happens to be.

This resolves fdo#92504 and several others. These are due to suspend
evicting all buffers, making it more likely that they temporarily end up
in the wrong place.

Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92504
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2015-10-22 09:10:52 +10:00
Adam Richter
30730c7f59 drm: fix mutex leak in drm_dp_get_mst_branch_device
In Linux 4.3-rc5, there is an error case in drm_dp_get_branch_device
that returns without releasing mgr->lock, resulting a spew of kernel
messages about a kernel work function possibly having leaked a mutex
and presumably more serious adverse consequences later.  This patch
changes the error to "goto out" to unlock the mutex before returning.

[airlied: grabbed from drm-next as it fixes something we've seen]

Signed-off-by: Adam J. Richter <adam_richter2004@yahoo.com>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2015-10-22 08:29:08 +10:00
Linus Torvalds
8a70dd2669 Merge tag 'for-linus-20151021' of git://git.infradead.org/intel-iommu
Pull intel-iommu bugfix from David Woodhouse:
 "This contains a single fix, for when the IOMMU API is used to overlay
  an existing mapping comprised of 4KiB pages, with a mapping that can
  use superpages.

  For the *first* superpage in the new mapping, we were correctly¹
  freeing the old bottom-level page table page and clearing the link to
  it, before installing the superpage.  For subsequent superpages,
  however, we weren't.  This causes a memory leak, and a warning about
  setting a PTE which is already set.

  ¹ Well, not *entirely* correctly.  We just free the page table pages
    right there and then, which is wrong.  In fact they should only be
    freed *after* the IOTLB is flushed so we know the hardware will no
    longer be looking at them....  and in fact I note that the IOTLB
    flush is completely missing from the intel_iommu_map() code path,
    although it needs to be there if it's permitted to overwrite
    existing mappings.

    Fixing those is somewhat more intrusive though, and will probably
    need to wait for 4.4 at this point"

* tag 'for-linus-20151021' of git://git.infradead.org/intel-iommu:
  iommu/vt-d: fix range computation when making room for large pages
2015-10-22 06:32:48 +09:00
Linus Torvalds
7f67786330 MMC core:
- Don't re-tune in the reset sequence to allow re-init of the card
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWJ028AAoJEP4mhCVzWIwpmFEQAKvyEdBCdIzf8U5jAnqmUBp2
 tF/6qKpHcyKcKopHH+fsx2mdZoCwq5my+UNnY6PsSbCLpNT7m7ea/cD/I1yWRWR0
 qXPqbZddM71xyJ0VLBbGubpgqbR9ofPa5Bg7V6JT74P/5EXORSt1wmLcy0QLeI4y
 WQlmdL9xzEk5MKHyAXFURiyexXZSmRyw8YUV6vOESwv1/O18sTeKo3NerQx9urc+
 DqyID9+uPvqesQ09IcboaZT5KtTpGAh0O52EiKl90Fv/4yLuXD3dV6DhsI4B1use
 jVnCFrCOdXT7cpugOGht7aDnP8Sga1k4jTBiWS97CusMykwc7kIF8Yty3LlKr510
 tglarnSvmhYSrY0uR1YGWugfwgg+8V5SrolZa+zMg+hHwSIvWLybnJkUtdaNSB1r
 k0Q97k0mKUT7hTAhd60U7XTMOrumb2w1VdY0/BRGtm8J6XwEhnjF+j8h7CRqCzIn
 HJuvZ7AhfUjr93ASMfzebNaSEFIt8I0G0fIv+YP0nu5WnkySfgC7Q4LZzwy/MC4/
 V9rqdq0iObHZmIyYEydgudmdXYkR5lSp1SenVQ3kBT/xE1Ygcj5ujdfEHxpVoz0C
 owqjaKL5zXOOVB7/bhKnyAXTu7f/lr/QPo0z8zoVVQqxEl+KP+5vyP1K/sN12zrj
 vz9c1MU7k7d7nSqlcOV5
 =LHs/
 -----END PGP SIGNATURE-----

Merge tag 'mmc-v4.3-rc5' of git://git.linaro.org/people/ulf.hansson/mmc

Pull MMC bugfix from Ulf Hansson:
 "Here's yet another MMC fix intended for v4.3 rc7.  I don't expect to
  send any further pull requests for 4.3 rc[n].

  MMC core:
   - Don't re-tune in the reset sequence to allow re-init of the card"

* tag 'mmc-v4.3-rc5' of git://git.linaro.org/people/ulf.hansson/mmc:
  mmc: core: Fix init_card in 52Mhz
2015-10-22 06:31:27 +09:00
Doron Tsur
0ca81a2840 IB/cm: Fix rb-tree duplicate free and use-after-free
ib_send_cm_sidr_rep could sometimes erase the node from the sidr
(depending on errors in the process). Since ib_send_cm_sidr_rep is
called both from cm_sidr_req_handler and cm_destroy_id, cm_id_priv
could be either erased from the rb_tree twice or not erased at all.
Fixing that by making sure it's erased only once before freeing
cm_id_priv.

Fixes: a977049dac ('[PATCH] IB: Add the kernel CM implementation')
Signed-off-by: Doron Tsur <doront@mellanox.com>
Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2015-10-21 15:43:12 -04:00
Thomas Hellstrom
09dc1387c9 drm/vmwgfx: Stabilize the command buffer submission code
This commit addresses some stability problems with the command buffer
submission code recently introduced:

1) Make the vmw_cmdbuf_man_process() function handle reruns internally to
avoid losing interrupts if the caller forgets to rerun on -EAGAIN.
2) Handle default command buffer allocations using inline command buffers.
This avoids rare allocation deadlocks.
3) In case of command buffer errors we might lose fence submissions.
Therefore send a new fence after each command buffer error. This will help
avoid lengthy fence waits.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Sinclair Yeh <syeh@vmware.com>
2015-10-21 21:31:49 +02:00
Johan Hedberg
17bc08f0d1 Bluetooth: Remove unnecessary hci_explicit_connect_lookup function
There's only one user of this helper which can be replaces with a call
to hci_pend_le_action_lookup() and a check for params->explicit_connect.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 18:58:23 +02:00
Johan Hedberg
1ede9868f6 Bluetooth: Remove redundant (and possibly wrong) flag clearing
There's no need to clear the HCI_CONN_ENCRYPT_PEND flag in
smp_failure. In fact, this may cause the encryption tracking to get
out of sync as this has nothing to do with HCI activity.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 18:57:03 +02:00
Johan Hedberg
b5c2b6214c Bluetooth: Add hdev helper variable to hci_le_create_connection_cancel
The hci_le_create_connection_cancel() function needs to use the hdev
pointer in many places so add a variable for it to avoid the need to
dereference the hci_conn every time.

Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
2015-10-21 18:45:43 +02:00