Commit graph

26122 commits

Author SHA1 Message Date
David S. Miller
b1afce9538 ipv6: Protect ->mc_forwarding access with CONFIG_IPV6_MROUTE
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 14:46:34 -05:00
Nicolas Dichtel
193c1e478c ip6mr: fix rtm_family of rtnl msg
We talk about IPv6, hence the family is RTNL_FAMILY_IP6MR!
rtnl_register() is already called with RTNL_FAMILY_IP6MR.

The bug is here since the beginning of this function (commit 5b285cac35).

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:27:24 -05:00
Serge Hallyn
4e66ae2ea3 net: dev_change_net_namespace: send a KOBJ_REMOVED/KOBJ_ADD
When a new nic is created in namespace ns1, the kernel sends a KOBJ_ADD uevent
to ns1.  When the nic is moved to ns2, we only send a KOBJ_MOVE to ns2, and
nothing to ns1.

This patch changes that behavior so that when moving a nic from ns1 to ns2, we
send a KOBJ_REMOVED to ns1 and KOBJ_ADD to ns2.  (The KOBJ_MOVE is still
sent to ns2).

The effects of this can be seen when starting and stopping containers in
an upstart based host.  Lxc will create a pair of veth nics, the kernel
sends KOBJ_ADD, and upstart starts network-instance jobs for each.  When
one nic is moved to the container, because no KOBJ_REMOVED event is
received, the network-instance job for that veth never goes away.  This
was reported at https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1065589
With this patch the networ-instance jobs properly go away.

The other oddness solved here is that if a nic is passed into a running
upstart-based container, without this patch no network-instance job is
started in the container.  But when the container creates a new nic
itself (ip link add new type veth) then network-interface jobs are
created.  With this patch, behavior comes in line with a regular host.

v2: also send KOBJ_ADD to new netns.  There will then be a
_MOVE event from the device_rename() call, but that should
be innocuous.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:25:57 -05:00
Nicolas Dichtel
812e44dd18 ip6mr: advertise new mfc entries via rtnl
This patch allows to monitor mf6c activities via rtnetlink.
To avoid parsing two times the mf6c oifs, we use maxvif to allocate the rtnl
msg, thus we may allocate some superfluous space.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:11 -05:00
Nicolas Dichtel
8cd3ac9f9b ipmr: advertise new mfc entries via rtnl
This patch allows to monitor mfc activities via rtnetlink.
To avoid parsing two times the mfc oifs, we use maxvif to allocate the rtnl
msg, thus we may allocate some superfluous space.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:11 -05:00
Nicolas Dichtel
1eb99af52c ipmr/ip6mr: allow to get unresolved cache via netlink
/proc/net/ip[6]_mr_cache allows to get all mfc entries, even if they are put in
the unresolved list (mfc[6]_unres_queue). But only the table RT_TABLE_DEFAULT is
displayed.
This patch adds the parsing of the unresolved list when the dump is made via
rtnetlink, hence each table can be checked.

In IPv6, we set rtm_type in ip6mr_fill_mroute(), because in case of unresolved
mfc __ip6mr_fill_mroute() will not set it. In IPv4, it is already done.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:11 -05:00
Nicolas Dichtel
9a68ac72a4 ipmr/ip6mr: report origin of mfc entry into rtnl msg
A mfc entry can be static or not (added via the mroute_sk socket). The patch
reports MFC_STATIC flag into rtm_protocol by setting rtm_protocol to
RTPROT_STATIC or RTPROT_MROUTED.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:11 -05:00
Nicolas Dichtel
adfa85e45d ipmr/ip6mr: advertise mfc stats via rtnetlink
These statistics can be checked only via /proc/net/ip_mr_cache or
SIOCGETSGCNT[_IN6] and thus only for the table RT_TABLE_DEFAULT.
Advertising them via rtnetlink allows to get statistics for all cache entries,
whatever the table is.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:10 -05:00
Nicolas Dichtel
70b386a0cc ip6mr: use nla_nest_* helpers
This patch removes the skb manipulations when nested attributes are added by
using standard helpers.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:10 -05:00
Nicolas Dichtel
d67b8c616b netconf: advertise mc_forwarding status
This patch advertise the MC_FORWARDING status for IPv4 and IPv6.
This field is readonly, only multicast engine in the kernel updates it.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:08:10 -05:00
David S. Miller
e8ad1a8fab Merge branch 'master' of git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
* Remove limitation in the maximum number of supported sets in ipset.
  Now ipset automagically increments the number of slots in the array
  of sets by 64 new spare slots, from Jozsef Kadlecsik.

* Partially remove the generic queue infrastructure now that ip_queue
  is gone. Its only client is nfnetlink_queue now, from Florian
  Westphal.

* Add missing attribute policy checkings in ctnetlink, from Florian
  Westphal.

* Automagically kill conntrack entries that use the wrong output
  interface for the masquerading case in case of routing changes,
  from Jozsef Kadlecsik.

* Two patches two improve ct object traceability. Now ct objects are
  always placed in any of the existing lists. This allows us to dump
  the content of unconfirmed and dying conntracks via ctnetlink as
  a way to provide more instrumentation in case you suspect leaks,
  from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-04 13:01:19 -05:00
Simon Wunderlich
2f91a96799 mac80211: adapt slot time in IBSS mode
In 5GHz/802.11a, we are allowed to use short slot times. Doing this
may increases performance by 20% for legacy connections (54 MBit/s).
I can confirm this in my tests (27% more throughput using iperf), and
also have a small positive effect (5% more throughput) for HT rates,
tested on 1 stream.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Mathias Kretschmer <mathias.kretschmer@fokus.fraunhofer.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-04 15:54:43 +01:00
Paul Marks
a5a81f0b90 ipv6: Fix default route failover when CONFIG_IPV6_ROUTER_PREF=n
I believe this commit from 2008 was incorrect:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=398bcbebb6f721ac308df1e3d658c0029bb74503

When CONFIG_IPV6_ROUTER_PREF is disabled, the kernel should follow
RFC4861 section 6.3.6: if no route is NUD_VALID, then traffic should be
sprayed across all routers (indirectly triggering NUD) until one of them
becomes NUD_VALID.

However, the following experiment demonstrates that this does not work:

1) Connect to an IPv6 network.
2) Change the router's MAC (and link-local) address.

The kernel will lock onto the first router and never try the new one, even
if the first becomes unreachable.  This patch fixes the problem by
allowing rt6_check_neigh() to return 0; if all routers return 0, then
rt6_select() will fall back to round-robin behavior.

This patch should have no effect when CONFIG_IPV6_ROUTER_PREF=y.

Note that rt6_check_neigh() is only used in a boolean context, so I've
changed its return type accordingly.

Signed-off-by: Paul Marks <pmarks@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 15:34:47 -05:00
Shmulik Ladkani
9ba2add3cf ipv6: Make 'addrconf_rs_timer' send Router Solicitations (and re-arm itself) if Router Advertisements are accepted
As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted],
Router Solicitations are sent whenever kernel accepts Router
Advertisements on the interface.

However, this logic isn't reflected in 'addrconf_rs_timer'.

The timer fails to issue subsequent RS messages (and fails to re-arm
itself) if forwarding is enabled and the special hybrid mode is
enabled (accept_ra=2).

Fix the condition determining whether next RS should be sent, by using
'ipv6_accept_ra()'.

Reported-by: Ami Koren <amikoren@yahoo.com>
Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 13:59:57 -05:00
John W. Linville
06ef5c4bbb Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next 2012-12-03 13:46:03 -05:00
Michele Baldessari
196d675934 sctp: Add support to per-association statistics via a new SCTP_GET_ASSOC_STATS call
The current SCTP stack is lacking a mechanism to have per association
statistics. This is an implementation modeled after OpenSolaris'
SCTP_GET_ASSOC_STATS.

Userspace part will follow on lksctp if/when there is a general ACK on
this.
V4:
- Move ipackets++ before q->immediate.func() for consistency reasons
- Move sctp_max_rto() at the end of sctp_transport_update_rto() to avoid
  returning bogus RTO values
- return asoc->rto_min when max_obs_rto value has not changed

V3:
- Increase ictrlchunks in sctp_assoc_bh_rcv() as well
- Move ipackets++ to sctp_inq_push()
- return 0 when no rto updates took place since the last call

V2:
- Implement partial retrieval of stat struct to cope for future expansion
- Kill the rtxpackets counter as it cannot be precise anyway
- Rename outseqtsns to outofseqtsns to make it clearer that these are out
  of sequence unexpected TSNs
- Move asoc->ipackets++ under a lock to avoid potential miscounts
- Fold asoc->opackets++ into the already existing asoc check
- Kill unneeded (q->asoc) test when increasing rtxchunks
- Do not count octrlchunks if sending failed (SCTP_XMIT_OK != 0)
- Don't count SHUTDOWNs as SACKs
- Move SCTP_GET_ASSOC_STATS to the private space API
- Adjust the len check in sctp_getsockopt_assoc_stats() to allow for
  future struct growth
- Move association statistics in their own struct
- Update idupchunks when we send a SACK with dup TSNs
- return min_rto in max_rto when RTO has not changed. Also return the
  transport when max_rto last changed.

Signed-off: Michele Baldessari <michele@acksyn.org>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-03 13:32:15 -05:00
Gustavo Padovan
0b27a4b97c Revert "Bluetooth: Fix possible deadlock in SCO code"
This reverts commit 269c4845d5.

The commit was causing dead locks and NULL dereferences in the sco code:

 [28084.104013] BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u:0H:7]
 [28084.104021] Modules linked in: btusb bluetooth <snip [last unloaded:
bluetooth]
...
 [28084.104021]  [<c160246d>] _raw_spin_lock+0xd/0x10
 [28084.104021]  [<f920e708>] sco_conn_del+0x58/0x1b0 [bluetooth]
 [28084.104021]  [<f920f1a9>] sco_connect_cfm+0xb9/0x2b0 [bluetooth]
 [28084.104021]  [<f91ef289>]
hci_sync_conn_complete_evt.isra.94+0x1c9/0x260 [bluetooth]
 [28084.104021]  [<f91f1a8d>] hci_event_packet+0x74d/0x2b40 [bluetooth]
 [28084.104021]  [<c1501abd>] ? __kfree_skb+0x3d/0x90
 [28084.104021]  [<c1501b46>] ? kfree_skb+0x36/0x90
 [28084.104021]  [<f91fcb4e>] ? hci_send_to_monitor+0x10e/0x190 [bluetooth]
 [28084.104021]  [<f91fcb4e>] ? hci_send_to_monitor+0x10e/0x190 [bluetooth]

Cc: stable@vger.kernel.org
Reported-by: Chan-yeol Park <chanyeol.park@gmail.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 16:00:04 -02:00
Andrei Emeltchenko
f2592d3ee3 Bluetooth: trivial: Change NO_FCS_RECV to RECV_NO_FCS
Make code more readable by changing CONF_NO_FCS_RECV which is read
as "No L2CAP FCS option received" to CONF_RECV_NO_FCS which means
"Received L2CAP option NO_FCS". This flag really means that we have
received L2CAP FRAME CHECK SEQUENCE (FCS) OPTION with value "No FCS".

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 16:00:01 -02:00
Andrei Emeltchenko
cbabee788f Bluetooth: Process receiving FCS_NONE in L2CAP Conf Rsp
Process L2CAP Config rsp Pending with FCS Option 0x00 (No FCS)
which is sent by Motorola Windows 7 Bluetooth stack. The trace
is shown below (all other options are skipped).

...
< ACL data: handle 1 flags 0x00 dlen 48
    L2CAP(s): Config req: dcid 0x0043 flags 0x00 clen 36
      ...
      FCS Option 0x00 (No FCS)
> ACL data: handle 1 flags 0x02 dlen 48
    L2CAP(s): Config req: dcid 0x0041 flags 0x00 clen 36
      ...
      FCS Option 0x01 (CRC16 Check)
< ACL data: handle 1 flags 0x00 dlen 47
    L2CAP(s): Config rsp: scid 0x0043 flags 0x00 result 4 clen 33
      Pending
      ...
> ACL data: handle 1 flags 0x02 dlen 50
    L2CAP(s): Config rsp: scid 0x0041 flags 0x00 result 4 clen 36
      Pending
      ...
      FCS Option 0x00 (No FCS)
< ACL data: handle 1 flags 0x00 dlen 14
    L2CAP(s): Config rsp: scid 0x0043 flags 0x00 result 0 clen 0
      Success
> ACL data: handle 1 flags 0x02 dlen 14
    L2CAP(s): Config rsp: scid 0x0041 flags 0x00 result 0 clen 0
      Success
...

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 16:00:01 -02:00
Andrei Emeltchenko
60918918a9 Bluetooth: Fix missing L2CAP EWS Conf parameter
If L2CAP_FEAT_FCS is not supported we sould miss EWS option
configuration because of break. Make code more readable by
combining FCS configuration in the single block.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 16:00:00 -02:00
Andrei Emeltchenko
5d05416e09 Bluetooth: AMP: Check that AMP is present and active
Before starting quering remote AMP controllers make sure
that there is local active AMP controller.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 16:00:00 -02:00
Andrei Emeltchenko
ced5c338d7 Bluetooth: AMP: Mark controller radio powered down after HCIDEVDOWN
After getting HCIDEVDOWN controller did not mark itself as 0x00 which
means: "The Controller radio is available but is currently physically
powered down". The result was even if the hdev was down we return
in controller list value 0x01 "status 0x01 (Bluetooth only)".

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 15:59:59 -02:00
Andrei Emeltchenko
5e4e3972b8 Bluetooth: Refactor l2cap_send_disconn_req
l2cap_send_disconn_req takes 3 parameters of which conn might be
derived from chan. Make this conversion inside l2cap_send_disconn_req.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 15:59:59 -02:00
Gustavo Padovan
ffa88e02bc Bluetooth: Move double negation to macros
Some comparisons needs to double negation(!!) in order to make the value
of the field boolean. Add it to the macro makes the code more readable.

Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 15:59:59 -02:00
Frédéric Dalleau
20714bfef8 Bluetooth: Implement deferred sco socket setup
In order to authenticate and configure an incoming SCO connection, the
BT_DEFER_SETUP option was added. This option is intended to defer reply
to Connect Request on SCO sockets.
When a connection is requested, the listening socket is unblocked but
the effective connection setup happens only on first recv. Any send
between accept and recv fails with -ENOTCONN.

Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 15:59:58 -02:00
Frédéric Dalleau
b96e9c671b Bluetooth: Add BT_DEFER_SETUP option to sco socket
This option will set the BT_SK_DEFER_SETUP bit in socket flags.

Signed-off-by: Frédéric Dalleau <frederic.dalleau@linux.intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 15:59:58 -02:00
Gustavo Padovan
b9b5ef188e Bluetooth: cancel power_on work when unregistering the device
We need to cancel the hci_power_on work in order to avoid it run when we
try to free the hdev.

[ 1434.201149] ------------[ cut here ]------------
[ 1434.204998] WARNING: at lib/debugobjects.c:261 debug_print_object+0x8e/0xb0()
[ 1434.208324] ODEBUG: free active (active state 0) object type: work_struct hint: hci
_power_on+0x0/0x90
[ 1434.210386] Pid: 8564, comm: trinity-child25 Tainted: G        W    3.7.0-rc5-next-
20121112-sasha-00018-g2f4ce0e #127
[ 1434.210760] Call Trace:
[ 1434.210760]  [<ffffffff819f3d6e>] ? debug_print_object+0x8e/0xb0
[ 1434.210760]  [<ffffffff8110b887>] warn_slowpath_common+0x87/0xb0
[ 1434.210760]  [<ffffffff8110b911>] warn_slowpath_fmt+0x41/0x50
[ 1434.210760]  [<ffffffff819f3d6e>] debug_print_object+0x8e/0xb0
[ 1434.210760]  [<ffffffff8376b750>] ? hci_dev_open+0x310/0x310
[ 1434.210760]  [<ffffffff83bf94e5>] ? _raw_spin_unlock_irqrestore+0x55/0xa0
[ 1434.210760]  [<ffffffff819f3ee5>] __debug_check_no_obj_freed+0xa5/0x230
[ 1434.210760]  [<ffffffff83785db0>] ? bt_host_release+0x10/0x20
[ 1434.210760]  [<ffffffff819f4d15>] debug_check_no_obj_freed+0x15/0x20
[ 1434.210760]  [<ffffffff8125eee7>] kfree+0x227/0x330
[ 1434.210760]  [<ffffffff83785db0>] bt_host_release+0x10/0x20
[ 1434.210760]  [<ffffffff81e539e5>] device_release+0x65/0xc0
[ 1434.210760]  [<ffffffff819d3975>] kobject_cleanup+0x145/0x190
[ 1434.210760]  [<ffffffff819d39cd>] kobject_release+0xd/0x10
[ 1434.210760]  [<ffffffff819d33cc>] kobject_put+0x4c/0x60
[ 1434.210760]  [<ffffffff81e548b2>] put_device+0x12/0x20
[ 1434.210760]  [<ffffffff8376a334>] hci_free_dev+0x24/0x30
[ 1434.210760]  [<ffffffff82fd8fe1>] vhci_release+0x31/0x60
[ 1434.210760]  [<ffffffff8127be12>] __fput+0x122/0x250
[ 1434.210760]  [<ffffffff811cab0d>] ? rcu_user_exit+0x9d/0xd0
[ 1434.210760]  [<ffffffff8127bf49>] ____fput+0x9/0x10
[ 1434.210760]  [<ffffffff81133402>] task_work_run+0xb2/0xf0
[ 1434.210760]  [<ffffffff8106cfa7>] do_notify_resume+0x77/0xa0
[ 1434.210760]  [<ffffffff83bfb0ea>] int_signal+0x12/0x17
[ 1434.210760] ---[ end trace a6d57fefbc8a8cc7 ]---

Cc: stable@vger.kernel.org
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 15:59:43 -02:00
Gustavo Padovan
dc2a0e20fb Bluetooth: Add missing lock nesting notation
This patch fixes the following report, it happens when accepting rfcomm
connections:

[  228.165378] =============================================
[  228.165378] [ INFO: possible recursive locking detected ]
[  228.165378] 3.7.0-rc1-00536-gc1d5dc4 #120 Tainted: G        W
[  228.165378] ---------------------------------------------
[  228.165378] bluetoothd/1341 is trying to acquire lock:
[  228.165378]  (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+...}, at:
[<ffffffffa0000aa0>] bt_accept_dequeue+0xa0/0x180 [bluetooth]
[  228.165378]
[  228.165378] but task is already holding lock:
[  228.165378]  (sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM){+.+...}, at:
[<ffffffffa0205118>] rfcomm_sock_accept+0x58/0x2d0 [rfcomm]
[  228.165378]
[  228.165378] other info that might help us debug this:
[  228.165378]  Possible unsafe locking scenario:
[  228.165378]
[  228.165378]        CPU0
[  228.165378]        ----
[  228.165378]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
[  228.165378]   lock(sk_lock-AF_BLUETOOTH-BTPROTO_RFCOMM);
[  228.165378]
[  228.165378]  *** DEADLOCK ***
[  228.165378]
[  228.165378]  May be due to missing lock nesting notation

Cc: stable@vger.kernel.org
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-12-03 15:59:10 -02:00
Jozsef Kadlecsik
a0ecb85a2c netfilter: nf_nat: Handle routing changes in MASQUERADE target
When the route changes (backup default route, VPNs) which affect a
masqueraded target, the packets were sent out with the outdated source
address. The patch addresses the issue by comparing the outgoing interface
directly with the masqueraded interface in the nat table.

Events are inefficient in this case, because it'd require adding route
events to the network core and then scanning the whole conntrack table
and re-checking the route for all entry.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03 15:14:20 +01:00
Florian Westphal
6d1fafcaec netfilter: ctnetlink: nla_policy updates
Add stricter checking for a few attributes.
Note that these changes don't fix any bug in the current code base.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03 15:13:10 +01:00
Florian Westphal
0360ae412d netfilter: kill support for per-af queue backends
We used to have several queueing backends, but nowadays only
nfnetlink_queue remains.

In light of this there doesn't seem to be a good reason to
support per-af registering -- just hook up nfnetlink_queue on module
load and remove it on unload.

This means that the userspace BIND/UNBIND_PF commands are now obsolete;
the kernel will ignore them.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03 15:07:48 +01:00
Pablo Neira Ayuso
d871befe35 netfilter: ctnetlink: dump entries from the dying and unconfirmed lists
This patch adds a new operation to dump the content of the dying and
unconfirmed lists.

Under some situations, the global conntrack counter can be inconsistent
with the number of entries that we can dump from the conntrack table.
The way to resolve this is to allow dumping the content of the unconfirmed
and dying lists, so far it was not possible to look at its content.

This provides some extra instrumentation to resolve problematic situations
in which anyone suspects memory leaks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03 15:06:52 +01:00
Pablo Neira Ayuso
04dac0111d netfilter: nf_conntrack: improve nf_conn object traceability
This patch modifies the conntrack subsystem so that all existing
allocated conntrack objects can be found in any of the following
places:

* the hash table, this is the typical place for alive conntrack objects.
* the unconfirmed list, this is the place for newly created conntrack objects
  that are still traversing the stack.
* the dying list, this is where you can find conntrack objects that are dying
  or that should die anytime soon (eg. once the destroy event is delivered to
  the conntrackd daemon).

Thus, we make sure that we follow the track for all existing conntrack
objects. This patch, together with some extension of the ctnetlink interface
to dump the content of the dying and unconfirmed lists, will help in case
to debug suspected nf_conn object leaks.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03 15:06:33 +01:00
Jozsef Kadlecsik
9076aea765 netfilter: ipset: Increase the number of maximal sets automatically
The max number of sets was hardcoded at kernel cofiguration time and
could only be modified via a module parameter. The patch adds the support
of increasing the max number of sets automatically, as needed.

The array of sets is incremented by 64 new slots if we run out of
empty slots. The absolute limit for the maximal number of sets
is limited by 65534.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-12-03 14:36:08 +01:00
Marco Porsch
da29d2a578 cfg80211: fix channel error on mesh join
Fix an error on mesh join when no channel has been
explicitly set beforehand.

Also remove a double semicolon.

Signed-off-by: Marco Porsch <marco.porsch@etit.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-03 11:24:49 +01:00
Simon Wunderlich
246dc3fddf mac80211: return if CSA is not handle
If channel contexts are enabled, the CSA should not be processed
further. A return is missing here.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-12-03 11:21:40 +01:00
Willy Tarreau
02275a2ee7 tcp: don't abort splice() after small transfers
TCP coalescing added a regression in splice(socket->pipe) performance,
for some workloads because of the way tcp_read_sock() is implemented.

The reason for this is the break when (offset + 1 != skb->len).

As we released the socket lock, this condition is possible if TCP stack
added a fragment to the skb, which can happen with TCP coalescing.

So let's go back to the beginning of the loop when this happens,
to give a chance to splice more frags per system call.

Doing so fixes the issue and makes GRO 10% faster than LRO
on CPU-bound splice() workloads instead of the opposite.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-02 20:23:01 -05:00
David S. Miller
ddb303301b Merge git://git.infradead.org/users/dwmw2/atm
David Woodhouse says:

====================
This is the result of pulling on the thread started by Krzysztof Mazur's
original patch 'pppoatm: don't send frames to destroyed vcc'.

Various problems in the pppoatm and br2684 code are solved, some of which
were easily triggered and would panic the kernel.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-01 20:45:24 -05:00
Eric Dumazet
64022d0b4e tcp: fix crashes in do_tcp_sendpages()
Recent network changes allowed high order pages being used
for skb fragments.

This uncovered a bug in do_tcp_sendpages() which was assuming its caller
provided an array of order-0 page pointers.

We only have to deal with a single page in this function, and its order
is irrelevant.

Reported-by: Willy Tarreau <w@1wt.eu>
Tested-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-01 20:39:16 -05:00
David Woodhouse
5b4d72080f pppoatm: optimise PPP channel wakeups after sock_owned_by_user()
We don't need to schedule the wakeup tasklet on *every* unlock; only if we
actually blocked the channel in the first place.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
2012-12-02 00:05:20 +00:00
Krzysztof Mazur
9eba25268e br2684: allow assign only on a connected socket
The br2684 does not check if used vcc is in connected state,
causing potential Oops in pppoatm_send() when vcc->send() is called
on not fully connected socket.

Now br2684 can be assigned only on connected sockets; otherwise
-EINVAL error is returned.

Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
2012-12-02 00:05:19 +00:00
David Woodhouse
d71ffeb123 br2684: fix module_put() race
The br2684 code used module_put() during unassignment from vcc with
hope that we have BKL. This assumption is no longer true.

Now owner field in atmvcc is used to move this module_put()
to vcc_destroy_socket().

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
2012-12-02 00:05:16 +00:00
David Woodhouse
0e56d99a5b pppoatm: fix missing wakeup in pppoatm_send()
Now that we can return zero from pppoatm_send() for reasons *other* than
the queue being full, that means we can't depend on a subsequent call to
pppoatm_pop() waking the queue, and we might leave it stalled
indefinitely.

Use the ->release_cb() callback to wake the queue after the sock is
unlocked.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
2012-12-02 00:05:15 +00:00
David Woodhouse
b89588531f br2684: don't send frames on not-ready vcc
Avoid submitting packets to a vcc which is being closed. Things go badly
wrong when the ->pop method gets later called after everything's been
torn down.

Use the ATM socket lock for synchronisation with vcc_destroy_socket(),
which clears the ATM_VF_READY bit under the same lock. Otherwise, we
could end up submitting a packet to the device driver even after its
->ops->close method has been called. And it could call the vcc's ->pop
method after the protocol has been shut down. Which leads to a panic.

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
2012-12-02 00:05:14 +00:00
David Woodhouse
c971f08cba atm: add release_cb() callback to vcc
The immediate use case for this is that it will allow us to ensure that a
pppoatm queue is woken after it has to drop a packet due to the sock being
locked.

Note that 'release_cb' is called when the socket is *unlocked*. This is
not to be confused with vcc_release() — which probably ought to be called
vcc_close().

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Acked-by: Krzysztof Mazur <krzysiek@podlesie.net>
2012-12-02 00:05:12 +00:00
Shmulik Ladkani
aeaf6e9d2f ipv6: unify logic evaluating inet6_dev's accept_ra property
As of 026359b [ipv6: Send ICMPv6 RSes only when RAs are accepted], the
logic determining whether to send Router Solicitations is identical
to the logic determining whether kernel accepts Router Advertisements.

However the condition itself is repeated in several code locations.

Unify it by introducing 'ipv6_accept_ra()' accessor.

Also, simplify the condition expression, making it more readable.
No semantic change.

Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-01 11:36:37 -05:00
Eric Dumazet
fd90b29d75 tcp: change default tcp hash size
As time passed, available memory increased faster than number of
concurrent tcp sockets.

As a result, a machine with 4GB of ram gets a hash table
with 524288 slots, using 8388608 bytes of memory.

Lets change that by a 16x factor (one slot for 128 KB of ram)

Even if a small machine needs a _lot_ of sockets, tcp lookups are now
very efficient, using one cache line per socket.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-12-01 11:36:37 -05:00
Eric Dumazet
ce43b03e88 net: move inet_dport/inet_num in sock_common
commit 68835aba4d (net: optimize INET input path further)
moved some fields used for tcp/udp sockets lookup in the first cache
line of struct sock_common.

This patch moves inet_dport/inet_num as well, filling a 32bit hole
on 64 bit arches and reducing number of cache line misses in lookups.

Also change INET_MATCH()/INET_TW_MATCH() to perform the ports match
before addresses match, as this check is more discriminant.

Remove the hash check from MATCH() macros because we dont need to
re validate the hash value after taking a refcount on socket, and
use likely/unlikely compiler hints, as the sk_hash/hash check
makes the following conditional tests 100% predicted by cpu.

Introduce skc_addrpair/skc_portpair pair values to better
document the alignment requirements of the port/addr pairs
used in the various MATCH() macros, and remove some casts.

The namespace check can also be done at last.

This slightly improves TCP/UDP lookup times.

IP/TCP early demux needs inet->rx_dst_ifindex and
TCP needs inet->min_ttl, lets group them together in same cache line.

With help from Ben Hutchings & Joe Perches.

Idea of this patch came after Ling Ma proposal to move skc_hash
to the beginning of struct sock_common, and should allow him
to submit a final version of his patch. My tests show an improvement
doing so.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Joe Perches <joe@perches.com>
Cc: Ling Ma <ling.ma.program@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 15:02:56 -05:00
Thomas Graf
06a31e2b91 sctp: verify length provided in heartbeat information parameter
If the variable parameter length provided in the mandatory
heartbeat information parameter exceeds the calculated payload
length the packet has been corrupted. Reply with a parameter
length protocol violation message.

Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:25:52 -05:00
Rami Rosen
c07135633b rtnelink: remove unused parameter from rtnl_create_link().
This patch removes an unused parameter (src_net) from rtnl_create_link()
method and from the method single invocation, in veth.
This parameter was used in the past when calling
ops->get_tx_queues(src_net, tb) in rtnl_create_link().
The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
get_num_tx_queues() and get_num_rx_queues(), which do not get any
parameter. This was done in commit d40156aa5e by
Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-11-30 12:24:40 -05:00