Commit graph

15636 commits

Author SHA1 Message Date
Vlad Yasevich
d9efc2231b sctp: Do not force T3 timer on fast retransmissions.
We don't need to force the T3 timer any more and it's
actually wrong to do as it causes too long of a delay.
The timer will be started if one is not running, but if
one is running, we leave it alone.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Vlad Yasevich
ae19c54866 sctp: remove 'resent' bit from the chunk
The 'resent' bit is used to make sure that we don't update
rto estimate based on retransmitted chunks.  However, we already
have the 'rto_pending' bit that we test when need to update rto,
so 'resent' bit is just extra.  Additionally, we currently have
a bug in that we always set a 'resent' bit and thus rto estimate
is only updated by Heartbeats.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Vlad Yasevich
d598b166ce sctp: Make sure we always return valid retransmit path
commit 4951feda0c60d1ef681f1a270afdd617924ab041
    sctp: Do no select unconfirmed transports for retransmissions

added code to make sure that we do not select unconfirmed paths
for data transmission.  This caused a problem when there are only
2 paths, 1 unconfirmed and 1 unreachable.  In that case, the next
retransmit path returned is NULL and that causes a kernel crash.

The solution is to only change retransmit paths if we found one to use.

Reported-by: Frank Schuster <frank.schuster01@web.de>
Signed-off-b: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Dan Carpenter
b99a4d53a7 sctp: cleanup: remove duplicate assignment
This assignment isn't needed because we did it earlier already.

Also another reason to delete the assignment is because it triggers a
Smatch warning about checking for NULL pointers after a dereference.

Reported-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Wei Yongjun
787a51a087 sctp: implement sctp association probing module
This patch implement sctp association probing module, the module
will be called sctp_probe.

This module allows for capturing the changes to SCTP association
state in response to incoming packets. It is used for debugging
SCTP congestion control algorithms.

Usage:
  $ modprobe sctp_probe [full=n] [port=n] [bufsize=n]
  $ cat /proc/net/sctpprobe

  The output format is:
    TIME     ASSOC     LPORT RPORT MTU    RWND  UNACK <REMOTE-ADDR   STATE  CWND   SSTHRESH  INFLIGHT  PARTIAL_BYTES_ACKED MTU> ...

  The output will be like this:
    9.226086 c4064c48  9000  8000  1500    53352     1 *192.168.0.19  1     4380    54784     1252        0     1500
    9.287195 c4064c48  9000  8000  1500    45144     5 *192.168.0.19  1     5880    54784     6500        0     1500
    9.289130 c4064c48  9000  8000  1500    42724     5 *192.168.0.19  1     7380    54784     6500        0     1500
    9.620332 c4064c48  9000  8000  1500    48284     4 *192.168.0.19  1     8880    54784     5200        0     1500
    ......

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Shan Wei
ec7b951950 sctp: use sctp_chunk_is_data macro to decide a chunk is data chunk
sctp_chunk_is_data macro is defined to decide that
whether a chunk is data chunk or not.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:41:09 -04:00
Vlad Yasevich
fbdf501c93 sctp: Do no select unconfirmed transports for retransmissions
An unconfirmed transport is one that we have not been
able to reach since the beginning.  There is no point in
trying to retrasnmit data on those transports.  Also, the
specification forbids it due to security issues.

Reported-by: Frank Schuster <frank.schuster01@web.de>

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:39:26 -04:00
Wei Yongjun
bc4f841a05 sctp: fix to retranmit at least one DATA chunk
While doing retranmit, if control chunk exists, such as
FORWARD TSN chunk, and the DATA chunk can not be bundled with
this control chunk because of PMTU limit, no DATA chunk
will be retranmitted in the current implementation. This
patch makes sure to retranmit at least one DATA chunk in this case.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 22:38:53 -04:00
Wei Yongjun
6429d3dc4b sctp: missing set src and dest port while lookup output route
While lookup the output route, we do not set the src and dest
port. This will cause we got a wrong route if we had set the
outbund transport to IPsec with src or dst port.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 21:42:44 -04:00
Wei Yongjun
bd69b981a3 sctp: assure at least one T3-rtx timer is running if a FORWARD TSN is sent
PR-SCTP extension section 3.5 Sender Side Implementation of PR-SCTP:
  C5) If a FORWARD TSN is sent, the sender MUST assure that at
      least one T3-rtx timer is running.

So this patch fix to assure at least one T3-rtx timer is running
if a FORWARD TSN is or will to sent.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 21:42:43 -04:00
Vlad Yasevich
c17b02b38a sctp: send SHUTDOWN-ACK chunk back to the source.
SHUTDOWN-ACK is alaways sent to the primary path at the first time,
but should better transmit SHUTDOWN-ACK chunk to the same destination
transport address from which it received the SHUTDOWN chunk.
Based on the work from Wei Yongjun <yjwei@cn.fujitsu.com>.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 21:42:43 -04:00
Vlad Yasevich
a5f4cea74f sctp: Use correct address family in sctp_getsockopt_peer_addrs()
The function should use the address family of the address when
trying to determine the length of the structure.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
2010-04-30 21:42:42 -04:00
Dan Carpenter
83d7eb2979 ipv6: cleanup: remove unneeded null check
We dereference "sk" unconditionally elsewhere in the function.  

This was left over from:  b30bd282 "ip6_xmit: remove unnecessary NULL
ptr check".  According to that commit message, "the sk argument to 
ip6_xmit is never NULL nowadays since the skb->priority assigment 
expects a valid socket."

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-30 16:42:08 -07:00
Changli Gao
4b021628be xfrm: potential uninitialized variable num_xfrms
potential uninitialized variable num_xfrms

fix compiler warning: 'num_xfrms' may be used uninitialized in this function.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/xfrm/xfrm_policy.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-30 16:40:05 -07:00
Eric Dumazet
767dd03369 net: speedup sock_recv_ts_and_drops()
sock_recv_ts_and_drops() is fat and slow (~ 4% of cpu time on some
profiles)

We can test all socket flags at once to make fast path fast again.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-30 16:29:42 -07:00
Reinette Chatre
ad41ee3a45 Merge branch 'wireless-2.6' into wireless-next-2.6
Patch "iwlwifi: work around passive scan issue" was merged into
wireless-2.6, but touched a lot of code since modified (and moved)
in wireless-next-2.6. This caused some conflicts.

Conflicts:
	drivers/net/wireless/iwlwifi/iwl-scan.c

Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
2010-04-30 15:13:00 -07:00
John W. Linville
f5c044e53a mac80211: remove deprecated noise field from ieee80211_rx_status
Also remove associated IEEE80211_HW_NOISE_DBM from ieee80211_hw_flags.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-30 15:38:13 -04:00
Johannes Berg
49b5c7f473 mac80211: tell driver about IBSS merge
My previous patch "mac80211: notify driver about
IBSS status" left a problem -- when we merge with
a new BSSID, we never tell the driver that we left
the old one. Fix that.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-30 14:59:15 -04:00
Linus Torvalds
27fb8d7b1f Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
  nfs: fix some issues in nfs41_proc_reclaim_complete()
  NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
  NFS: Fix an unstable write data integrity race
  nfs: testing for null instead of ERR_PTR()
  NFS: rsize and wsize settings ignored on v4 mounts
  NFSv4: Don't attempt an atomic open if the file is a mountpoint
  SUNRPC: Fix a bug in rpcauth_prune_expired
2010-04-29 10:23:44 -07:00
Eric Dumazet
f84af32cbc net: ip_queue_rcv_skb() helper
When queueing a skb to socket, we can immediately release its dst if
target socket do not use IP_CMSG_PKTINFO.

tcp_data_queue() can drop dst too.

This to benefit from a hot cache line and avoid the receiver, possibly
on another cpu, to dirty this cache line himself.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 15:31:51 -07:00
Eric Dumazet
4b0b72f7dd net: speedup udp receive path
Since commit 95766fff ([UDP]: Add memory accounting.), 
each received packet needs one extra sock_lock()/sock_release() pair.

This added latency because of possible backlog handling. Then later,
ticket spinlocks added yet another latency source in case of DDOS.

This patch introduces lock_sock_bh() and unlock_sock_bh()
synchronization primitives, avoiding one atomic operation and backlog
processing.

skb_free_datagram_locked() uses them instead of full blown
lock_sock()/release_sock(). skb is orphaned inside locked section for
proper socket memory reclaim, and finally freed outside of it.

UDP receive path now take the socket spinlock only once.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 14:35:48 -07:00
Neil Horman
5fa782c2f5 sctp: Fix skb_over_panic resulting from multiple invalid parameter errors (CVE-2010-1173) (v4)
Ok, version 4

Change Notes:
1) Minor cleanups, from Vlads notes

Summary:

Hey-
	Recently, it was reported to me that the kernel could oops in the
following way:

<5> kernel BUG at net/core/skbuff.c:91!
<5> invalid operand: 0000 [#1]
<5> Modules linked in: sctp netconsole nls_utf8 autofs4 sunrpc iptable_filter
ip_tables cpufreq_powersave parport_pc lp parport vmblock(U) vsock(U) vmci(U)
vmxnet(U) vmmemctl(U) vmhgfs(U) acpiphp dm_mirror dm_mod button battery ac md5
ipv6 uhci_hcd ehci_hcd snd_ens1371 snd_rawmidi snd_seq_device snd_pcm_oss
snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_ac97_codec snd soundcore
pcnet32 mii floppy ext3 jbd ata_piix libata mptscsih mptsas mptspi mptscsi
mptbase sd_mod scsi_mod
<5> CPU:    0
<5> EIP:    0060:[<c02bff27>]    Not tainted VLI
<5> EFLAGS: 00010216   (2.6.9-89.0.25.EL)
<5> EIP is at skb_over_panic+0x1f/0x2d
<5> eax: 0000002c   ebx: c033f461   ecx: c0357d96   edx: c040fd44
<5> esi: c033f461   edi: df653280   ebp: 00000000   esp: c040fd40
<5> ds: 007b   es: 007b   ss: 0068
<5> Process swapper (pid: 0, threadinfo=c040f000 task=c0370be0)
<5> Stack: c0357d96 e0c29478 00000084 00000004 c033f461 df653280 d7883180
e0c2947d
<5>        00000000 00000080 df653490 00000004 de4f1ac0 de4f1ac0 00000004
df653490
<5>        00000001 e0c2877a 08000800 de4f1ac0 df653490 00000000 e0c29d2e
00000004
<5> Call Trace:
<5>  [<e0c29478>] sctp_addto_chunk+0xb0/0x128 [sctp]
<5>  [<e0c2947d>] sctp_addto_chunk+0xb5/0x128 [sctp]
<5>  [<e0c2877a>] sctp_init_cause+0x3f/0x47 [sctp]
<5>  [<e0c29d2e>] sctp_process_unk_param+0xac/0xb8 [sctp]
<5>  [<e0c29e90>] sctp_verify_init+0xcc/0x134 [sctp]
<5>  [<e0c20322>] sctp_sf_do_5_1B_init+0x83/0x28e [sctp]
<5>  [<e0c25333>] sctp_do_sm+0x41/0x77 [sctp]
<5>  [<c01555a4>] cache_grow+0x140/0x233
<5>  [<e0c26ba1>] sctp_endpoint_bh_rcv+0xc5/0x108 [sctp]
<5>  [<e0c2b863>] sctp_inq_push+0xe/0x10 [sctp]
<5>  [<e0c34600>] sctp_rcv+0x454/0x509 [sctp]
<5>  [<e084e017>] ipt_hook+0x17/0x1c [iptable_filter]
<5>  [<c02d005e>] nf_iterate+0x40/0x81
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e0c7f>] ip_local_deliver_finish+0xc6/0x151
<5>  [<c02d0362>] nf_hook_slow+0x83/0xb5
<5>  [<c02e0bb2>] ip_local_deliver+0x1a2/0x1a9
<5>  [<c02e0bb9>] ip_local_deliver_finish+0x0/0x151
<5>  [<c02e103e>] ip_rcv+0x334/0x3b4
<5>  [<c02c66fd>] netif_receive_skb+0x320/0x35b
<5>  [<e0a0928b>] init_stall_timer+0x67/0x6a [uhci_hcd]
<5>  [<c02c67a4>] process_backlog+0x6c/0xd9
<5>  [<c02c690f>] net_rx_action+0xfe/0x1f8
<5>  [<c012a7b1>] __do_softirq+0x35/0x79
<5>  [<c0107efb>] handle_IRQ_event+0x0/0x4f
<5>  [<c01094de>] do_softirq+0x46/0x4d

Its an skb_over_panic BUG halt that results from processing an init chunk in
which too many of its variable length parameters are in some way malformed.

The problem is in sctp_process_unk_param:
if (NULL == *errp)
	*errp = sctp_make_op_error_space(asoc, chunk,
					 ntohs(chunk->chunk_hdr->length));

	if (*errp) {
		sctp_init_cause(*errp, SCTP_ERROR_UNKNOWN_PARAM,
				 WORD_ROUND(ntohs(param.p->length)));
		sctp_addto_chunk(*errp,
			WORD_ROUND(ntohs(param.p->length)),
				  param.v);

When we allocate an error chunk, we assume that the worst case scenario requires
that we have chunk_hdr->length data allocated, which would be correct nominally,
given that we call sctp_addto_chunk for the violating parameter.  Unfortunately,
we also, in sctp_init_cause insert a sctp_errhdr_t structure into the error
chunk, so the worst case situation in which all parameters are in violation
requires chunk_hdr->length+(sizeof(sctp_errhdr_t)*param_count) bytes of data.

The result of this error is that a deliberately malformed packet sent to a
listening host can cause a remote DOS, described in CVE-2010-1173:
http://cve.mitre.org/cgi-bin/cvename.cgi?name=2010-1173

I've tested the below fix and confirmed that it fixes the issue.  We move to a
strategy whereby we allocate a fixed size error chunk and ignore errors we don't
have space to report.  Tested by me successfully

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 14:22:01 -07:00
Johannes Berg
8fc214ba95 mac80211: notify driver about IBSS status
Some drivers (e.g. iwlwifi) need to know and try
to figure it out based on other things, but making
it explicit is definitely better.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-28 16:50:29 -04:00
Stanislaw Gruszka
76f2736401 mac80211: fix supported rates IE if AP doesn't give us it's rates
If AP do not provide us supported rates before assiociation, send
all rates we are supporting instead of empty information element.

v1 -> v2: Add comment.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-28 16:50:29 -04:00
Stanislaw Gruszka
f0b058b617 mac80211: do not wip out old supported rates
Use old supported rates, if AP do not provide supported rates
information element in a new managment frame.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-28 16:50:28 -04:00
Sjur Braendeland
2c485209a5 Bugfix: Link selection was swapped in switch.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:55:15 -07:00
Sjur Braendeland
8391c4aab1 caif: Bugfixes in CAIF netdevice for close and flow control
Changes:
o Bugfix: Flow control was causing the device to be destroyed.
o Bugfix: Handle CAIF channel connect failures.
o If the underlying link layer is gone the net-device is no longer removed,
  but closed.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:55:14 -07:00
Sjur Braendeland
bece7b2398 caif: Rewritten socket implementation
Changes:
 This is a complete re-write of the socket layer. Making the socket
 implementation more aligned with the other socket layers and using more
 of the support functions available in sock.c. Lots of code is copied
 from af_unix (and some from af_irda).
 Non-blocking mode should be working as well.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:55:14 -07:00
Sjur Braendeland
8d545c8f95 caif: Disconnect without waiting for response
Changes:
o Function cfcnfg_disconn_adapt_layer is changed to do asynchronous
  disconnect, not waiting for any response from the modem. Due to this
  the function cfcnfg_linkdestroy_rsp does nothing anymore.
o Because disconnect may take down a connection before a connect response
  is received the function cfcnfg_linkup_rsp is checking if the client is
  still waiting for the response, if not a disconnect request is sent to
  the modem.
o cfctrl is no longer keeping track of pending disconnect requests.
o Added function cfctrl_cancel_req, which is used for deleting a pending
  connect request if disconnect is done before connect response is received.
o Removed unused function cfctrl_insert_req2
o Added better handling of connect reject from modem.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:55:13 -07:00
Sjur Braendeland
5b20865675 caif: Add reference counting to service layer
Changes:
o Added functions cfsrvl_get and cfsrvl_put.
o Added support release_client to use by socket and net device.
o Increase reference counting for in-flight packets from cfmuxl

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:55:12 -07:00
Sjur Braendeland
e539d83cc8 caif: Rename functions in cfcnfg and caif_dev
Changes:
 o Renamed cfcnfg_del_adapt_layer to cfcnfg_disconn_adapt_layer
 o Fixed typo cfcfg to cfcnfg
 o Renamed linkid to channel_id
 o Updated documentation in caif_dev.h
 o Minor formatting changes

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:55:11 -07:00
Vlad Yasevich
c078669340 sctp: Fix oops when sending queued ASCONF chunks
When we finish processing ASCONF_ACK chunk, we try to send
the next queued ASCONF.  This action runs the sctp state
machine recursively and it's not prepared to do so.

kernel BUG at kernel/timer.c:790!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/module/ipv6/initstate
Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]

Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
EIP: 0060:[<c044a2ef>] EFLAGS: 00010286 CPU: 0
EIP is at add_timer+0xd/0x1b
EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
Stack:
 c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
<0> 00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14
00000004
<0> c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14
000000d0
Call Trace:
 [<d1851214>] ? sctp_side_effects+0x607/0xdfc [sctp]
 [<d1851b11>] ? sctp_do_sm+0x108/0x159 [sctp]
 [<d1863386>] ? sctp_pname+0x0/0x1d [sctp]
 [<d1861a56>] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
 [<d185657c>] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
 [<d184e35c>] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
 [<d1851ac1>] ? sctp_do_sm+0xb8/0x159 [sctp]
 [<d1863334>] ? sctp_cname+0x0/0x52 [sctp]
 [<d1854377>] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
 [<d1858f0f>] ? sctp_inq_push+0x2d/0x30 [sctp]
 [<d186329d>] ? sctp_rcv+0x797/0x82e [sctp]

Tested-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Yuansong Qiao <ysqiao@research.ait.ie>
Signed-off-by: Shuaijun Zhang <szhang@research.ait.ie>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:16:34 -07:00
Wei Yongjun
a8170c35e7 sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set
When calculating the INIT/INIT-ACK chunk length, we should not
only account the length of parameters, but also the parameters
zero padding length, such as AUTH HMACS parameter and CHUNKS
parameter. Without the parameters zero padding length we may get
following oops.

skb_over_panic: text:ce2068d2 len:130 put:6 head:cac3fe00 data:cac3fe00 tail:0xcac3fe82 end:0xcac3fe80 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:127!
invalid opcode: 0000 [#2] SMP
last sysfs file: /sys/module/aes_generic/initstate
Modules linked in: authenc ......

Pid: 4102, comm: sctp_darn Tainted: G      D    2.6.34-rc2 #6
EIP: 0060:[<c0607630>] EFLAGS: 00010282 CPU: 0
EIP is at skb_over_panic+0x37/0x3e
EAX: 00000078 EBX: c07c024b ECX: c07c02b9 EDX: cb607b78
ESI: 00000000 EDI: cac3fe7a EBP: 00000002 ESP: cb607b74
 DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Process sctp_darn (pid: 4102, ti=cb607000 task=cabdc990 task.ti=cb607000)
Stack:
 c07c02b9 ce2068d2 00000082 00000006 cac3fe00 cac3fe00 cac3fe82 cac3fe80
<0> c07c024b cac3fe7c cac3fe7a c0608dec ca986e80 ce2068d2 00000006 0000007a
<0> cb8120ca ca986e80 cb812000 00000003 cb8120c4 ce208a25 cb8120ca cadd9400
Call Trace:
 [<ce2068d2>] ? sctp_addto_chunk+0x45/0x85 [sctp]
 [<c0608dec>] ? skb_put+0x2e/0x32
 [<ce2068d2>] ? sctp_addto_chunk+0x45/0x85 [sctp]
 [<ce208a25>] ? sctp_make_init+0x279/0x28c [sctp]
 [<c0686a92>] ? apic_timer_interrupt+0x2a/0x30
 [<ce1fdc0b>] ? sctp_sf_do_prm_asoc+0x2b/0x7b [sctp]
 [<ce202823>] ? sctp_do_sm+0xa0/0x14a [sctp]
 [<ce2133b9>] ? sctp_pname+0x0/0x14 [sctp]
 [<ce211d72>] ? sctp_primitive_ASSOCIATE+0x2b/0x31 [sctp]
 [<ce20f3cf>] ? sctp_sendmsg+0x7a0/0x9eb [sctp]
 [<c064eb1e>] ? inet_sendmsg+0x3b/0x43
 [<c04244b7>] ? task_tick_fair+0x2d/0xd9
 [<c06031e1>] ? sock_sendmsg+0xa7/0xc1
 [<c0416afe>] ? smp_apic_timer_interrupt+0x6b/0x75
 [<c0425123>] ? dequeue_task_fair+0x34/0x19b
 [<c0446abb>] ? sched_clock_local+0x17/0x11e
 [<c052ea87>] ? _copy_from_user+0x2b/0x10c
 [<c060ab3a>] ? verify_iovec+0x3c/0x6a
 [<c06035ca>] ? sys_sendmsg+0x186/0x1e2
 [<c042176b>] ? __wake_up_common+0x34/0x5b
 [<c04240c2>] ? __wake_up+0x2c/0x3b
 [<c057e35c>] ? tty_wakeup+0x43/0x47
 [<c04430f2>] ? remove_wait_queue+0x16/0x24
 [<c0580c94>] ? n_tty_read+0x5b8/0x65e
 [<c042be02>] ? default_wake_function+0x0/0x8
 [<c0604e0e>] ? sys_socketcall+0x17f/0x1cd
 [<c040264c>] ? sysenter_do_call+0x12/0x22
Code: 0f 45 de 53 ff b0 98 00 00 00 ff b0 94 ......
EIP: [<c0607630>] skb_over_panic+0x37/0x3e SS:ESP 0068:cb607b74

To reproduce:

# modprobe sctp
# echo 1 > /proc/sys/net/sctp/addip_enable
# echo 1 > /proc/sys/net/sctp/auth_enable
# sctp_test -H 3ffe:501:ffff💯20c:29ff:fe4d:f37e -P 800 -l
# sctp_darn -H 3ffe:501:ffff💯20c:29ff:fe4d:f37e -P 900 -h 192.168.0.21 -p 800 -I -s -t
sctp_darn ready to send...
3ffe:501:ffff💯20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.0.21
3ffe:501:ffff💯20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> bindx-add=192.168.1.21
3ffe:501:ffff💯20c:29ff:fe4d:f37e:900-192.168.0.21:800 Interactive mode> snd=10

------------------------------------------------------------------
eth0 has addresses: 3ffe:501:ffff💯20c:29ff:fe4d:f37e and 192.168.0.21
eth1 has addresses: 192.168.1.21
------------------------------------------------------------------

Reported-by: George Cheimonidis <gchimon@gmail.com>
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:16:33 -07:00
Vlad Yasevich
81419d862d sctp: per_cpu variables should be in bh_disabled section
Since the change of the atomics to percpu variables, we now
have to disable BH in process context when touching percpu variables.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:16:33 -07:00
Vlad Yasevich
0c42749cff sctp: fix potential reference of a freed pointer
When sctp attempts to update an assocition, it removes any
addresses that were not in the updated INITs.  However, the loop
may attempt to refrence a transport with address after removing it.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:16:32 -07:00
Wei Yongjun
561b1733a4 sctp: avoid irq lock inversion while call sk->sk_data_ready()
sk->sk_data_ready() of sctp socket can be called from both BH and non-BH
contexts, but the default sk->sk_data_ready(), sock_def_readable(), can
not be used in this case. Therefore, we have to make a new function
sctp_data_ready() to grab sk->sk_data_ready() with BH disabling.

=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.33-rc6 #129
---------------------------------------------------------
sctp_darn/1517 just changed the state of lock:
 (clock-AF_INET){++.?..}, at: [<c06aab60>] sock_def_readable+0x20/0x80
but this lock took another, SOFTIRQ-unsafe lock in the past:
 (slock-AF_INET){+.-...}

and interrupts could create inverse lock ordering between them.

other info that might help us debug this:
1 lock held by sctp_darn/1517:
 #0:  (sk_lock-AF_INET){+.+.+.}, at: [<cdfe363d>] sctp_sendmsg+0x23d/0xc00 [sctp]

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 12:16:31 -07:00
David S. Miller
8d238b25b1 Revert "tcp: bind() fix when many ports are bound"
This reverts two commits:

fda48a0d7a
tcp: bind() fix when many ports are bound

and a follow-on fix for it:

6443bb1fc2
ipv6: Fix inet6_csk_bind_conflict()

It causes problems with binding listening sockets when time-wait
sockets from a previous instance still are alive.

It's too late to keep fiddling with this so late in the -rc
series, and we'll deal with it in net-next-2.6 instead.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-28 11:25:59 -07:00
stephen hemminger
afe0159d93 bridge: multicast_flood cleanup
Move some declarations around to make it clearer which variables
are being used inside loop.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 18:13:55 -07:00
stephen hemminger
83f6a740b4 bridge: multicast port group RCU fix
The recently introduced bridge mulitcast port group list was only
partially using RCU correctly. It was missing rcu_dereference()
and missing the necessary barrier on deletion.

The code should have used one of the standard list methods (list or hlist)
instead of open coding a RCU based link list.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 18:13:54 -07:00
stephen hemminger
168d40ee3d bridge: multicast flood
Fix unsafe usage of RCU. Would never work on Alpha SMP because
of lack of rcu_dereference()

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 18:13:54 -07:00
stephen hemminger
7e80c12448 bridge: simplify multicast_add_router
By coding slightly differently, there are only two cases
to deal with: add at head and add after previous entry.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 18:13:53 -07:00
Dan Carpenter
477fffb082 bluetooth: handle l2cap_create_connless_pdu() errors
l2cap_create_connless_pdu() can sometimes return ERR_PTR(-ENOMEM) or
ERR_PTR(-EFAULT).

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 17:03:36 -07:00
David S. Miller
709b9326ef Revert "bridge: Use hlist_for_each_entry_rcu() in br_multicast_add_router()"
This reverts commit ff65e8275f.

As explained by Stephen Hemminger, the traversal doesn't require
RCU handling as we hold a lock.

The list addition et al. calls, on the other hand, do.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 16:49:58 -07:00
David S. Miller
ff65e8275f bridge: Use hlist_for_each_entry_rcu() in br_multicast_add_router()
Noticed by Michał Mirosław.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 16:26:49 -07:00
Jiri Pirko
05fceb4ad7 net: disallow to use net_assign_generic externally
Now there's no need to use this fuction directly because it's handled by
register_pernet_device. So to make this simple and easy to understand,
make this static to do not tempt potentional users.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 15:49:02 -07:00
Eric Dumazet
c377411f24 net: sk_add_backlog() take rmem_alloc into account
Current socket backlog limit is not enough to really stop DDOS attacks,
because user thread spend many time to process a full backlog each
round, and user might crazy spin on socket lock.

We should add backlog size and receive_queue size (aka rmem_alloc) to
pace writers, and let user run without being slow down too much.

Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
stress situations.

Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
receiver can now process ~200.000 pps (instead of ~100 pps before the
patch) on a 8 core machine.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 15:13:20 -07:00
Changli Gao
6e7676c1a7 net: batch skb dequeueing from softnet input_pkt_queue
batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
contention when RPS is enabled.

Note: in the worst case, the number of packets in a softnet_data may
be double of netdev_max_backlog.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 15:11:49 -07:00
David S. Miller
c58dc01bab net: Make RFS socket operations not be inet specific.
Idea from Eric Dumazet.

As for placement inside of struct sock, I tried to choose a place
that otherwise has a 32-bit hole on 64-bit systems.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-04-27 15:11:48 -07:00
Changli Gao
a9cbd588fd net: reimplement softnet_data.output_queue as a FIFO queue
reimplement softnet_data.output_queue as a FIFO queue to keep the
fairness among the qdiscs rescheduled.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
----
 include/linux/netdevice.h |    1 +
 net/core/dev.c            |   22 ++++++++++++----------
 2 files changed, 13 insertions(+), 10 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-04-27 14:32:12 -07:00
Shanyu Zhao
a2c40249a3 mac80211: fix rts threshold check
Currently whenever rts thresold is set, every packet will use RTS
protection no matter its size exceeds the threshold or not. This is
due to a bug in the rts threshold check.
	if (len > tx->local->hw.wiphy->rts_threshold) {
		txrc.rts = rts = true;
	}
Basically it is comparing an int (len) and a u32 (rts_threshold),
and the variable len is assigned as:
	len = min_t(int, tx->skb->len + FCS_LEN,
			 tx->local->hw.wiphy->frag_threshold);
However, when frag_threshold is "-1", len is always "-1", which is
0xffffffff therefore rts is always set to true.

CC: stable@kernel.org
Signed-off-by: Shanyu Zhao <shanyu.zhao@intel.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-04-27 16:09:23 -04:00