Commit graph

1349 commits

Author SHA1 Message Date
Eric Dumazet
81c3d5470e [INET]: speedup inet (tcp/dccp) lookups
Arnaldo and I agreed it could be applied now, because I have other
pending patches depending on this one (Thank you Arnaldo)

(The other important patch moves skc_refcnt in a separate cache line,
so that the SMP/NUMA performance doesnt suffer from cache line ping pongs)

1) First some performance data :
--------------------------------

tcp_v4_rcv() wastes a *lot* of time in __inet_lookup_established()

The most time critical code is :

sk_for_each(sk, node, &head->chain) {
     if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif))
         goto hit; /* You sunk my battleship! */
}

The sk_for_each() does use prefetch() hints but only the begining of
"struct sock" is prefetched.

As INET_MATCH first comparison uses inet_sk(__sk)->daddr, wich is far
away from the begining of "struct sock", it has to bring into CPU
cache cold cache line. Each iteration has to use at least 2 cache
lines.

This can be problematic if some chains are very long.

2) The goal
-----------

The idea I had is to change things so that INET_MATCH() may return
FALSE in 99% of cases only using the data already in the CPU cache,
using one cache line per iteration.

3) Description of the patch
---------------------------

Adds a new 'unsigned int skc_hash' field in 'struct sock_common',
filling a 32 bits hole on 64 bits platform.

struct sock_common {
	unsigned short		skc_family;
	volatile unsigned char	skc_state;
	unsigned char		skc_reuse;
	int			skc_bound_dev_if;
	struct hlist_node	skc_node;
	struct hlist_node	skc_bind_node;
	atomic_t		skc_refcnt;
+	unsigned int		skc_hash;
	struct proto		*skc_prot;
};

Store in this 32 bits field the full hash, not masked by (ehash_size -
1) Using this full hash as the first comparison done in INET_MATCH
permits us immediatly skip the element without touching a second cache
line in case of a miss.

Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to
sk_hash and tw_hash) already contains the slot number if we mask with
(ehash_size - 1)

File include/net/inet_hashtables.h

64 bits platforms :
#define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))
     ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie))   &&  \
     ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))

32bits platforms:
#define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\
     (((__sk)->sk_hash == (__hash))                 &&  \
     (inet_sk(__sk)->daddr          == (__saddr))   &&  \
     (inet_sk(__sk)->rcv_saddr      == (__daddr))   &&  \
     (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif))))


- Adds a prefetch(head->chain.first) in 
__inet_lookup_established()/__tcp_v4_check_established() and 
__inet6_lookup_established()/__tcp_v6_check_established() and 
__dccp_v4_check_established() to bring into cache the first element of the 
list, before the {read|write}_lock(&head->lock);

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 14:13:38 -07:00
Herbert Xu
325ed82393 [NET]: Fix packet timestamping.
I've found the problem in general.  It affects any 64-bit
architecture.  The problem occurs when you change the system time.

Suppose that when you boot your system clock is forward by a day.
This gets recorded down in skb_tv_base.  You then wind the clock back
by a day.  From that point onwards the offset will be negative which
essentially overflows the 32-bit variables they're stored in.

In fact, why don't we just store the real time stamp in those 32-bit
variables? After all, we're not going to overflow for quite a while
yet.

When we do overflow, we'll need a better solution of course.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-10-03 13:57:23 -07:00
Linus Torvalds
7d6322b465 Merge master.kernel.org:/pub/scm/linux/kernel/git/jejb/scsi-for-linus-2.6 2005-10-03 08:07:10 -07:00
Diego Calleja
fd2e54b35b [PATCH] trivial #if -> #ifdef
Use '#ifdef' consistently on __KERNEL__.  This was reported as bug #5340
(isn't easier to send a fix than report the bug?!)

Signed-off-by: Diego Calleja <diegocg@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-10-01 10:54:47 -07:00
Zach Brown
897f15fb58 [PATCH] aio: remove unlocked task_list test and resulting race
Only one of the run or kick path is supposed to put an iocb on the run
list.  If both of them do it than one of them can end up referencing a
freed iocb.  The kick path could delete the task_list item from the wait
queue before getting the ctx_lock and putting the iocb on the run list.
The run path was testing the task_list item outside the lock so that it
could catch ki_retry methods that return -EIOCBRETRY *without* putting the
iocb on a wait queue and promising to call kick_iocb.  This unlocked check
could then race with the kick path to cause both to try and put the iocb on
the run list.

The patch stops the run path from testing task_list by requring that any
ki_retry that returns -EIOCBRETRY *must* guarantee that kick_iocb() will be
called in the future.  aio_p{read,write}, the only in-tree -EIOCBRETRY
users, are updated.

Signed-off-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Benjamin LaHaise <bcrl@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-30 12:41:17 -07:00
Linus Torvalds
4a8342d233 Revert task flag re-ordering, add comments
Roland points out that the flags end up having non-obvious dependencies
elsewhere, so revert aa55a08687 and add
some comments about why things are as they are.

We'll just have to fix up the broken comparisons. Roland has a patch.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-29 15:18:21 -07:00
Oleg Nesterov
aa55a08687 [PATCH] fix TASK_STOPPED vs TASK_NONINTERACTIVE interaction
do_signal_stop:

	for_each_thread(t) {
		if (t->state < TASK_STOPPED)
			++sig->group_stop_count;
	}

However, TASK_NONINTERACTIVE > TASK_STOPPED, so this loop will not
count TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE threads.

See also wait_task_stopped(), which checks ->state > TASK_STOPPED.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>

[ We really probably should always use the appropriate bitmasks to test
  task states, not do it like this. Using something like

	#define TASK_RUNNABLE (TASK_RUNNING | TASK_INTERRUPTIBLE | \
				TASK_UNINTERRUPTIBLE | TASK_NONINTERACTIVE)

  and then doing "if (task->state & TASK_RUNNABLE)" or similar. But the
  ordering of the task states is historical, and keeping the ordering
  does make sense regardless. ]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-29 09:05:52 -07:00
Linus Torvalds
eb693d2994 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-09-29 08:56:47 -07:00
Alan Cox
b4b52db715 [PATCH] ata: re-order speeds sensibly.
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-28 13:29:56 -04:00
Jeff Garzik
64f09c98d7 /spare/repo/libata-dev branch 'chs-support' 2005-09-28 12:11:15 -04:00
David Howells
664cceb009 [PATCH] Keys: Add possessor permissions to keys [try #3]
The attached patch adds extra permission grants to keys for the possessor of a
key in addition to the owner, group and other permissions bits. This makes
SUID binaries easier to support without going as far as labelling keys and key
targets using the LSM facilities.

This patch adds a second "pointer type" to key structures (struct key_ref *)
that can have the bottom bit of the address set to indicate the possession of
a key. This is propagated through searches from the keyring to the discovered
key. It has been made a separate type so that the compiler can spot attempts
to dereference a potentially incorrect pointer.

The "possession" attribute can't be attached to a key structure directly as
it's not an intrinsic property of a key.

Pointers to keys have been replaced with struct key_ref *'s wherever
possession information needs to be passed through.

This does assume that the bottom bit of the pointer will always be zero on
return from kmem_cache_alloc().

The key reference type has been made into a typedef so that at least it can be
located in the sources, even though it's basically a pointer to an undefined
type. I've also renamed the accessor functions to be more useful, and all
reference variables should now end in "_ref".

Signed-Off-By: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2005-09-28 09:10:47 -07:00
Albert Lee
14be71f4c5 [PATCH] libata: rename host states
Changes:
s/PIO_ST_/HSM_ST_/ and s/pio_task_state/hsm_task_state/.

Signed-off-by: Albert Lee <albertcc@tw.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
2005-09-28 11:58:39 -04:00
Ben Dooks
2fab35d78f [NET]: Fix GCC4 compile error: sysctl in linux/if_ether.h
The following is generated when compiling a
recent (2.6.14-rc2-git5) kernel configured for
ARM, with GCC4. 

  CC      init/main.o
In file included from include/linux/netdevice.h:29,
                 from include/net/sock.h:48,
                 from init/main.c:50:
include/linux/if_ether.h:114: error: array type has incomplete element type

It seems that if CONFIG_SYSCTL is not set, then
the compiler will throw an error due to the definition
of the ether_table[] array

Attached is a solution to the problem

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 15:59:43 -07:00
David S. Miller
1f26dac320 [NET]: Add Sun Cassini driver.
Written by Adrian Sun (asun@darksunrising.com).
Ported to 2.6.x by Tom 'spot' Callaway <tcallawa@redhat.com>.
Further cleaned up and integrated by David S. Miller

Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 15:24:13 -07:00
Eric Dumazet
9356b8fc07 [NET]: Reorder some hot fields of struct net_device
Place them on separate cache lines in SMP to lower memory bouncing
between multiple CPU accessing the device.

     - One part is mostly used on receive path (including
       eth_type_trans()) (poll_list, poll, quota, weight, last_rx,
       dev_addr, broadcast)

     - One part is mostly used on queue transmit path (qdisc)
      (queue_lock, qdisc, qdisc_sleeping, qdisc_list, tx_queue_len)

     - One part is mostly used on xmit path (device)
      (xmit_lock, xmit_lock_owner, priv, hard_start_xmit, trans_start)

'features' is placed outside of these hot points, in a location that
may be shared by all cpus (because mostly read)

name_hlist is moved close to name[IFNAMSIZ] to speedup __dev_get_by_name()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-27 15:23:16 -07:00
Linus Torvalds
5c1f4cac6f Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2005-09-26 18:33:26 -07:00
Bagalkote, Sreenivas
c4a3e0a529 [SCSI] MegaRAID SAS RAID: new driver
Signed-off-by: Sreenivas Bagalkote <Sreenivas.Bagalkote@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
2005-09-26 17:32:44 -05:00
David S. Miller
56e9b26324 Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/llc-2.6 2005-09-26 15:29:31 -07:00
Harald Welte
188bab3ae0 [NETFILTER]: Fix invalid module autoloading by splitting iptable_nat
When you've enabled conntrack and NAT as a module (standard case in all
distributions), and you've also enabled the new conntrack netlink
interface, loading ip_conntrack_netlink.ko will auto-load iptable_nat.ko.
This causes a huge performance penalty, since for every packet you iterate
the nat code, even if you don't want it.

This patch splits iptable_nat.ko into the NAT core (ip_nat.ko) and the
iptables frontend (iptable_nat.ko).  Threfore, ip_conntrack_netlink.ko will
only pull ip_nat.ko, but not the frontend.  ip_nat.ko will "only" allocate
some resources, but not affect runtime performance.

This separation is also a nice step in anticipation of new packet filters
(nf-hipac, ipset, pkttables) being able to use the NAT core.

Signed-off-by: Harald Welte <laforge@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26 15:25:11 -07:00
Evgeniy Polyakov
acd042bb2d [CONNECTOR]: async connector mode.
If input message rate from userspace is too high, do not drop them,
but try to deliver using work queue allocation.

Failing there is some kind of congestion control.

It also removes warn_on on this condition, which scares people.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2005-09-26 15:06:50 -07:00
Kars de Jong
4fb7edce52 [PATCH] pcmcia: fix cross-platform issues with pcmcia module aliases
- Added a missing TO_NATIVE call to scripts/mod/file2alias.c:do_pcmcia_entry()
- Add an alignment attribute to struct pcmcia_device_no to solve an alignment
  issue seen when cross-compiling on x86 for m68k.

Signed-off-by: Kars de Jong <jongk@linux-m68k.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2005-09-26 13:13:58 +02:00
Daniel Ritz
6c1a10dba9 [PATCH] yenta: add support for more TI bridges
Support some more TI cardbus bridges.  most of them are multifunction
devices which adds 1394 controllers, smartcard readers etc.  this could
also help with the various problems with the XX21 controllers seen on the
linux-pcmcia list.

Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2005-09-26 13:11:27 +02:00
Daniel Ritz
8c3520d4eb [PATCH] yenta: auto-tune EnE bridges for CardBus cards
Echo Audio cardbus products are known to be incompatible with EnE bridges.
in order to maybe solve the problem a EnE specific test bit has to be set,
another cleared...but other setups have a good chance to break when just
forcing the bits.  so do the whole thingy automatically.

The patch adds a hook in cb_alloc() that allows special tuning for the
different chipsets.  for ene just match the Echo products and set/clear the
test bits, defaults to do the same thing as w/o the patch to not break
working setups.

Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch>
Cc: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
2005-09-26 13:09:20 +02:00
Jeff Garzik
98ed72deeb Merge /spare/repo/linux-2.6/ 2005-09-24 00:26:49 -04:00
Linus Torvalds
87e807b6c4 Merge branch 'upstream' from master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev 2005-09-23 16:44:52 -07:00
Jeff Garzik
536f809802 Merge /spare/repo/linux-2.6/ 2005-09-23 19:03:21 -04:00
Trond Myklebust
f134585a73 Revert "[PATCH] RPC,NFS: new rpc_pipefs patch"
This reverts 17f4e6febca160a9f9dd4bdece9784577a2f4524 commit.
2005-09-23 12:39:00 -04:00
Christoph Hellwig
278c995c8a [PATCH] RPC,NFS: new rpc_pipefs patch
Currently rpc_mkdir/rpc_rmdir and rpc_mkpipe/mk_unlink have an API that's
 a little unfortunate.  They take a path relative to the rpc_pipefs root and
 thus need to perform a full lookup.  If you look at debugfs or usbfs they
 always store the dentry for directories they created and thus can pass in
 a dentry + single pathname component pair into their equivalents of the
 above functions.

 And in fact rpc_pipefs actually stores a dentry for all but one component so
 this change not only simplifies the core rpc_pipe code but also the callers.

 Unfortuntately this code path is only used by the NFS4 idmapper and
 AUTH_GSSAPI for which I don't have a test enviroment.  Could someone give
 it a spin?  It's the last bit needed before we can rework the
 lookup_hash API

 Signed-off-by: Christoph Hellwig <hch@lst.de>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:57 -04:00
Chuck Lever
470056c288 [PATCH] RPC: rationalize set_buffer_size
In fact, ->set_buffer_size should be completely functionless for non-UDP.

 Test-plan:
 Check socket buffer size on UDP sockets over time.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:55 -04:00
Chuck Lever
03bf4b707e [PATCH] RPC: parametrize various transport connect timeouts
Each transport implementation can now set unique bind, connect,
 reestablishment, and idle timeout values.  These are variables,
 allowing the values to be modified dynamically.  This permits
 exponential backoff of any of these values, for instance.

 As an example, we implement exponential backoff for the connection
 reestablishment timeout.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:53 -04:00
Chuck Lever
529b33c6db [PATCH] RPC: allow RPC client's port range to be adjustable
Select an RPC client source port between 650 and 1023 instead of between
 1 and 800.  The old range conflicts with a number of network services.
 Provide sysctls to allow admins to select a different port range.

 Note that this doesn't affect user-level RPC library behavior, which
 still uses 1 to 800.

 Based on a suggestion by Olaf Kirch <okir@suse.de>.

 Test-plan:
 Repeated mount and unmount.  Destructive testing.  Idle timeouts.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:50 -04:00
Chuck Lever
555ee3af16 [PATCH] RPC: clean up after nocong was removed
Clean-up:  Move some macros that are specific to the Van Jacobson
 implementation into xprt.c.  Get rid of the cong_wait field in
 rpc_xprt, which is no longer used.  Get rid of xprt_clear_backlog.

 Test-plan:
 Compile with CONFIG_NFS enabled.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:48 -04:00
Chuck Lever
ed63c00370 [PATCH] RPC: remove xprt->nocong
Get rid of the "xprt->nocong" variable.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss with UDP mounts.
 Look for significant regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:47 -04:00
Chuck Lever
a58dd398f5 [PATCH] RPC: add a release_rqst callout to the RPC transport switch
The final place where congestion control state is adjusted is in
 xprt_release, where each request is finally released.  Add a callout
 there to allow transports to perform additional processing when a
 request is about to be released.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for significant
 regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:45 -04:00
Chuck Lever
1570c1e41e [PATCH] RPC: add generic interface for adjusting the congestion window
A new interface that allows transports to adjust their congestion window
 using the Van Jacobson implementation in xprt.c is provided.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for
 significant regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:43 -04:00
Chuck Lever
46c0ee8bc4 [PATCH] RPC: separate xprt_timer implementations
Allow transports to hook the retransmit timer interrupt.  Some transports
 calculate their congestion window here so that a retransmit timeout has
 immediate effect on the congestion window.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for significant
 regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:41 -04:00
Chuck Lever
49e9a89086 [PATCH] RPC: expose API for serializing access to RPC transports
The next method we abstract is the one that releases a transport,
 allowing another task to have access to the transport.

 Again, one generic version of this is provided for transports that
 don't need the RPC client to perform congestion control, and one
 version is for transports that can use the original Van Jacobson
 implementation in xprt.c.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for
 significant regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:40 -04:00
Chuck Lever
12a804698b [PATCH] RPC: expose API for serializing access to RPC transports
The next several patches introduce an API that allows transports to
 choose whether the RPC client provides congestion control or whether
 the transport itself provides it.

 The first method we abstract is the one that serializes access to the
 RPC transport to prevent the bytes from different requests from mingling
 together.  This method provides proper request serialization and the
 opportunity to prevent new requests from being started because the
 transport is congested.

 The normal situation is for the transport to handle congestion control
 itself.  Although NFS over UDP was first, it has been recognized after
 years of experience that having the transport provide congestion control
 is much better than doing it in the RPC client.  Thus TCP, and probably
 every future transport implementation, will use the default method,
 xprt_lock_write, provided in xprt.c, which does not provide any kind
 of congestion control.  UDP can continue using the xprt.c-provided
 Van Jacobson congestion avoidance implementation.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for significant
 regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:38 -04:00
Chuck Lever
fe3aca290f [PATCH] RPC: add API to set transport-specific timeouts
Prepare the way to remove the "xprt->nocong" variable by adding a callout
 to the RPC client transport switch API to handle setting RPC retransmit
 timeouts.

 Add a pair of generic helper functions that provide the ability to set a
 simple fixed timeout, or to set a timeout based on the state of a round-
 trip estimator.

 Test-plan:
 Use WAN simulation to cause sporadic bursty packet loss.  Look for significant
 regression in performance or client stability.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:36 -04:00
Chuck Lever
43118c29de [PATCH] RPC: get rid of xprt->stream
Now we can fix up the last few places that use the "xprt->stream"
 variable, and get rid of it from the rpc_xprt structure.

 Test-plan:
 Destructive testing (unplugging the network temporarily).  Connectathon
 with UDP and TCP.

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:35 -04:00
Chuck Lever
808012fbb2 [PATCH] RPC: skip over transport-specific heads automatically
Add a generic mechanism for skipping over transport-specific headers
 when constructing an RPC request.  This removes another "xprt->stream"
 dependency.

 Test-plan:
 Write-intensive workload on a single mount point (try both UDP and
 TCP).

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:33 -04:00
Chuck Lever
c7b2cae8a6 [PATCH] RPC: separate TCP and UDP write space callbacks
Split the socket write space callback function into a TCP version and UDP
 version, eliminating one dependence on the "xprt->stream" variable.

 Keep the common pieces of this path in xprt.c so other transports can use
 it too.

 Test-plan:
 Write-intensive workload on a single mount point.

 Version: Thu, 11 Aug 2005 16:07:51 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:28 -04:00
Chuck Lever
55aa4f58aa [PATCH] RPC: client-side transport switch cleanup
Clean-up: change some comments to reflect the realities of the new RPC
 transport switch mechanism.  Get rid of unused xprt_receive() prototype.

 Also, organize function prototypes in xprt.h by usage and scope.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:07:21 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:26 -04:00
Chuck Lever
44fbac2288 [PATCH] RPC: Add helper for waking tasks pending on a transport
Clean-up: remove only reference to xprt->pending from the socket transport
 implementation.  This makes a cleaner interface for other transport
 implementations as well.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:06:52 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:24 -04:00
Chuck Lever
2226feb6bc [PATCH] RPC: rename the sockstate field
Clean-up: get rid of a name reference to sockets in the generic parts of the
 RPC client by renaming the sockstate field in the rpc_xprt structure.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:05:53 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:21 -04:00
Chuck Lever
5dc07727f8 [PATCH] RPC: Rename xprt_lock
Clean-up: Replace the xprt_lock with something more aptly named.  This lock
 single-threads the XID and request slot reservation process.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:05:26 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:19 -04:00
Chuck Lever
4a0f8c04f2 [PATCH] RPC: Rename sock_lock
Clean-up: replace a name reference to sockets in the generic parts of the RPC
 client by renaming sock_lock in the rpc_xprt structure.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:05:00 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:17 -04:00
Chuck Lever
9903cd1c27 [PATCH] RPC: transport switch function naming
Introduce block header comments and a function naming convention to the
 socket transport implementation.  Provide a debug setting for transports
 that is separate from RPCDBG_XPRT.  Eliminate xprt_default_timeout().

 Provide block comments for exposed interfaces in xprt.c, and eliminate
 the useless obvious comments.

 Convert printk's to dprintk's.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.

 Version: Thu, 11 Aug 2005 16:04:04 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:14 -04:00
Chuck Lever
a246b0105b [PATCH] RPC: introduce client-side transport switch
Move the bulk of client-side socket-specific code into a separate source
 file, net/sunrpc/xprtsock.c.

 Test-plan:
 Millions of fsx operations.  Performance characterization such as "sio" or
 "iozone".  Destructive testing (unplugging the network temporarily, server
 reboots).  Connectathon with v2, v3, and v4.

 Version: Thu, 11 Aug 2005 16:03:38 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:12 -04:00
Chuck Lever
094bb20b9f [PATCH] RPC: extract socket logic common to both client and server
Clean-up: Move some code that is common to both RPC client- and server-side
 socket transports into its own source file, net/sunrpc/socklib.c.

 Test-plan:
 Compile kernel with CONFIG_NFS enabled.  Millions of fsx operations over
 UDP, client and server.  Connectathon over UDP.

 Version: Thu, 11 Aug 2005 16:03:09 -0400

 Signed-off-by: Chuck Lever <cel@netapp.com>
 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2005-09-23 12:38:11 -04:00