Makes caller side more obvious, there's no need to have
a wrapper for this oneliner!
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
First user will be the DCCP transport networking protocol.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces autotuning to the sctp buffer management code
similar to the TCP. The buffer space can be grown if the advertised
receive window still has room. This might happen if small message
sizes are used, which is common in telecom environmens.
New tunables are introduced that provide limits to buffer growth
and memory pressure is entered if to much buffer spaces is used.
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based upon initial work by Keiichi Kii <k-keiichi@bx.jp.nec.com>.
This patch introduces support for dynamic reconfiguration (adding, removing
and/or modifying parameters of netconsole targets at runtime) using a
userspace interface exported via configfs. Documentation is also updated
accordingly.
Issues and brief design overview:
(1) Kernel-initiated creation / destruction of kernel objects is not
possible with configfs -- the lifetimes of the "config items" is managed
exclusively from userspace. But netconsole must support boot/module
params too, and these are parsed in kernel and hence netpolls must be
setup from the kernel. Joel Becker suggested to separately manage the
lifetimes of the two kinds of netconsole_target objects -- those created
via configfs mkdir(2) from userspace and those specified from the
boot/module option string. This adds complexity and some redundancy here
and also means that boot/module param-created targets are not exposed
through the configfs namespace (and hence cannot be updated / destroyed
dynamically). However, this saves us from locking / refcounting
complexities that would need to be introduced in configfs to support
kernel-initiated item creation / destroy there.
(2) In configfs, item creation takes place in the call chain of the
mkdir(2) syscall in the driver subsystem. If we used an ioctl(2) to
create / destroy objects from userspace, the special userspace program is
able to fill out the structure to be passed into the ioctl and hence
specify attributes such as local interface that are required at the time
we set up the netpoll. For configfs, this information is not available at
the time of mkdir(2). So, we keep all newly-created targets (via
configfs) disabled by default. The user is expected to set various
attributes appropriately (including the local network interface if
required) and then write(2) "1" to the "enabled" attribute. Thus,
netpoll_setup() is then called on the set parameters in the context of
_this_ write(2) on the "enabled" attribute itself. This design enables
the user to reconfigure existing netconsole targets at runtime to be
attached to newly-come-up interfaces that may not have existed when
netconsole was loaded or when the targets were actually created. All this
effectively enables us to get rid of custom ioctls.
(3) Ultra-paranoid configfs attribute show() and store() operations, with
sanity and input range checking, using only safe string primitives, and
compliant with the recommendations in Documentation/filesystems/sysfs.txt.
(4) A new function netpoll_print_options() is created in the netpoll API,
that just prints out the configured parameters for a netpoll structure.
netpoll_parse_options() is modified to use that and it is also exported to
be used from netconsole.
Signed-off-by: Satyam Sharma <satyam@infradead.org>
Acked-by: Keiichi Kii <k-keiichi@bx.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This stale info came from the original idea, which proved to be
unnecessarily complex, sacked_out > 0 is easy to do and that when
it's going to be needed anyway (it _can_ be valid also when
sacked_out == 0 but there's not going to be a guarantee about it
for now).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously code had IsReno/IsFack defined as macros that were
local to tcp_input.c though sack_ok field has user elsewhere too
for the same purpose. This changes them to static inlines as
preferred according the current coding style and unifies the
access to sack_ok across multiple files. Magic bitops of sack_ok
for FACK and DSACK are also abstracted to functions with
appropriate names.
Note:
- One sack_ok = 1 remains but that's self explanary, i.e., it
enables sack
- Couple of !IsReno cases are changed to tcp_is_sack
- There were no users for IsDSack => I dropped it
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
BUG_ON is an overkill. In fact, I was mislead by BUG_TRAP
severity (equals to WARN_ON) which is much lower than BUG_ON's
(that panics).
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously TCP had a transitional state during which reno
counted segments that are already below the current window into
sacked_out, which is now prevented. In addition, re-try now
the unconditional S+L skb catching.
This approach conservatively calls just remove_sack and leaves
reset_sack() calls alone. The best solution to the whole problem
would be to first calculate the new sacked_out fully (this patch
does not move reno_sack_reset calls from original sites and thus
does not implement this). However, that would require very
invasive change to fastretrans_alert (perhaps even slicing it to
two halves). Alternatively, all callers of tcp_packets_in_flight
(i.e., users that depend on sacked_out) should be postponed
until the new sacked_out has been calculated but it isn't any
simpler alternative.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
Left_out was dropped a while ago, thus leaving verifying
consistency of the "left out" as only task for the function in
question. Thus make it's name more appropriate.
In addition, it is intentionally converted to #define instead
of static inline because the location of the invariant failure
is the most important thing to have if this ever triggers. I
think it would have been helpful e.g. in this case where the
location of the failure point had to be based on some quesswork:
http://lkml.org/lkml/2007/5/2/464
...Luckily the guesswork seems to have proved to be correct.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
tp->left_out got removed but nothing came to replace it back
then (users just did addition by themselves), so add function
for users now.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is easily calculable when needed and user are not that many
after all.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
No other users exist for tcp_ecn.h. Very few things remain in
tcp.h, for most TCP ECN functions callers reside within a
single .c file and can be placed there.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
In addition, added a reference about the purpose of the loop.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
It is guaranteed to be valid only when !tp->sacked_out. In most
cases this seqno is available in the last ACK but there is no
guarantee for that. The new fast recovery loss marking algorithm
needs this as entry point.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch provides generic Large Receive Offload (LRO) functionality
for IPv4/TCP traffic.
LRO combines received tcp packets to a single larger tcp packet and
passes them then to the network stack in order to increase performance
(throughput). The interface supports two modes: Drivers can either
pass SKBs or fragment lists to the LRO engine.
Signed-off-by: Jan-Bernd Themann <themann@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Veth stands for Virtual ETHernet. It is a simple tunnel driver
that works at the link layer and looks like a pair of ethernet
devices interconnected with each other.
Mainly it allows to communicate between network namespaces but
it can be used as is as well.
The newlink callback is organized that way to make it easy to
create the peer device in the separate namespace when we have
them in kernel.
This implementation uses another interface - the RTM_NRELINK
message introduced by Patric.
Bug fixes from Daniel Lezcano.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
This routine gets the parsed rtnl attributes and creates a new
link with generic info (IFLA_LINKINFO policy). Its intention
is to help the drivers, that need to create several links at
once (like VETH).
This is nothing but a copy-paste-ed part of rtnl_newlink() function
that is responsible for creation of new device.
Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.
In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.
The signature of the ->poll() call back goes from:
int foo_poll(struct net_device *dev, int *budget)
to
int foo_poll(struct napi_struct *napi, int budget)
The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract). The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.
The napi_struct is to be embedded in the device driver private data
structures.
Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler. Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.
With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
[ Ported to current tree and all drivers converted. Integrated
Stephen's follow-on kerneldoc additions, and restored poll_list
handling to the old style to fix mutual exclusion issues. -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
ieee80211_get_radiotap_len() tries to dereference radiotap length without
taking care that it is completely unaligned and get_unaligned()
is required.
Signed-off-by: Andy Green <andy@warmcat.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
zd1211rw and bcm43xx are interested in being notified when ERP IE conditions
change, so that they can reprogram a register which affects how control frames
are transmitted.
This patch adds an interface similar to the one that can be found in softmac.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Similarly to CTS protection, whether short preambles are used for 802.11b
transmissions should be a per-subif setting, not device global.
For STAs, this patch makes short preamble handling automatic based on the ERP
IE. For APs, hostapd still uses the prism ioctls, but the write ioctl has been
restricted to AP-only subifs.
ieee80211_txrx_data.short_preamble (an unused field) was removed.
Unfortunately, some API changes were required for the following functions:
- ieee80211_generic_frame_duration
- ieee80211_rts_duration
- ieee80211_ctstoself_duration
- ieee80211_rts_get
- ieee80211_ctstoself_get
Affected drivers were updated accordingly.
Signed-off-by: Daniel Drake <dsd@gentoo.org>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
mac80211 informs the driver what the short and long retry values are through
set_retry_limit(), but when packets are being transmitted it did not inform the
driver which of the 2 retry limits should actually be used.
Instead it sends the actual value, but for drivers that can only set the retry limit
and the register and in the descriptor need to indicate which of the limits should
be used this is not really useful.
This patch will add a IEEE80211_TXCTL_LONG_RETRY_LIMIT flag to the
ieee80211_tx_control structure. By default the short retry limit should be
used but if the flag is set the long retry should be used.
This does not prevent the driver to ignore the request for "no retry" packets,
but at least those will be send out with the short retry limit. But there is no
perfect cure for this problem.. :(
Signed-off-by: Ivo van Doorn <IvDoorn@gmail.com>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The sta_info code has some awkward locking which prevents some driver
callbacks from being allowed to sleep. This patch makes the locking more
focused so code that calls driver callbacks are allowed to sleep. It also
converts sta_lock to a rwlock.
Signed-off-by: Michael Wu <flamingice@sourmilk.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Jiri Benc <jbenc@suse.cz>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
The Lite5200 u-boot image doesn't entirely configure the processor
correctly and so Linux needs to fixup the cpu setup in setup_arch. Fixing
the CPU setup is good, but making it into common code is not a good idea.
New board ports should be encouraged not to take the lead of the lite5200
and instead get their firmware to setup the CPU the right way.
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Sylvain Munaut <tnt@246tnt.com>
blk_trace_setup is broken on x86_64 compat systems,
this makes the code work correctly on all 64 bit architectures
in compat mode.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Move include/linux/umem.h to drivers/block, as umem.c is the only user,
and its not an exported header.
Move the PCI_{VENDOR,DEVICE}_ID_* constants to include/linux/pci_ids.h.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
As bi_end_io is only called once when the reqeust is complete,
the 'size' argument is now redundant. Remove it.
Now there is no need for bio_endio to subtract the size completed
from bi_size. So don't do that either.
While we are at it, change bi_end_io to return void.
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The entire function of flush_dry_bio_endio is to undo the effects
of bio_endio (when called on a barrier request). So remove the
function and the call to bio_endio.
This allows us to remove "bi_size" from "struct request_queue".
Signed-off-by: Neil Brown <neilb@suse.de>
### Diffstat output
./block/ll_rw_blk.c | 39 ++-------------------------------------
./include/linux/blkdev.h | 1 -
2 files changed, 2 insertions(+), 38 deletions(-)
diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Hide everything in blkdev.h with CONFIG_BLOCK isn't set, and fixup
the (few) files that fail to build because they were relying on blkdev.h
pulling in extra includes for them.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
blk_rq_bio_prep is exported for use in exactly
one place. That place can benefit from using
the new blk_rq_append_bio instead.
So
- change dm-emc to call blk_rq_append_bio
- stop exporting blk_rq_bio_prep, and
- initialise rq_disk in blk_rq_bio_prep,
as dm-emc needs it.
Signed-off-by: Neil Brown <neilb@suse.de>
diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
ll_back_merge_fn is currently exported to SCSI where is it used,
together with blk_rq_bio_prep, in exactly the same way these
functions are used in __blk_rq_map_user.
So move the common code into a new function (blk_rq_append_bio), and
don't export ll_back_merge_fn any longer.
Signed-off-by: Neil Brown <neilb@suse.de>
diff .prev/block/ll_rw_blk.c ./block/ll_rw_blk.c
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Every usage of rq_for_each_bio wraps a usage of
bio_for_each_segment, so these can be combined into
rq_for_each_segment.
We define "struct req_iterator" to hold the 'bio' and 'index' that
are needed for the double iteration.
Signed-off-by: Neil Brown <neilb@suse.de>
Various compile fixes by me...
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
The currently used "struct class_device" will be removed from the
kernel. Here is a patch that converts all users in drivers/media/video/
to struct device.
Reviewed-by: Thierry Merle <thierry.merle@free.fr>
Reviewed-by: Mike Isely <isely@pobox.com>
Reviewed-by: Luca Risolia <luca.risolia@studio.unibo.it>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
In the past, videobuf_queue_init were used to initialize PCI DMA videobuffers.
This patch renames it, to avoid confusion with the previous kernel API, doing:
s/videobuf_queue_init/void videobuf_queue_core_init/
Also, the operations is now part of the function parameter. The function will
also add a test if this is defined, otherwise producing BUG.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Before the videobuf redesign, a procedure for re-using videobuf without PCI
scatter/gather where provided by changing the pci-dependent operations by
other operations.
With the newer approach, those methods are obsolete and can safelly be removed.
Signed-off-by: Mauro Carvalho Chehab <mchehab@infradead.org>
The IB CM provides a message received acknowledged (MRA) message that
can be sent to indicate that a REQ or REP message has been received, but
will require more time to process than the timeout specified by those
messages. In many cases, the application may not know how long it will
take to respond to a CM message, but the majority of the time, it will
usually respond before a retry has been sent. Rather than sending an
MRA in response to all messages just to handle the case where a longer
timeout is needed, it is more efficient to queue the MRA for sending in
case a duplicate message is received.
This avoids sending an MRA when it is not needed, but limits the number
of times that a REQ or REP will be resent. It also provides for a
simpler implementation than generating the MRA based on a timer event.
(That is, trying to send the MRA after receiving the first REQ or REP if
a response has not been generated, so that it is received at the remote
side before a duplicate REQ or REP has been received)
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Implement FMRs for mlx4. This is an adaptation of code from mthca.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The declaration of struct ib_user_mad_reg_req.method_mask[] exported
to userspace was an array of __u32, but the kernel internally treated
it as a bitmap made up of longs. This makes a difference for 64-bit
big-endian kernels, where numbering the bits in an array of__u32 gives:
|31.....0|63....31|95....64|127...96|
while numbering the bits in an array of longs gives:
|63..............0|127............64|
64-bit userspace can handle this by just treating method_mask[] as an
array of longs, but 32-bit userspace is really stuck: the meaning of
the bits in method_mask[] depends on whether the kernel is 32-bit or
64-bit, and there's no sane way for userspace to know that.
Fix this by updating <rdma/ib_user_mad.h> to make it clear that
method_mask[] is an array of longs, and using a compat_ioctl method to
convert to an array of 64-bit longs to handle the 32-on-64 problem.
This fixes the interface description to match existing behavior (so
working binaries continue to work) in almost all situations, and gives
consistent semantics in the case of 32-bit userspace that can run on
either a 32-bit or 64-bit kernel, so that the same binary can work for
both 32-on-32 and 32-on-64 systems.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add support for setting the P_Key index of sent MADs and getting the
P_Key index of received MADs. This requires a change to the layout of
the ABI structure struct ib_user_mad_hdr, so to avoid breaking
compatibility, we default to the old (unchanged) ABI and add a new
ioctl IB_USER_MAD_ENABLE_PKEY that allows applications that are aware
of the new ABI to opt into using it.
We plan on switching to the new ABI by default in a year or so, and
this patch adds a warning that is printed when an application uses the
old ABI, to push people towards converting to the new ABI.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Reviewed-by: Hal Rosenstock <hal@xsigo.com>
display the following device information under /sys/class/infiniband/mlx4_X:
board_id, fw_ver, hw_rev, hca_type.
This patch makes this information available to userspace utilities
such as ibstat and ibv_devinfo.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
During ib_umem_get(), determine whether all pages from the memory
region are hugetlb pages and report this in the "hugetlb" member.
Low-level drivers can use this information if they need it.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>