Commit graph

42758 commits

Author SHA1 Message Date
NeilBrown
dfc7064500 md: restart recovery cleanly after device failure.
When we get any IO error during a recovery (rebuilding a spare), we abort
the recovery and restart it.

For RAID6 (and multi-drive RAID1) it may not be best to restart at the
beginning: when multiple failures can be tolerated, the recovery may be
able to continue and re-doing all that has already been done doesn't make
sense.

We already have the infrastructure to record where a recovery is up to
and restart from there, but it is not being used properly.
This is because:
  - We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
    which causes the recovery not to be checkpointed.
  - We remove spares and then re-add them, which loses important state
    information.

The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
needed.  If there is an error, the relevant drive will be marked as
Faulty, and that is enough to ensure correct handling of the error.  So we
first remove MD_RECOVERY_ERR, changing some of the uses of it to
MD_RECOVERY_INTR.

Then we cause the attempt to remove a non-faulty device from an array to
fail (unless recovery is impossible as the array is too degraded).  Then
when remove_and_add_spares attempts to remove the devices on which
recovery can continue, it will fail, they will remain in place, and
recovery will continue on them as desired.

Issue:  If we are halfway through rebuilding a spare and another drive
fails, and a new spare is immediately available,  do we want to:
 1/ complete the current rebuild, then go back and rebuild the new spare or
 2/ restart the rebuild from the start and rebuild both devices in
    parallel.

Both options can be argued for.  The code currently takes option 2 as
  a/ this requires least code change
  b/ this results in a minimally-degraded array in minimal time.

Cc: "Eivind Sarto" <ivan@kasenna.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:10 -07:00
Bernd Schubert
90b08710e4 md: allow parallel resync of md-devices.
In some configurations, a raid6 resync can be limited by CPU speed
(Calculating P and Q and moving data) rather than by device speed.  In
these cases there is nothing to be gained by serialising resync of arrays
that share a device, and doing the resync in parallel can provide benefit.
So add a sysfs tunable to flag an array as being allowed to resync in
parallel with other arrays that use (a different part of) the same device.

Signed-off-by: Bernd Schubert <bs@q-leap.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:10 -07:00
Dan Williams
4f54b0e948 md: notify userspace on 'stop' events
This additional notification to 'array_state' is needed to allow the
monitor application to learn about stop events via sysfs.  The
sysfs_notify("sync_action") call that comes at the end of do_md_stop()
(via md_new_event) is insufficient since the 'sync_action' attribute has
been removed by this point.

(Seems like a sysfs-notify-on-removal patch is a better fix.  Currently
removal updates the event count but does not wake up waiters)

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:10 -07:00
NeilBrown
09a44cc150 md: notify userspace on 'write-pending' changes to array_state
When an array enters write pending, 'array_state' changes, so we must be
sure to sysfs_notify.

Also, when waiting for user-space to acknowledge 'write-pending' by
marking the metadata as dirty, we don't want to wait for MD_CHANGE_DEVS to
be cleared as that might not happen.  So explicitly test for the bits that
we are really interested in.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:10 -07:00
NeilBrown
698b18c1e8 md: raid1: Fix restoration of bio between failed read and write.
When performing a "recovery" or "check" pass on a RAID1 array, we read
from each device and possibly, if there is a difference or a read error,
write back to some devices.

We use the same 'bio' for both read and write, resetting various fields
between the two operations.

We forgot to reset bv_offset and bv_len however.  These are often left
unchanged, but in the case where there is an IO error one or two sectors
into a page, they are changed.

This results in correctable errors not being corrected properly.  It does
not result in any data corruption.

Cc: "Fairbanks, David" <David.Fairbanks@stratus.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:10 -07:00
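Illustration for 698b18c1e8 (not part of the commit): a minimal user-space sketch of resetting per-segment offset and length before a buffer descriptor is reused for the write-back. The struct `io_vec`, the helper `reset_vecs` and the constant `SKETCH_PAGE_SIZE` are hypothetical stand-ins; only the field names `bv_offset` and `bv_len` come from the message, and the real patch may differ in detail.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size for this sketch */

/* Hypothetical stand-in for one segment of an I/O descriptor. */
struct io_vec {
    unsigned int bv_offset;   /* start of the data within the page */
    unsigned int bv_len;      /* number of bytes used in the page */
};

/*
 * Restore every segment to "whole page from offset 0" so that a
 * descriptor left pointing into the middle of a page by a failed read
 * can be reused for the write.  The last segment may be shorter.
 */
static void reset_vecs(struct io_vec *vec, size_t count, unsigned int total_bytes)
{
    unsigned int remaining = total_bytes;

    for (size_t i = 0; i < count; i++) {
        unsigned int chunk = remaining > SKETCH_PAGE_SIZE ? SKETCH_PAGE_SIZE
                                                          : remaining;
        vec[i].bv_offset = 0;
        vec[i].bv_len = chunk;
        remaining -= chunk;
    }
}
```

If only the other fields were reset, a segment left at, say, offset 1024 and length 3072 by the failed read would make the write cover only part of the page, which matches the "correctable errors not being corrected" symptom described above.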
Bernd Schubert
6be9d49401 md: md: raid5 rate limit error printk
Last night we had scsi problems and a hardware raid unit was offlined
during heavy i/o.  While this happened we got, for about 3 minutes, a huge
number of messages like these:

Apr 12 03:36:07 pfs1n14 kernel: [197510.696595] raid5:md7: read error not correctable (sector 2993096568 on sdj2).

I guess the high error rate is responsible for not scheduling other events
- during this time the system was not pingable and in the end other
devices also ran into scsi command timeouts, causing problems on these
unrelated devices as well.

Signed-off-by: Bernd Schubert <bernd-schubert@gmx.de>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:10 -07:00
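Illustration for 6be9d49401 (not part of the commit): the message describes the symptom but not the mechanism, so the following is only a user-space sketch of one common approach, a window-plus-burst rate limiter. The struct `ratelimit`, the helpers `ratelimit_ok` and `simulate_error_storm`, and the tick units are all assumptions made for this example.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit {
    unsigned long interval;   /* window length in ticks */
    unsigned int burst;       /* messages allowed per window */
    unsigned long begin;      /* tick at which the current window began */
    unsigned int printed;     /* messages emitted in this window */
    unsigned int missed;      /* messages suppressed in this window */
};

/* Return true if a message may be emitted at tick `now`. */
static bool ratelimit_ok(struct ratelimit *rl, unsigned long now)
{
    if (now - rl->begin >= rl->interval) {
        /* The window has expired: start a new one. */
        rl->begin = now;
        rl->printed = 0;
        rl->missed = 0;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return true;
    }
    rl->missed++;
    return false;
}

/* Feed `n` errors, one per tick from `start`; return how many were logged. */
static unsigned int simulate_error_storm(struct ratelimit *rl,
                                         unsigned long start, unsigned int n)
{
    unsigned int logged = 0;

    for (unsigned int i = 0; i < n; i++) {
        if (ratelimit_ok(rl, start + i)) {
            printf("raid5: read error not correctable (event %u)\n", i);
            logged++;
        }
    }
    return logged;
}
```

With an interval of 5000 ticks and a burst of 10, a storm of 1000 errors inside one window produces 10 log lines and 990 suppressed ones, so the logging path no longer dominates the system while the device is failing.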
Christoph Hellwig
6bcfd60186 md: kill file_path wrapper
Kill the trivial and rather pointless file_path wrapper around d_path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:09 -07:00
NeilBrown
84255d1018 md: fix possible oops when removing a bitmap from an active array
It is possible to add a write-intent bitmap to an active array, or remove
the bitmap that is there.

When we do this we 'quiesce' the array, which causes make_request to
block in "wait_barrier()".

However we are sampling the value of "mddev->bitmap" before the
wait_barrier call, and using it afterwards.  This can result in using a
bitmap structure that has been freed.

Signed-off-by: Neil Brown <neilb@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:09 -07:00
Ignacio García Pérez
4b6f6ce97e serial: support for InstaShield IS-400 four port RS-232 PCI card
Add support for the InstaShield IS-400 four port RS-232 PCI card.

Signed-off-by: Ignacio García Pérez <iggarpe@t2i.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:09 -07:00
Darrick J. Wong
8808a793f0 ibmaem: new driver for power/energy/temp meters in IBM System X hardware
This driver reads IBM Active Energy Manager energy/temperature/power
sensors on IBM System X hardware.

[akpm@linux-foundation.org: fix printk warnings]
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Cc: Corey Minyard <minyard@acm.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:08 -07:00
Darrick J. Wong
b8fdaf5a05 i5k_amb: support Intel 5400 chipset
Minor rework to support the Intel 5400 chipset.

Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:08 -07:00
Gabor Czigola
ca68d0ac16 hdaps: invert the axes for HDAPS on Lenovo R61i ThinkPads
Cc: "Mark M. Hoffman" <mhoffman@lightlink.com>
Cc: Dmitry Torokhov <dtor@mail.ru>
Cc: Jiri Kosina <jikos@jikos.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-24 09:56:08 -07:00
Linus Torvalds
c2448278e3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
  IPoIB: Test for NULL broadcast object in ipoib_mcast_join_finish()
  MAINTAINERS: Add cxgb3 and iw_cxgb3 NIC and iWARP driver entries
  IB/mlx4: Fix creation of kernel QP with max number of send s/g entries
  IB/mthca: Fix max_sge value returned by query_device
  RDMA/cxgb3: Fix uninitialized variable warning in iwch_post_send()
  IB/mlx4: Fix uninitialized-var warning in mlx4_ib_post_send()
  IB/ipath: Fix UC receive completion opcode for RDMA WRITE with immediate
  IB/ipath: Fix printk format for ipath_sdma_status
2008-05-23 11:11:44 -07:00
Dave Olson
5a4f2b6752 IB/mad: Fix kernel crash when .process_mad() returns SUCCESS|CONSUMED
If a low-level driver returns IB_MAD_RESULT_SUCCESS | IB_MAD_RESULT_CONSUMED,
handle_outgoing_dr_smp() doesn't clean up properly.  The fix is to
kfree the local data and break, rather than falling through.  This was
observed with the ipath driver, but could happen with any driver.

This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=1027>.

Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-05-23 10:52:59 -07:00
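Illustration for 5a4f2b6752 (not part of the commit): a simplified user-space sketch of the "free the local data and break, rather than falling through" pattern. The flag values, the struct `local_work` and the function `handle_result` are hypothetical; they model the control flow only and are not the code of handle_outgoing_dr_smp().

```c
#include <stdlib.h>

/* Hypothetical result flags for this sketch. */
enum {
    RESULT_SUCCESS  = 1 << 0,
    RESULT_REPLY    = 1 << 1,
    RESULT_CONSUMED = 1 << 2,
};

/* Hypothetical per-request bookkeeping allocated by the caller. */
struct local_work {
    int queued;   /* set when the item is handed on for completion */
};

/*
 * Return the work item if it still has to be completed by the caller,
 * or NULL if nothing is pending (consumed, failed, or out of memory).
 */
static struct local_work *handle_result(int result)
{
    struct local_work *local = calloc(1, sizeof(*local));

    if (!local)
        return NULL;

    switch (result) {
    case RESULT_SUCCESS | RESULT_REPLY:
        local->queued = 1;
        break;
    case RESULT_SUCCESS | RESULT_CONSUMED:
        /* The driver consumed the request: release our state and stop
         * here.  Falling through to the next case would keep working
         * with state that must no longer be used. */
        free(local);
        local = NULL;
        break;
    case RESULT_SUCCESS:
        local->queued = 1;
        break;
    default:
        free(local);
        local = NULL;
        break;
    }
    return local;
}
```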
Linus Torvalds
e6b027a398 Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] clarify license of freq_table.c
  [CPUFREQ] Remove documentation of removed ondemand tunable.
  [CPUFREQ] Crusoe: longrun cpufreq module reports false min freq
  [CPUFREQ] powernow-k8: improve error messages
2008-05-23 09:24:52 -07:00
Jesse Barnes
57f7bd5b45 remove debug printk from DRM suspend path
Not sure how this snuck upstream, but it really doesn't belong there.  We
don't need a KERN_ERR printk in the suspend path to know what's going on (at
least not anymore).

Signed-off-by:  Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-23 08:53:13 -07:00
Linus Torvalds
cbff290491 Merge branch 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
  [POWERPC] iSeries: Remove unused mail address
  [POWERPC] mpic: Fix use of uninitialized variable
  [POWERPC] Add kernstart_addr to list of allowed symbols in prom_init
  [POWERPC] Fix __set_fixmap() for STRICT_MM_TYPECHECKS
  [POWERPC] PS3: Fix memory hotplug
2008-05-23 08:15:12 -07:00
Harvey Harrison
5e2daeb3c9 fbdev: fix integer as NULL pointer warning
drivers/video/aty/atyfb_base.c:3359:26: warning: Using plain integer as NULL pointer
drivers/video/aty/radeon_base.c:2280:32: warning: Using plain integer as NULL pointer
drivers/video/matrox/matroxfb_base.h:203:25: warning: Using plain integer as NULL pointer
drivers/video/matrox/matroxfb_base.h:203:25: warning: Using plain integer as NULL pointer
drivers/video/sis/sis_main.c:5790:44: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-23 08:11:07 -07:00
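Illustration for 5e2daeb3c9 and the three similar commits that follow (not part of any of them): a minimal sketch of what the sparse warning is about. The table `modes` and the function `find_mode` are hypothetical and not taken from the drivers listed above.

```c
#include <stddef.h>

/* Hypothetical lookup table entry, not taken from any of the drivers. */
struct mode_entry {
    int id;
    const char *name;
};

static const struct mode_entry modes[] = {
    { 1, "640x480" },
    { 2, "800x600" },
};

/* Return the matching entry, or a null pointer if there is none. */
static const struct mode_entry *find_mode(int id)
{
    for (size_t i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
        if (modes[i].id == id)
            return &modes[i];
    }
    /* "return 0;" would compile as well, but sparse reports
     * "Using plain integer as NULL pointer" for it. */
    return NULL;
}
```

The behaviour is identical either way; writing NULL only makes the pointer intent explicit and silences the checker.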
Harvey Harrison
9bcf091083 scsi: fix integer as NULL pointer warning
drivers/scsi/aha152x.c:3585:60: warning: Using plain integer as NULL pointer
drivers/scsi/aha152x.c:3845:56: warning: Using plain integer as NULL pointer
drivers/scsi/qla1280.c:2814:37: warning: Using plain integer as NULL pointer
drivers/scsi/atp870u.c:750:47: warning: Using plain integer as NULL pointer
drivers/scsi/3w-9xxx.c:1281:36: warning: Using plain integer as NULL pointer
drivers/scsi/3w-9xxx.c:1293:36: warning: Using plain integer as NULL pointer
drivers/scsi/3w-9xxx.c:1301:35: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:447:10: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:457:10: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:479:24: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:483:22: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:1213:23: warning: Using plain integer as NULL pointer
drivers/scsi/hptiop.c:1214:23: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-23 08:11:07 -07:00
Harvey Harrison
94b5e0ac69 isdn: fix integer as NULL pointer warning
drivers/isdn/hysdn/hycapi.c:465:42: warning: Using plain integer as NULL pointer
drivers/isdn/hysdn/hycapi.c:467:44: warning: Using plain integer as NULL pointer
drivers/isdn/hysdn/hycapi.c:469:42: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-23 08:11:06 -07:00
Harvey Harrison
b62151de49 acpi: fix integer as NULL pointer warning
drivers/acpi/dispatcher/dsmethod.c:568:50: warning: Using plain integer as NULL pointer
drivers/acpi/executer/exmutex.c:329:30: warning: Using plain integer as NULL pointer
drivers/acpi/executer/exmutex.c:466:31: warning: Using plain integer as NULL pointer

Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-05-23 08:11:06 -07:00
Stephen Rothwell
8962cadbe7 [POWERPC] iSeries: Remove unused mail address
I don't use my IBM email address normally and people can find me in
CREDITS.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-05-23 16:45:04 +10:00
Dominik Brodowski
4f74369422 [CPUFREQ] clarify license of freq_table.c
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
2008-05-22 16:38:03 -04:00
Stephen Hemminger
c03571a3e2 via-velocity: use memmove
Use memmove to handle overlapping copy of data.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:12:55 -04:00
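Illustration for c03571a3e2 (not part of the commit): a small sketch of why memmove() is the right call when source and destination overlap. The helper `shift_payload` is hypothetical; the message does not say which copy in the driver was affected.

```c
#include <string.h>

/*
 * Move `len` bytes of payload forward by `shift` bytes inside the same
 * buffer.  The caller must provide at least len + shift bytes.
 * Source and destination overlap whenever shift < len, which memcpy()
 * does not support but memmove() handles correctly.
 */
static void shift_payload(unsigned char *buf, size_t len, size_t shift)
{
    memmove(buf + shift, buf, len);
}
```

For example, shifting the six bytes "abcdef" forward by two in an eight-byte buffer leaves "abcdef" intact at offset 2.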
Stephen Hemminger
c73d2589b7 via-velocity: use netdev_alloc_skb
Use netdev_alloc_skb for rx buffer allocation. This sets skb->dev
and can be overridden for NUMA machines.

Change code to return new buffer rather than call by reference.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:12:50 -04:00
Stephen Hemminger
47f98c7d4b dl2k: use netdev_alloc_skb
Use netdev_alloc_skb. This sets skb->dev and allows arch specific
allocation.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:12:45 -04:00
Stephen Hemminger
8eb6013189 hamachi: use netdev_alloc_skb
Use netdev_alloc_skb. This sets skb->dev and allows arch specific
allocation.

Remove dead code and dead comments.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:12:40 -04:00
Stephen Hemminger
d27e7c3f6c ixp2000: use netdev_alloc_skb
Use netdev_alloc_skb. This sets skb->dev and allows arch specific
allocation.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:12:39 -04:00
Maciej W. Rozycki
3f7a3535a6 sb1250: use netdev_alloc_skb
Use netdev_alloc_skb.  This sets skb->dev and allows arch specific
allocation.  Also simplify and cleanup the alignment code.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:12:37 -04:00
Stephen Hemminger
b102df14d7 atl1: use netdev_alloc_skb
Use netdev_alloc_skb for rx buffer allocation. This sets skb->dev
and can be overridden for NUMA machines.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:12:35 -04:00
Stephen Hemminger
855e1111f3 tg3: remove unneeded semicolons
Remove extraneous semicolons after switch and conditional statements.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:12:30 -04:00
Komuro
bdefff1f54 fmvj18x_cs: add NextCom NC5310 rev B support
The manfid of "NextCom NC5310 rev B" is MANF_ID_FUJITSU, but this card
is an MBH10302-based card.  Use ConfigBase to detect the card type for
this card.

Signed-off-by: Komuro <komurojun-mbn@nifty.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:08:56 -04:00
Komuro
43fc63dceb xirc2ps_cs: re-initialize the multicast address in do_reset
Keep bits 7 and 8 of XIRCREG42_SWC1 in set_multicast_list.

Signed-off-by: Komuro <komurojun-mbn@nifty.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:08:50 -04:00
Wang Chen
f7f312a0c7 3C509: rx_bytes should not be increased when alloc_skb failed
If alloc_skb failed, the received packet will be dropped. Do not increase
rx_bytes for a dropped packet.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:02:46 -04:00
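Illustration for f7f312a0c7 (not part of the commit): a user-space sketch of the accounting rule the message states. The struct `rx_stats`, the function `receive_packet` and the allocator callback are hypothetical; only the counter name `rx_bytes` comes from the message.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical receive counters for this sketch. */
struct rx_stats {
    unsigned long rx_packets;
    unsigned long rx_bytes;
    unsigned long rx_dropped;
};

/*
 * Copy a received frame into a freshly allocated buffer.  Byte and
 * packet counters are only updated once the allocation has succeeded;
 * a failed allocation counts as a drop and nothing else.
 */
static void *receive_packet(struct rx_stats *st, const void *data, size_t len,
                            void *(*alloc)(size_t))
{
    void *buf = alloc(len);

    if (!buf) {
        st->rx_dropped++;
        return NULL;
    }
    memcpy(buf, data, len);
    st->rx_packets++;
    st->rx_bytes += len;
    return buf;
}

/* Allocator that always fails, to exercise the drop path. */
static void *failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}
```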
Wang Chen
56cfe5d028 NETFRONT: Use __skb_queue_purge()
Use standard routine for queue purging.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:01:03 -04:00
Wang Chen
288369cc25 VIRTIO: Use __skb_queue_purge()
Use standard routine for queue purging.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:01:02 -04:00
Paul Gortmaker
a01b3d766c phylib: do EXPORT_SYMBOL on get_phy_id
Commit cac1f3c8 factored out the code for get_phy_id so that it
could be reused in multiple places.  Turns out that some of the
users can be modular, so we need to export this symbol as well.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 14:01:01 -04:00
Philipp Zabel
4ba35fbe29 [ARM] 5043/1: pxafb: remove unused mode variable in pxafb_init_fbinfo
Signed-off-by: Philipp Zabel <philipp.zabel@gmail.com>
Acked-by: Eric Miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2008-05-22 14:10:38 +01:00
Randy Dunlap
92fbc1c146 3c515: fix using pnp_get_resource when CONFIG_ISAPNP=n
3c515.c uses pnp_irq(), which calls pnp_get_resource(),
which is not defined when CONFIG_PNP=n, so in that case,
get the IRQ from a hardware register.

3c515.c:(.text+0x3adc0): undefined reference to `pnp_get_resource'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:34 -04:00
Maciej W. Rozycki
1b0771ab3e PHYLIB: Kconfig: Complete the list of Broadcom PHYs supported
Add Broadcom PHYs supported missing from the description.

Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:32 -04:00
Nate Case
cd9af3dac6 PHYLIB: Add 1000Base-X support for Broadcom bcm5482
Configure the BCM5482S secondary SerDes for 1000Base-X mode when the
appropriate dev_flags are passed in to phy_connect().  This is
needed when the PHY is used for fiber and backplane connections.

Signed-off-by: Nate Case <ncase@xes-inc.com>
Signed-off-by: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:31 -04:00
Jay Vosburgh
3915c1e863 bonding: Add "follow" option to fail_over_mac
Add a "follow" selection for fail_over_mac.  This option
causes the MAC address to move from slave to slave as the active
slave changes.  This is in addition to the existing fail_over_mac option
that causes the bond's MAC address to change during failover.

	This new option is useful for devices that cannot tolerate
multiple ports using the same MAC address simultaneously, either
because it confuses them or incurs a performance penalty (as is the
case with some LPAR-aware multiport devices).  Because the MAC of the
bond itself does not change, the "follow" option is slightly more
reliable during failover and doesn't change the MAC of the bond during
operation.

	This patch requires a previous ARP monitor change to properly
handle RTNL during failovers.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:29 -04:00
Jay Vosburgh
b2220cad58 bonding: refactor ARP active-backup monitor
Refactor ARP monitor for active-backup mode.  The motivation for
this is to take care of locking issues in a clear manner (particularly to
correctly handle RTNL vs. the bonding locks).  Currently, the a-b ARP
monitor does not hold RTNL at all, but future changes will require RTNL
during ARP monitor failovers.

	Rather than using conditional locking, this patch instead breaks
up the ARP monitor into three discrete steps: inspection, commit changes,
and probe.  The inspection phase marks slaves that require link state
changes.  The commit phase is only called if inspection detects that
changes are needed, and is called with RTNL.  Lastly, the probe phase
issues the ARP probes that the inspection phase uses to determine link
state.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:28 -04:00
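Illustration for b2220cad58 (not part of the commit): a simplified user-space model of the inspection / commit / probe split described in the message. All names here (`struct slave`, `struct monitor`, `arp_inspect`, `arp_commit`, `arp_probe`, `monitor_tick`) are hypothetical, the heavy lock is modeled by a counter, and the real monitor tracks far more state.

```c
#include <stdbool.h>
#include <stddef.h>

enum link_state { LINK_UP, LINK_DOWN };

struct slave {
    enum link_state link;       /* committed link state */
    enum link_state new_link;   /* state proposed by the inspection phase */
    bool got_reply;             /* did the last probe get an answer? */
};

struct monitor {
    struct slave *slaves;
    size_t n;
    unsigned int rtnl_taken;    /* stand-in: times the heavy lock was taken */
    unsigned int probes_sent;
};

/* Phase 1: no heavy lock; only mark what would have to change. */
static size_t arp_inspect(struct monitor *m)
{
    size_t changes = 0;

    for (size_t i = 0; i < m->n; i++) {
        struct slave *s = &m->slaves[i];

        s->new_link = s->got_reply ? LINK_UP : LINK_DOWN;
        if (s->new_link != s->link)
            changes++;
    }
    return changes;
}

/* Phase 2: apply the marked changes under the heavy lock. */
static void arp_commit(struct monitor *m)
{
    m->rtnl_taken++;
    for (size_t i = 0; i < m->n; i++)
        m->slaves[i].link = m->slaves[i].new_link;
}

/* Phase 3: send the probes whose replies the next inspection reads. */
static void arp_probe(struct monitor *m)
{
    for (size_t i = 0; i < m->n; i++) {
        m->slaves[i].got_reply = false;
        m->probes_sent++;
    }
}

/* One monitor pass: commit only runs when inspection found changes. */
static void monitor_tick(struct monitor *m)
{
    if (arp_inspect(m) > 0)
        arp_commit(m);
    arp_probe(m);
}
```

The point of the split is visible in `monitor_tick`: the lock counter only moves on passes where a link state actually changes, so no conditional locking is needed inside the phases themselves.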
Moni Shoua
7893b2491a bonding: Send more than one gratuitous ARP when slave takes over
With IPoIB, reception of gratuitous ARP by neighboring hosts
is essential for a successful change of slaves in case of failure.
Otherwise, they won't learn about the HW address change and need
to wait a long time until the neighboring system gives up and sends
an ARP request to learn the new HW address.  This patch decreases
the chance of losing a gratuitous ARP packet by sending it more
than once.  The number of retries is configurable and can be set with a
module param.

Signed-off-by: Moni Shoua <monis@voltaire.com>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:26 -04:00
Pavel Emelyanov
8047637c70 bonding: Remove unneeded list_empty checks.
Some places iterate over the checked list right after the check
itself, so even if the list is empty, the list_for_each_xxx
iterator will make everything right by itself.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:25 -04:00
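Illustration for 8047637c70 (not part of the commit): a minimal user-space sketch of a circular doubly linked list in the style the message refers to, showing that iterating an empty list simply runs zero times. The names `list_node`, `list_init`, `list_add_tail` and `list_count` are defined here for the example and are not the kernel's list API.

```c
#include <stddef.h>

/* Circular doubly linked list node; the head is a node with no payload. */
struct list_node {
    struct list_node *next;
    struct list_node *prev;
};

/* An empty list is a head that points at itself. */
static void list_init(struct list_node *head)
{
    head->next = head;
    head->prev = head;
}

/* Link `node` in just before `head`, i.e. at the tail. */
static void list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Walk the list.  For an empty list head->next == head, so the loop
 * body never runs and no separate emptiness check is needed. */
static size_t list_count(const struct list_node *head)
{
    size_t n = 0;

    for (const struct list_node *pos = head->next; pos != head; pos = pos->next)
        n++;
    return n;
}
```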
Pavel Emelyanov
0883beca7f bonding: Relax unneeded _safe lists iterations.
Many places either do not modify the list under the list_for_each_xxx,
or break out of the loop as soon as the first element is removed.

Thus, this _safe iteration just occupies some unneeded .text space
and requires an additional variable.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:22 -04:00
Pavel Emelyanov
0dd646fe05 bonding: Remove redundant argument from bond_create.
While we're fixing the bond_create, I hope it's OK to polish it
a bit after the fixes.

The third argument is NULL at the first caller and is ignored by
the second one, so remove it.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:21 -04:00
Jay Vosburgh
4b8a9239ee bonding: remove test for IP in ARP monitor
Remove bond_has_ip and all references to it.  With this change,
the ARP monitor will always send ARP probes if the master is up and has
at least one slave.  If the bond has an IP address, it is used in the
ARP probe; if not, the probes are sent with all zeros in the sender's
IP address (which is consistent with an RFC 2131 4.4.1 duplicate address
probe).

	This is useful for cases when bonding itself is hidden underneath
a layer of virtual devices, e.g., with Xen.

	Change suggested by Tsutomu Fujii <t-fujii@nb.jp.nec.com>, who
included a one-line patch that only affected active-backup mode.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:18 -04:00
Jay Vosburgh
5ce0da8f03 bonding: Use msecs_to_jiffies, eliminate panic
Convert bonding to use msecs_to_jiffies instead of doing the
math.  For the ARP monitor, there was an underflow problem that could
result in an infinite loop.  The miimon already had that worked around,
but this is cleaner.

	Originally by Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Jay Vosburgh corrected a math error in the original; Nicolas' original
commit message is:

When setting arp_interval parameter to a very low value, delta_in_ticks
for next arp might become 0, causing an infinite loop.

See http://bugzilla.kernel.org/show_bug.cgi?id=10680

Same problem for miimon parameter already fixed, but fix might be
enhanced, by using msecs_to_jiffies() function.

Signed-off-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr>
Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:17 -04:00
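Illustration for 5ce0da8f03 (not part of the commit): a user-space sketch of how an open-coded millisecond-to-tick conversion can truncate to zero, and how a round-up conversion avoids that. `SKETCH_HZ` and both helper functions are assumptions for this example; the round-up helper mirrors the idea behind msecs_to_jiffies() for tick rates that divide 1000 and is not the kernel implementation.

```c
#define SKETCH_HZ 100UL   /* assumed tick rate: 100 ticks per second */

/* Open-coded conversion: integer division truncates, so any interval
 * shorter than one tick (under 10 ms here) becomes 0 ticks. */
static unsigned long naive_ms_to_ticks(unsigned long ms)
{
    return ms * SKETCH_HZ / 1000UL;
}

/* Round-up conversion: any non-zero interval yields at least 1 tick.
 * Valid as written only when SKETCH_HZ divides 1000 evenly. */
static unsigned long ms_to_ticks_round_up(unsigned long ms)
{
    unsigned long ms_per_tick = 1000UL / SKETCH_HZ;

    return (ms + ms_per_tick - 1) / ms_per_tick;
}
```

With a 5 ms interval the naive version returns 0, which corresponds to the "delta_in_ticks for next arp might become 0" case in the quoted message: work rescheduled with a zero delay runs again immediately. The round-up version returns 1 for the same input.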
Al Viro
d63ddcec20 misc drivers/net endianness noise
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-05-22 06:34:15 -04:00