Commit graph

555549 commits

Author SHA1 Message Date
Pavel Fedin
b43c142f22 net: smsc911x: Fix crash if loopback test fails
On certain hardware in certain situations loopback test fails and the
driver gets removed. During mdiobus_unregister() instance of PHY driver
gets disposed. But by this time it has already been started using
phy_connect_direct().

PHY driver uses DELAYED_WORK in order to maintain its state. Attempting
to dispose the driver without calling phy_disconnect() causes deallocation
of DELAYED_WORK being active. This shortly causes a bad crash in timer
code.

The problem can be discovered by enabling CONFIG_DEBUG_OBJECTS_TIMERS and
CONFIG_DEBUG_OBJECTS_FREE

Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:05:24 -05:00
Jon Paul Maloy
5cbb28a4bf tipc: linearize arriving NAME_DISTR and LINK_PROTO buffers
Testing of the new UDP bearer has revealed that reception of
NAME_DISTRIBUTOR, LINK_PROTOCOL/RESET and LINK_PROTOCOL/ACTIVATE
message buffers is not prepared for the case that those may be
non-linear.

We now linearize all such buffers before they are delivered up to the
generic reception layer.

In order for the commit to apply cleanly to 'net' and 'stable', we do
the change in the function tipc_udp_recv() for now. Later, we will post
a commit to 'net-next' moving the linearization to generic code, in
tipc_named_rcv() and tipc_link_proto_rcv().

Fixes: commit d0f91938be ("tipc: add ip/udp media type")
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:04:29 -05:00
Fabio Estevam
f4444574a4 fec: Use gpio_set_value_cansleep()
We are in a context where we can sleep, and the FEC PHY reset gpio
may be on an I2C expander. Use the cansleep() variant when
setting the GPIO value.

Based on a patch from Russell King for pci-mvebu.c.

Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:03:01 -05:00
David S. Miller
24c39de009 Merge branch 'csum_partial_frags'
Hannes Frederic Sowa says:

====================
net: clean up interactions of CHECKSUM_PARTIAL and fragmentation

This series fixes wrong checksums on the wire for IPv4 and IPv6. Large
send buffers and especially NFS lead to wrong checksums in both IPv4
and IPv6.

CHECKSUM_PARTIAL skbs should not receive the respective fragmentations
functions, so we add WARN_ON_ONCE to those functions to fix up those as
soon as they get reported.

Changelog:
v2: added v4 checks
v3: removed WARN_ON_ONCES (advice by Tom Herbert)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:01:37 -05:00
Hannes Frederic Sowa
405c92f7a5 ipv6: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those warn about them once and handle them gracefully by recalculating
the checksum.

Fixes: commit 32dce968dd ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:01:28 -05:00
Hannes Frederic Sowa
682b1a9d3f ipv6: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
We cannot reliable calculate packet size on MSG_MORE corked sockets
and thus cannot decide if they are going to be fragmented later on,
so better not use CHECKSUM_PARTIAL in the first place.

The IPv6 code also intended to protect and not use CHECKSUM_PARTIAL in
the existence of IPv6 extension headers, but the condition was wrong. Fix
it up, too. Also the condition to check whether the packet fits into
one fragment was wrong and has been corrected.

Fixes: commit 32dce968dd ("ipv6: Allow for partial checksums on non-ufo packets")
See-also: commit 72e843bb09 ("ipv6: ip6_fragment() should check CHECKSUM_PARTIAL")
Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:01:27 -05:00
Hannes Frederic Sowa
dbd3393c56 ipv4: add defensive check for CHECKSUM_PARTIAL skbs in ip_fragment
CHECKSUM_PARTIAL skbs should never arrive in ip_fragment. If we get one
of those warn about them once and handle them gracefully by recalculating
the checksum.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:01:27 -05:00
Hannes Frederic Sowa
d749c9cbff ipv4: no CHECKSUM_PARTIAL on MSG_MORE corked sockets
We cannot reliable calculate packet size on MSG_MORE corked sockets
and thus cannot decide if they are going to be fragmented later on,
so better not use CHECKSUM_PARTIAL in the first place.

Cc: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Benjamin Coddington <bcodding@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2015-11-01 12:01:27 -05:00
Wan Zongshun
2167ceabf3 x86/cpu: Add CLZERO detection
AMD Fam17h processors introduce support for the CLZERO
instruction. It zeroes out the 64 byte cache line specified in
RAX.

Add the bit here to allow /proc/cpuinfo to list the feature.

Boris: we're adding this as a separate ->x86_capability leaf
because CPUID_80000008_EBX is going to contain more feature bits
and it will fill out with time.

Signed-off-by: Wan Zongshun <Vincent.Wan@amd.com>
Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ Wrap code in patch form, fix comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Rui <ray.huang@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1446207099-24948-4-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-01 11:26:23 +01:00
Borislav Petkov
dc34bdd236 x86/mce: Add a default case to the switch in __mcheck_cpu_ancient_init()
Caught by building with W= which enable -Wswitch-default also.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Link: http://lkml.kernel.org/r/1446207099-24948-3-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-01 11:26:14 +01:00
Aravind Gopalakrishnan
c7f54d21fb x86/mce: Add a Scalable MCA vendor flags bit
Scalable MCA (SMCA) is a new feature in AMD Fam17h processors
which indicates presence of MCA extensions.

MCA extensions expands existing register space for the MCE banks
and also introduces a new MSR range to accommodate new banks.

Add the detection bit.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
[ Reformat mce_vendor_flags definitions and save indentation levels. Improve comments. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ashok Raj <ashok.raj@intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1446207099-24948-2-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-01 11:26:13 +01:00
Linus Walleij
0963670aea gpio: fix up SPI submenu
- Relax dependencies on SPI_MASTER for drivers in the SPI menu
  that already has this dependency.
- Move out the expander that would be hidden for I2C access if
  SPI_MASTER was not selected. Tentatively create a separate
  menu for this.
- Move the ZX SoC driver to memory-mapped drivers, this must be
  a mistake and only worked because the system has an SPI master
  enabled at the same time.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-11-01 10:50:19 +01:00
Linus Walleij
269a46f80b gpio: drop surplus I2C dependencies
The I2C expander menu already depends on I2C, drop subdependecies
on individual drivers. Keep the instances of depends on I2C=y
though, so these are still restricted to the compiled-in case.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-11-01 10:43:16 +01:00
Linus Walleij
7768feb0f5 gpio: drop surplus X86 dependencies
Port-mapped I/O depends on X86 already, so individual drivers need
not specify this dependency.

Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
2015-11-01 10:39:07 +01:00
Linus Torvalds
56ef9db246 ARM: SoC fixes for v4.3
This should be our final batch of fixes for 4.3:
 
 - A patch from Sudeep Holla that fixes annotation of wakeup sources properly,
   old unused format seems to have spread through copying.
 - Two patches from Tony for OMAP. One dealing with MUSB setup problems due to
   runtime PM being enabled too early on the parent device. The other fixes
   IRQ numbering for OMAP1.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJWNGocAAoJEIwa5zzehBx32UIP/AiO+T1Sb7ZlF+SoQTpMloe1
 BCPtb/rHV4nBbgCTOw7ixauUAt50oI5d6d8+QimC9CgeXOch9QtdWhsp85dVfped
 hfQmlqGIfRi5ltETk7Wnem4JPiz6g9O8atZc8KaFUzhVMPBbtuxaWV9G6rfo11U2
 ps49n73D1CurAIlmGRl47+YpfZl6DvrqiY/YJShU7nEwixIGMYvObN4Wrp1Wlg2H
 1nZ/KPEuVDamS5hxGqqZL+P6I+ePnCN53QFcVfTPzAUu3brv5v/HXvHcE/AdaR5b
 wtrMGzZ3lfN7WChSS8mGgzUBMvMd+xnfF7Bb5B6zwcSGs+xU4Sey8A465m7OY30t
 qdIDbrNB1+1ez62i/yWuH/yAD4QJbQWY3sFIarCdWayqtbh2Yq3d6AoHFydEnTE/
 uHziKxFl3Zur21AcV89PQr3tPwf7Qu9CyN6DSdDgh2xyjPMSnZmDEjY4X2S9z4pu
 4N7JJUL0e4NFr57pk6nU9vkRcLuRXCQDnGqnsZCDxLxEOpsdthGxog/yaHJGeZU8
 2VmqVlwz4ST11X4wt/4XuC5HOWhBesx238UYFjQdPCcQg+YD78ANNHceFm7LUeV7
 ddtbwqjuU+MOGW8KjViPmDkimdaGPNm6eSJt+T4g0toHdWyo9ForQysFuXn2XQBX
 EVEDAgego+tNhaCCDW+p
 =jKgB
 -----END PGP SIGNATURE-----

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

Pull ARM SoC fixes from Olof Johansson:
 "This should be our final batch of fixes for 4.3:

   - A patch from Sudeep Holla that fixes annotation of wakeup sources
     properly, old unused format seems to have spread through copying.

   - Two patches from Tony for OMAP.  One dealing with MUSB setup
     problems due to runtime PM being enabled too early on the parent
     device.  The other fixes IRQ numbering for OMAP1"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  usb: musb: omap2430: Fix regression caused by driver core change
  ARM: OMAP1: fix incorrect INT_DMA_LCD
  ARM: dts: fix gpio-keys wakeup-source property
2015-10-31 21:36:07 -07:00
Linus Torvalds
060b85b0d3 SCSI fixes on 20151031
This is three essential bug fixes for various SCSI parts.  The only affected
 users are SCSI multi-path via device handler (basically all the enterprise)
 and mvsas users.  The dh bugs are an async entanglement in boot resulting in a
 serious WARN_ON trip and a use after free on remove leading to a crash with
 strict memory accounting.  The mvsas bug manifests as a null deref oops but
 only on abort sequences; however, these can commonly occur with SATA attached
 devices, hence the fix.
 
 Signed-off-by: James Bottomley <JBottomley@Odin.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJWNDcYAAoJEDeqqVYsXL0M20oH/ioR0SAFJx5YJRCU2y3aTmTd
 ZCcLaRJTHkdu+rObn+gU54q6eCrg9Z7+gj1DSb/B55nAgi8pK+bMd+Im/X+w4fG5
 4WIj9P2HJX5YliDFONf68XWCeFyfAMFScEkLY8raOaFU+KtZGvpb/RQRwce0eAMF
 OiV+NK65ENZdVnfTCR3btyBq+FDvPL2dX5a3VK3NpTArrH0BF4/ZuoC7E/tYkKgo
 CQ6bGSyvx/loJlopLr3tcf4+tzCi4H8gr0cRLm9SHLbEwP0oXXpN9juVjiUbACIb
 6etoyJJ+MjlQLofem2bJ4ftnP8cAju+EllfBkJXD3VEsqTAd0r9+B6Rx3YPlo10=
 =jo6V
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is three essential bug fixes for various SCSI parts.

  The only affected users are SCSI multi-path via device handler
  (basically all the enterprise) and mvsas users.  The dh bugs are an
  async entanglement in boot resulting in a serious WARN_ON trip and a
  use after free on remove leading to a crash with strict memory
  accounting.  The mvsas bug manifests as a null deref oops but only on
  abort sequences; however, these can commonly occur with SATA attached
  devices, hence the fix"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi_dh: don't try to load a device handler during async probing
  scsi_dh: fix use-after-free when removing scsi device
  mvsas: Fix NULL pointer dereference in mvs_slot_task_free
2015-10-31 21:26:04 -07:00
Linus Torvalds
af7eba0158 two more bug fixes for md.
One bugfix for a list corruption in raid5 because of incorrect
 locking.
 
 Other for possible data corruption when a recovering device is failed,
 removed, and re-added.
 
 Both tagged for -stable.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQIcBAABCAAGBQJWNAnCAAoJEDnsnt1WYoG5VzwP/jRfljaxLKUicIGEd9qeaei5
 vVtmzPugthzYTfcJRm54ZufQOnjse/uXEBqmvoMJhBeEMbL2VIWbn1i7sMWjgq8W
 HVz/N/N8iT5DFWgzXmQQCaxIu17njbEmO8IelcNr3i6tE5wbHJ9WhV5UDGdQgwca
 6XjQ8r8QjmN3uVaiNL4JcrEtZfpE8PqgBCJzsQYRZzOp9m3jJlgyJ9YWQA/VQ2Cj
 cVXs7tTGTdpPQdgRRFHF/Z/7H+KUcp9XV5ahDRwqDryQkZHuFvwyZFfpLspqvc5P
 OJio9WY0y93QmRRsuIC/ig0+CnxDXeqHiRwbprIMvKbPxxaGn4s24VioRDa0PRzT
 v9HcjtUhs9q8iTo4TZJrggD+rPm523a9iiDU6SbM2qlM3XUpBis/fjbLN82Nnas+
 ADa0/DAsOE0I1WltoaaNUbweuflGw2NnFxnUueqq8dK23/Vabu3vw4tohiNp+/3m
 km8Is3j7lGKobQ3AKYIFlbeAqdsyASZkSDfA9IZr3SIeYBWgPljd9n+6BIPOqPNq
 S2HtOLke/XX2KEM0BHzgu4XliE9P9+B9lQETI4MehP0rMzTURGKh87du41YHckp1
 beX22aeOuv//ok11JTs6M+StREKeTrl+dTStn0U1jt6HyMzseGaNoy3Eib5ePBCb
 C2ZZRz4OR2MWvWUFxKms
 =QYvv
 -----END PGP SIGNATURE-----

Merge tag 'md/4.3-rc7-fixes' of git://neil.brown.name/md

Pull md bug fixes from Neil Brown:
 "Two more bug fixes for md.

  One bugfix for a list corruption in raid5 because of incorrect
  locking.

  Other for possible data corruption when a recovering device is failed,
  removed, and re-added.

  Both tagged for -stable"

* tag 'md/4.3-rc7-fixes' of git://neil.brown.name/md:
  Revert "md: allow a partially recovered device to be hot-added to an array."
  md/raid5: fix locking in handle_stripe_clean_event()
2015-10-31 21:20:49 -07:00
David S. Miller
b75ec3af27 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2015-11-01 00:15:30 -04:00
Song Liu
339421def5 MD: when RAID journal is missing/faulty, block RESTART_ARRAY_RW
When RAID-4/5/6 array suffers from missing journal device, we put
the array in read only state. We should not allow trasition to
read-write states (clean and active) before replacing journal device.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:29 +11:00
Shaohua Li
f2076e7d06 MD: set journal disk ->raid_disk
Set journal disk ->raid_disk to >=0, I choose raid_disks + 1 instead of
0, because we already have a disk with ->raid_disk 0 and this causes
sysfs entry creation conflict. A lot of places assumes disk with
->raid_disk >=0 is normal raid disk, so we add check for journal disk.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:29 +11:00
Song Liu
a3dfbdaadb MD: kick out journal disk if it's not fresh
When journal disk is faulty and we are reassemabling the raid array, the
journal disk is old. We don't allow the journal disk added to the raid
array. Since journal disk is missing in the array, the raid5 will mark
the array readonly.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:29 +11:00
Shaohua Li
7dde2ad3c5 raid5-cache: start raid5 readonly if journal is missing
If raid array is expected to have journal (eg, journal is set in MD
superblock feature map) and the array is started without journal disk,
start the array readonly.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:29 +11:00
Song Liu
a97b789644 MD: add new bit to indicate raid array with journal
If a raid array has journal feature bit set, add a new bit to indicate
this. If the array is started without journal disk existing, we know
there is something wrong.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:29 +11:00
Shaohua Li
6e74a9cfb5 raid5-cache: IO error handling
There are 3 places the raid5-cache dispatches IO. The discard IO error
doesn't matter, so we ignore it. The superblock write IO error can be
handled in MD core. The remaining are log write and flush. When the IO
error happens, we mark log disk faulty and fail all write IO. Read IO is
still allowed to run. Userspace will get a notification too and
corresponding daemon can choose setting raid array readonly for example.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:29 +11:00
Shaohua Li
c2bb6242ec raid5: journal disk can't be removed
raid5-cache uses journal disk rdev->bdev, rdev->mddev in several places.
Don't allow journal disk disappear magically. On the other hand, we do
need to update superblock for other disks to bump up ->events, so next
time journal disk will be identified as stale.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:29 +11:00
Shaohua Li
4b482044d2 raid5-cache: add trim support for log
Since superblock is updated infrequently, we do a simple trim of log
disk (a synchronous trim)

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:29 +11:00
Shaohua Li
9efdca16e0 MD: fix info output for journal disk
journal disk can be faulty. The Journal and Faulty aren't exclusive with
each other.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:29 +11:00
Christoph Hellwig
6143e2cecb raid5-cache: use bio chaining
Simplify the bio completion handler by using bio chaining and submitting
bios as soon as they are full.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:28 +11:00
Christoph Hellwig
2b8ef16ec4 raid5-cache: small log->seq cleanup
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:28 +11:00
Christoph Hellwig
c1b9919849 raid5-cache: new helper: r5_reserve_log_entry
Factor out code to reserve log space.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:28 +11:00
Christoph Hellwig
51039cd066 raid5-cache: inline r5l_alloc_io_unit into r5l_new_meta
This is the only user, and keeping all code initializing the io_unit
structure together improves readbility.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:28 +11:00
Christoph Hellwig
1e932a37cc raid5-cache: take rdev->data_offset into account early on
Set up bi_sector properly when we allocate an bio instead of updating it
at submission time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:28 +11:00
Christoph Hellwig
b349feb36c raid5-cache: refactor bio allocation
Split out a helper to allocate a bio for log writes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:28 +11:00
Christoph Hellwig
22581f58ed raid5-cache: clean up r5l_get_meta
Remove the only partially used local 'io' variable to simplify the code
flow.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:28 +11:00
Christoph Hellwig
56fef7c6e0 raid5-cache: simplify state machine when caches flushes are not needed
For devices without a volatile write cache we don't need to send a FLUSH
command to ensure writes are stable on disk, and thus can avoid the whole
step of batching up bios for processing by the MD thread.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:28 +11:00
Christoph Hellwig
d8858f4321 raid5-cache: factor out a helper to run all stripes for an I/O unit
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:27 +11:00
Christoph Hellwig
04732f741d raid5-cache: rename flushed_ios to finished_ios
After this series we won't nessecarily have flushed the cache for these
I/Os, so give the list a more neutral name.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:27 +11:00
Christoph Hellwig
170364619a raid5-cache: free I/O units earlier
There is no good reason to keep the I/O unit structures around after the
stripe has been written back to the RAID array.  The only information
we need is the log sequence number, and the checkpoint offset of the
highest successfull writeback.  Store those in the log structure, and
free the IO units from __r5l_stripe_write_finished.

Besides simplifying the code this also avoid having to keep the allocation
for the I/O unit around for a potentially long time as superblock updates
that checkpoint the log do not happen very often.

This also fixes the previously incorrect calculation of 'free' in
r5l_do_reclaim as a side effect: previous if took the last unit which
isn't checkpointed into account.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:27 +11:00
Shaohua Li
e6c033f79a raid5-cache: move reclaim stop to quiesce
Move reclaim stop to quiesce handling, where is safer for this stuff.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:27 +11:00
Shaohua Li
ac6096e9d5 md: show journal for journal disk in disk state sysfs
Journal disk state sysfs entry should indicate it's journal

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:27 +11:00
Song Liu
0b020e85bd skip match_mddev_units check for special roles
match_mddev_units is used to check whether 2 RAID arrays share
same disk(s). Arrays that share disk(s) will not do resync at the
same time for better performance (fewer HDD seek). However, this
check should not apply to Spare, Faulty, and Journal disks, as
they do not paticipate in resync.

In this patch, match_mddev_units skips check for disks with flag
"Faulty" or "Journal" or raid_disk < 0.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:27 +11:00
Shaohua Li
253f9fd41a raid5-cache: don't delay stripe captured in log
There is a case a stripe gets delayed forever.
1. a stripe finishes construction
2. a new bio hits the stripe
3. handle_stripe runs for the stripe. The stripe gets DELAYED bit set
since construction can't run for new bio (the stripe is locked since
step 1)

Without log, handle_stripe will call ops_run_io. After IO finishes, the
stripe gets unlocked and the stripe will restart and run construction
for the new bio. With log, ops_run_io need to run two times. If the
DELAYED bit set, the stripe can't enter into the handle_list, so the
second ops_run_io doesn't run, which leaves the stripe stalled.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:27 +11:00
Shaohua Li
85f2f9a4f4 raid5-cache: check stripe finish out of order
stripes could finish out of order. Hence r5l_move_io_unit_list() of
__r5l_stripe_write_finished might not move any entry and leave
stripe_end_ios list empty.

This applies on top of http://marc.info/?l=linux-raid&m=144122700510667

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:26 +11:00
Shaohua Li
bd18f6462f md: skip resync for raid array with journal
If a raid array has journal, the journal can guarantee the consistency,
we can skip resync after a unclean shutdown. The exception is raid
creation or user initiated resync, which we still do a raid resync.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:26 +11:00
Shaohua Li
828cbe989e raid5-cache: optimize FLUSH IO with log enabled
With log enabled, bio is written to raid disks after the bio is settled
down in log disk. The recovery guarantees we can recovery the bio data
from log disk, so we we skip FLUSH IO.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:26 +11:00
Christoph Hellwig
509ffec708 raid5-cache: move functionality out of __r5l_set_io_unit_state
Just keep __r5l_set_io_unit_state as a small set the state wrapper, and
remove r5l_set_io_unit_state entirely after moving the real
functionality to the two callers that need it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:26 +11:00
Shaohua Li
0fd22b45b2 raid5-cache: fix a user-after-free bug
r5l_compress_stripe_end_list() can free an io_unit. This breaks the
assumption only reclaimer can free io_unit. We can add a reference count
based io_unit free, but since only reclaim can wait io_unit becoming to
STRIPE_END state, we use a simple global wait queue here.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:26 +11:00
Shaohua Li
a8c34f9159 raid5-cache: switching to state machine for log disk cache flush
Before we write stripe data to raid disks, we must guarantee stripe data
is settled down in log disk. To do this, we flush log disk cache and
wait the flush finish. That wait introduces sleep time in raid5d thread
and impact performance. This patch moves the log disk cache flush
process to the stripe handling state machine, which can remove the wait
in raid5d.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:26 +11:00
Shaohua Li
5c7e81c3de raid5: enable log for raid array with cache disk
Now log is safe to enable for raid array with cache disk

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:26 +11:00
Shaohua Li
713cf5a639 raid5: don't allow resize/reshape with cache(log) support
If cache(log) support is enabled, don't allow resize/reshape in current
stage. In the future, we can flush all data from cache(log) to raid
before resize/reshape and then allow resize/reshape.

Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: NeilBrown <neilb@suse.com>
2015-11-01 13:48:26 +11:00