Commit graph

362537 commits

Author SHA1 Message Date
Yuchung Cheng
e33099f96d tcp: implement RFC5682 F-RTO
This patch implements F-RTO (forward RTO recovery):

When the first retransmission after timeout is acknowledged, F-RTO
sends new data instead of old data. If the next ACK acknowledges
some never-retransmitted data, then the timeout was spurious and the
congestion state is reverted.  Otherwise if the next ACK selectively
acknowledges the new data, then the timeout was genuine and the
loss recovery continues. This idea applies to recurring timeouts
as well. While F-RTO sends different data during timeout recovery,
it does not (and should not) change the congestion control.

The implementation follows the three steps of SACK enhanced algorithm
(section 3) in RFC5682. Step 1 is in tcp_enter_loss(). Step 2 and
3 are in tcp_process_loss().  The basic version is not supported
because SACK enhanced version also works for non-SACK connections.

The new implementation is functionally in parity with the old F-RTO
implementation except the one case where it increases undo events:
In addition to the RFC algorithm, a spurious timeout may be detected
without sending data in step 2, as long as the SACK confirms not
all the original data are dropped. When this happens, the sender
will undo the cwnd and perhaps enter fast recovery instead. This
additional check increases the F-RTO undo events by 5x compared
to the prior implementation on Google Web servers, since the sender
often does not have new data to send for HTTP.

Note F-RTO may detect spurious timeout before Eifel with timestamps
does so.
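
The ACK handling described above can be sketched as a small decision
function. This is a simplified illustration of the commit message only,
not the kernel code: the enum, the struct and frto_on_ack() are
hypothetical names, and the three booleans stand in for state that the
real stack derives from sequence numbers and the SACK scoreboard.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative names only; none of these exist in the kernel. */
enum frto_action {
	FRTO_WAIT,           /* ACK carried no decisive information */
	FRTO_SEND_NEW_DATA,  /* step 2: send never-sent data as a probe */
	FRTO_UNDO_SPURIOUS,  /* timeout was spurious: revert congestion state */
	FRTO_CONTINUE_LOSS   /* timeout was genuine: keep doing loss recovery */
};

struct frto_ack {
	bool advances_snd_una;   /* cumulative ACK moved forward */
	bool acks_never_retrans; /* (S)ACKs pre-timeout data never retransmitted */
	bool sacks_new_data;     /* SACKs the new data sent in step 2 */
};

enum frto_action frto_on_ack(bool new_data_sent, const struct frto_ack *ack)
{
	assert(ack);

	/* Original data got through, so the timeout was spurious. This check
	 * runs even before step 2 has sent anything. */
	if (ack->acks_never_retrans)
		return FRTO_UNDO_SPURIOUS;

	/* First retransmission acknowledged: probe with new data. */
	if (!new_data_sent)
		return ack->advances_snd_una ? FRTO_SEND_NEW_DATA : FRTO_WAIT;

	/* Only the probe was SACKed: the original flight was really lost. */
	if (ack->sacks_new_data)
		return FRTO_CONTINUE_LOSS;

	return FRTO_WAIT;
}
```

Note that the function never decides how much to send; in this sketch,
as in the description above, that stays with congestion control.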

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 11:47:51 -04:00
Yuchung Cheng
ab42d9ee3d tcp: refactor CA_Loss state processing
Consolidate all of TCP CA_Loss state processing in
tcp_fastretrans_alert() into a new function called tcp_process_loss().
This is to prepare the new F-RTO implementation in the next patch.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 11:47:51 -04:00
Yuchung Cheng
9b44190dc1 tcp: refactor F-RTO
The patch series refactors the F-RTO feature (RFC4138/5682).

This is to simplify the loss recovery processing. Existing F-RTO
was developed during the experimental stage (RFC4138) and has
many experimental features.  It takes a separate code path from
the traditional timeout processing by overloading CA_Disorder
instead of using CA_Loss state. This complicates CA_Disorder state
handling because it's also used for handling dubious ACKs and undos.
While the algorithm in the RFC does not change the congestion control,
the implementation intercepts congestion control in various places
(e.g., frto_cwnd in tcp_ack()).

The new code implements newer F-RTO RFC5682 using CA_Loss processing
path.  F-RTO becomes a small extension in the timeout processing
and interfaces with congestion control and Eifel undo modules.
It lets congestion control (module) determine how many to send
independently.  F-RTO only chooses what to send in order to detect
spurious retransmission. If timeout is found spurious it invokes
existing Eifel undo algorithms like DSACK or TCP timestamp based
detection.

The first patch removes all F-RTO code except sysctl_tcp_frto, which is
left for the new implementation.  Since CA_EVENT_FRTO is removed, TCP
westwood now computes ssthresh on regular timeout CA_EVENT_LOSS event.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 11:47:50 -04:00
Greg Kroah-Hartman
d93acbcacd usb: fixes for v3.9-rc4
Merge tag 'fixes-for-v3.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/balbi/usb into usb-linus

Felipe writes:

	usb: fixes for v3.9-rc4

	udc-core learned that it shouldn't use invalid pointers
	when unloading a gadget driver.

	net2272 and net2280 got a fix for a regression caused by
	the udc_start/udc_stop conversion.

	We're defining a static inline no-op for otg_ulpi_create()
	to prevent build errors when that driver isn't enabled.

	FunctionFS got a fix for an off-by-one error when binding
	and unbinding instances of FunctionFS.

	MUSB learned that it shouldn't try to unmap buffers which
	weren't previously mapped.

	f_rndis got a fix for a possible NULL pointer dereference
	in a debugging message code.

	MUSB's DA8xx glue layer got a build fix due to a typo.
2013-03-21 08:40:22 -07:00
Linus Torvalds
0a7e453103 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management fixes from Zhang Rui.

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
  thermal: exynos_thermal: return a proper error code while thermal_zone_device_register fail.
  thermal: rcar_thermal: propagate return value of thermal_zone_device_register
  Thermal: kirkwood: Convert to devm_ioremap_resource()
  Thermal: rcar: Convert to devm_ioremap_resource()
  Thermal: dove: Convert to devm_ioremap_resource()
  thermal: rcar: fix missing unlock on error in rcar_thermal_update_temp()
2013-03-21 08:37:10 -07:00
Daniel Borkmann
e306e2c13b filter: add minimal BPF JIT image disassembler
This is a minimal stand-alone user space helper, that allows for debugging or
verification of emitted BPF JIT images. This is in particular useful for
emitted opcode debugging, since minor bugs in the JIT compiler can be fatal.
The disassembler is architecture generic and uses libopcodes and libbfd.

How to get to the disassembly, example:

  1) `echo 2 > /proc/sys/net/core/bpf_jit_enable`
  2) Load a BPF filter (e.g. `tcpdump -p -n -s 0 -i eth1 host 192.168.20.0/24`)
  3) Run e.g. `bpf_jit_disasm -o` to disassemble the most recent JIT code output

`bpf_jit_disasm -o` will display the related opcodes to a particular instruction
as well. Example for x86_64:

$ ./bpf_jit_disasm
94 bytes emitted from JIT compiler (pass:3, flen:9)
ffffffffa0356000 + <x>:
   0:	push   %rbp
   1:	mov    %rsp,%rbp
   4:	sub    $0x60,%rsp
   8:	mov    %rbx,-0x8(%rbp)
   c:	mov    0x68(%rdi),%r9d
  10:	sub    0x6c(%rdi),%r9d
  14:	mov    0xe0(%rdi),%r8
  1b:	mov    $0xc,%esi
  20:	callq  0xffffffffe0d01b71
  25:	cmp    $0x86dd,%eax
  2a:	jne    0x000000000000003d
  2c:	mov    $0x14,%esi
  31:	callq  0xffffffffe0d01b8d
  36:	cmp    $0x6,%eax
[...]
  5c:	leaveq
  5d:	retq

$ ./bpf_jit_disasm -o
94 bytes emitted from JIT compiler (pass:3, flen:9)
ffffffffa0356000 + <x>:
   0:	push   %rbp
	55
   1:	mov    %rsp,%rbp
	48 89 e5
   4:	sub    $0x60,%rsp
	48 83 ec 60
   8:	mov    %rbx,-0x8(%rbp)
	48 89 5d f8
   c:	mov    0x68(%rdi),%r9d
	44 8b 4f 68
  10:	sub    0x6c(%rdi),%r9d
	44 2b 4f 6c
[...]
  5c:	leaveq
	c9
  5d:	retq
	c3

Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 11:35:41 -04:00
Linus Torvalds
cd82346934 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
 "A fair chunk of the linecount comes from a fix for a tracing bug that
  corrupts latency tracing buffers when the overwrite mode is changed on
  the fly - the rest is mostly assorted fewliner fixlets."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Add SNB/SNB-EP scheduling constraints for cycle_activity event
  kprobes/x86: Check Interrupt Flag modifier when registering probe
  kprobes: Make hash_64() as always inlined
  perf: Generate EXIT event only once per task context
  perf: Reset hwc->last_period on sw clock events
  tracing: Prevent buffer overwrite disabled for latency tracers
  tracing: Keep overwrite in sync between regular and snapshot buffers
  tracing: Protect tracer flags with trace_types_lock
  perf tools: Fix LIBNUMA build with glibc 2.12 and older.
  tracing: Fix free of probe entry by calling call_rcu_sched()
  perf/POWER7: Create a sysfs format entry for Power7 events
  perf probe: Fix segfault
  libtraceevent: Remove hard coded include to /usr/local/include in Makefile
  perf record: Fix -C option
  perf tools: check if -DFORTIFY_SOURCE=2 is allowed
  perf report: Fix build with NO_NEWT=1
  perf annotate: Fix build with NO_NEWT=1
  tracing: Fix race in snapshot swapping
2013-03-21 08:29:11 -07:00
Linus Torvalds
172a271b5e Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
 "Radeon, intel and nouveau, along with one mgag200 fix

   - intel fix for an ioctl overflow, along with a regression fix for
     some phantom irqs on Ironlake.
   - nouveau has a lockdep warning and a bunch of thermal fixes
   - radeon has new pci ids and some minor fixes."

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (26 commits)
  drm/mgag200: Bug fix: Modified pll algorithm for EH project
  drm/i915: stop using GMBUS IRQs on Gen4 chips
  drm/nv50/kms: prevent lockdep false-positive in page flipping path
  drm/nouveau/core: fix return value of nouveau_object_del()
  MAINTAINERS: intel-gfx is no longer subscribers-only
  drm/i915: Use the fixed pixel clock for eDP in intel_dp_set_m_n()
  drm/nouveau/hwmon: do not expose a buggy temperature if it is unavailable
  drm/nouveau/therm: display the availability of the internal sensor
  drm/nouveau/therm: disable temperature management if the sensor isn't readable
  drm/nouveau/therm: disable auto fan management if temperature is not available
  drm/nv40/therm: reserve negative temperatures for errors
  drm/nv40/therm: disable temperature reading if the bios misses some parameters
  drm/nouveau/therm-ic: the temperature is off by sensor_constant, warn the user
  drm/nouveau/therm: remove some confusion introduced by therm_mode
  drm/nouveau/therm: do not make assumptions on temperature
  drm/nv40/therm: increase the sensor's settling delay to 20ms
  drm/nv40/therm: improve selection between the old and the new style
  Revert "drm/i915: try to train DP even harder"
  drm/radeon: add Richland pci ids
  drm/radeon: add support for Richland APUs
  ...
2013-03-21 08:27:58 -07:00
Linus Torvalds
85ab3c4617 A set of device-mapper fixes for 3.9.
Merge tag 'dm-3.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm

Pull device-mapper fixes from Alasdair G Kergon:
 "Fix reported data loss with discards and thin snapshots; avoid a
  deadlock observed in dm verity; fix a race in the new dm cache code
  along with some other minor bugs; store the cache policy version on
  disk to make the stored hints format future-proof."

* tag 'dm-3.9-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
  dm cache: policy ignore hints if generated by different version
  dm cache: policy change version from string to integer set
  dm cache: fix race in writethrough implementation
  dm cache: metadata clear dirty bits on clean shutdown
  dm cache: avoid calling policy destructor twice on error
  dm cache: detect cache_create failure
  dm cache: avoid 64 bit division on 32 bit
  dm verity: avoid deadlock
  dm thin: fix non power of two discard granularity calc
  dm thin: fix discard corruption
2013-03-21 08:27:03 -07:00
David S. Miller
b34870fc9f Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into wireless
John W. Linville says:

====================
This is a big pull request for new features intended for the 3.10
stream...

Regarding mac80211, Johannes says:

"First, I merged mac80211/master to avoid some conflicts. This brings in
a bunch of fixes you're already familiar with. For real -next material,
I have a whole bunch of minstrel work, minstrel_ht from Felix and legacy
minstrel from Thomas (Huehn). The other Thomas (Pedersen) did a number
of changes in mesh to allow userspace peering management even when the
mesh isn't secured. Stanislaw changes suspend/resume to always
disconnect the networks. This is typically already done by
network-manager so won't make a huge difference for most users, but
fixes a number of problems, particularly with USB drivers that can easily
disconnect while suspended. Ilan has a small change to allow mac80211
drivers to differentiate remain-on-channel reasons, and Jouni extends
nl80211 to allow fast roaming with full-MAC devices. I have a fairly
large number of patches as well, many of them fairly simple cleanups,
but also allowing split wiphy dumps and adding back the full wiphy
information in nl80211, station entry change checking and more VHT work
including VHT capability overrides (mostly for testing purposes)."

And for iwlwifi, Johannes says:

"Here, I also merged iwlwifi-fixes to avoid conflicts, and otherwise have
various cleanups and improvements on the MVM driver, along with a few
throughout the driver. Other than Bluetooth Coexistence from Emmanuel
there's no over-arching theme, so listing them would pretty much
reproduce the shortlog."

Regarding NFC, Samuel says:

"The 2 features we have with this one are:

- An LLCP Service Name Lookup (SNL) netlink interface for querying LLCP
  service availability from user space.
  Along the way, Thierry also improved the existing SNL interface for
  aggregating SNL responses.

- An initial LLCP socket options implementation, for setting the Receive
  Window (RW) and the Maximum Information Unit Extension (MIUX) per socket.
  This is needed for the LLCP validation tests.

We also have a microread MEI build failure here: I am not sending this one to
3.9 because the MEI bus code is not there yet, so it won't break for anyone
else than me."

And for ath6kl, Kalle says:

"I added tracing support to ath6kl, along with a new Kconfig option. Now
there's also a workaround to reset USB devices when the firmware upload
fails; this happened when the host was warm rebooted. There are also quite a
few small fixes and cleanups."

On top of all that, there is the usual bundle of driver updates
with new features, new hardware support and the like mixed-in.
The ath9k, b43, brcmfmac, mwifiex, rt2800, and wil6210 drivers
are all well-represented, and a few other drivers are hit as well.
I also pulled-in the wireless fixes tree in order to resolve some
pending merge conflicts.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-21 11:16:35 -04:00
Trond Myklebust
240286725d NFSv4.1: Add a helper pnfs_commit_and_return_layout
In order to be able to safely return the layout in nfs4_proc_setattr,
we need to block new uses of the layout, wait for all outstanding
users of the layout to complete, commit the layout and then return it.

This patch adds a helper in order to do all this safely.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
2013-03-21 10:31:21 -04:00
Trond Myklebust
2495680434 NFSv4.1: Always clear the NFS_INO_LAYOUTCOMMIT in layoutreturn
Note that clearing NFS_INO_LAYOUTCOMMIT is tricky, since it requires
you to also clear the NFS_LSEG_LAYOUTCOMMIT bits from the layout
segments.
The only two sites that need to do this are the ones that call
pnfs_return_layout() without first doing a layout commit.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
2013-03-21 10:31:21 -04:00
Trond Myklebust
a073dbff35 NFSv4.1: Fix a race in pNFS layoutcommit
We need to clear the NFS_LSEG_LAYOUTCOMMIT bits atomically with the
NFS_INO_LAYOUTCOMMIT bit, otherwise we may end up with situations
where the two are out of sync.
The first half of the problem is to ensure that pnfs_layoutcommit_inode
clears the NFS_LSEG_LAYOUTCOMMIT bit through pnfs_list_write_lseg.
We still need to keep the reference to those segments until the RPC call
is finished, so in order to make it clear _where_ those references come
from, we add a helper pnfs_list_write_lseg_done() that cleans up after
pnfs_list_write_lseg.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Benny Halevy <bhalevy@tonian.com>
Cc: stable@vger.kernel.org
2013-03-21 10:31:19 -04:00
fanchaoting
4376c94618 pnfs-block: removing DM device maybe cause oops when call dev_remove
When pnfs block uses device mapper, a later umount may cause an
oops. We allocate "1 + sizeof(bl_umount_request)" bytes for
msg->data, which may overflow when we do "memcpy(&dataptr
[sizeof(bl_msg)], &bl_umount_request, sizeof(bl_umount_request))",
because the size of bl_msg is more than 1 byte.
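
A minimal sketch of the sizing problem, assuming hypothetical stand-in
types (demo_msg_hdr for bl_msg, demo_umount_req for bl_umount_request)
and a userspace allocator. It shows one way to size the buffer so that
the copy at offset sizeof(header) stays in bounds; it is not the actual
patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for bl_msg and bl_umount_request. */
struct demo_msg_hdr { uint8_t type; uint16_t totallen; };
struct demo_umount_req { int32_t status; uint32_t major; uint32_t minor; };

/* Builds header + request in one buffer; the caller frees the result. */
unsigned char *demo_build_umount(const struct demo_msg_hdr *hdr,
				 const struct demo_umount_req *req,
				 size_t *len_out)
{
	assert(hdr && req && len_out);

	/* The request is copied at offset sizeof(*hdr), so the buffer must
	 * hold sizeof(*hdr) + sizeof(*req) bytes, not 1 + sizeof(*req). */
	size_t len = sizeof(*hdr) + sizeof(*req);
	unsigned char *data = calloc(1, len);

	if (!data)
		return NULL;

	memcpy(data, hdr, sizeof(*hdr));
	memcpy(&data[sizeof(*hdr)], req, sizeof(*req));
	*len_out = len;
	return data;
}
```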

Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-21 10:11:06 -04:00
Matt Fleming
e971318bbe efivars: Handle duplicate names from get_next_variable()
Some firmware exhibits a bug where the same VariableName and
VendorGuid values are returned on multiple invocations of
GetNextVariableName(). See,

    https://bugzilla.kernel.org/show_bug.cgi?id=47631

As a consequence of such a bug, Andre reports hitting the following
WARN_ON() in the sysfs code after updating the BIOS on his, "Gigabyte
Technology Co., Ltd. To be filled by O.E.M./Z77X-UD3H, BIOS F19e
11/21/2012)" machine,

[    0.581554] EFI Variables Facility v0.08 2004-May-17
[    0.584914] ------------[ cut here ]------------
[    0.585639] WARNING: at /home/andre/linux/fs/sysfs/dir.c:536 sysfs_add_one+0xd4/0x100()
[    0.586381] Hardware name: To be filled by O.E.M.
[    0.587123] sysfs: cannot create duplicate filename '/firmware/efi/vars/SbAslBufferPtrVar-01f33c25-764d-43ea-aeea-6b5a41f3f3e8'
[    0.588694] Modules linked in:
[    0.589484] Pid: 1, comm: swapper/0 Not tainted 3.8.0+ #7
[    0.590280] Call Trace:
[    0.591066]  [<ffffffff81208954>] ? sysfs_add_one+0xd4/0x100
[    0.591861]  [<ffffffff810587bf>] warn_slowpath_common+0x7f/0xc0
[    0.592650]  [<ffffffff810588bc>] warn_slowpath_fmt+0x4c/0x50
[    0.593429]  [<ffffffff8134dd85>] ? strlcat+0x65/0x80
[    0.594203]  [<ffffffff81208954>] sysfs_add_one+0xd4/0x100
[    0.594979]  [<ffffffff81208b78>] create_dir+0x78/0xd0
[    0.595753]  [<ffffffff81208ec6>] sysfs_create_dir+0x86/0xe0
[    0.596532]  [<ffffffff81347e4c>] kobject_add_internal+0x9c/0x220
[    0.597310]  [<ffffffff81348307>] kobject_init_and_add+0x67/0x90
[    0.598083]  [<ffffffff81584a71>] ? efivar_create_sysfs_entry+0x61/0x1c0
[    0.598859]  [<ffffffff81584b2b>] efivar_create_sysfs_entry+0x11b/0x1c0
[    0.599631]  [<ffffffff8158517e>] register_efivars+0xde/0x420
[    0.600395]  [<ffffffff81d430a7>] ? edd_init+0x2f5/0x2f5
[    0.601150]  [<ffffffff81d4315f>] efivars_init+0xb8/0x104
[    0.601903]  [<ffffffff8100215a>] do_one_initcall+0x12a/0x180
[    0.602659]  [<ffffffff81d05d80>] kernel_init_freeable+0x13e/0x1c6
[    0.603418]  [<ffffffff81d05586>] ? loglevel+0x31/0x31
[    0.604183]  [<ffffffff816a6530>] ? rest_init+0x80/0x80
[    0.604936]  [<ffffffff816a653e>] kernel_init+0xe/0xf0
[    0.605681]  [<ffffffff816ce7ec>] ret_from_fork+0x7c/0xb0
[    0.606414]  [<ffffffff816a6530>] ? rest_init+0x80/0x80
[    0.607143] ---[ end trace 1609741ab737eb29 ]---

There's not much we can do to work around and keep traversing the
variable list once we hit this firmware bug. Our only solution is to
terminate the loop because, as Lingzhu reports, some machines get
stuck when they encounter duplicate names,

  > I had an IBM System x3100 M4 and x3850 X5 on which kernel would
  > get stuck in infinite loop creating duplicate sysfs files because,
  > for some reason, there are several duplicate boot entries in nvram
  > getting GetNextVariableName into a circle of iteration (with
  > period > 2).

Also disable the workqueue, as efivar_update_sysfs_entries() uses
GetNextVariableName() to figure out which variables have been created
since the last iteration. That algorithm isn't going to work if
GetNextVariableName() returns duplicates. Note that we don't disable
EFI variable creation completely on the affected machines, it's just
that any pstore dump-* files won't appear in sysfs until the next
boot.
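
The "terminate the loop on the first duplicate" approach can be sketched
as follows. This is an illustration under simplifying assumptions, not
the efivars code: struct demo_var and demo_register_until_duplicate()
are hypothetical, names are plain C strings rather than UTF-16, and the
array stands in for successive GetNextVariableName() results.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one (VariableName, VendorGuid) pair. */
struct demo_var {
	const char *name;
	const char *guid;
};

/*
 * Walks the entries in firmware order and stops at the first one that
 * repeats an earlier (name, guid) pair. Returns how many entries were
 * accepted before stopping and reports through *dup_found whether a
 * duplicate ended the walk.
 */
size_t demo_register_until_duplicate(const struct demo_var *vars, size_t n,
				     bool *dup_found)
{
	assert(dup_found);
	*dup_found = false;

	for (size_t i = 0; i < n; i++) {
		for (size_t j = 0; j < i; j++) {
			if (strcmp(vars[i].name, vars[j].name) == 0 &&
			    strcmp(vars[i].guid, vars[j].guid) == 0) {
				/* Firmware is looping: give up here. */
				*dup_found = true;
				return i;
			}
		}
	}
	return n;
}
```

A caller could use the dup_found flag to skip any later re-enumeration,
in the spirit of disabling the workqueue as described above.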

Reported-by: Andre Heider <a.heider@gmail.com>
Reported-by: Lingzhu Xiang <lxiang@redhat.com>
Tested-by: Lingzhu Xiang <lxiang@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-03-21 12:43:46 +00:00
Matt Fleming
ec50bd32f1 efivars: explicitly calculate length of VariableName
It's not wise to assume VariableNameSize represents the length of
VariableName, as not all firmware updates VariableNameSize in the same
way (some don't update it at all if EFI_SUCCESS is returned). There
are even implementations out there that update VariableNameSize with
values that are both larger than the string returned in VariableName
and smaller than the buffer passed to GetNextVariableName(), which
resulted in the following bug report from Michael Schroeder,

  > On HP z220 system (firmware version 1.54), some EFI variables are
  > incorrectly named :
  >
  > ls -d /sys/firmware/efi/vars/*8be4d* | grep -v -- -8be returns
  > /sys/firmware/efi/vars/dbxDefault-pport8be4df61-93ca-11d2-aa0d-00e098032b8c
  > /sys/firmware/efi/vars/KEKDefault-pport8be4df61-93ca-11d2-aa0d-00e098032b8c
  > /sys/firmware/efi/vars/SecureBoot-pport8be4df61-93ca-11d2-aa0d-00e098032b8c
  > /sys/firmware/efi/vars/SetupMode-Information8be4df61-93ca-11d2-aa0d-00e098032b8c

The issue here is that because we blindly use VariableNameSize without
verifying its value, we can potentially read garbage values from the
buffer containing VariableName if VariableNameSize is larger than the
length of VariableName.

Since VariableName is a string, we can calculate its size by searching
for the terminating NUL character.
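
A minimal sketch of that calculation, assuming a UTF-16 name stored as
uint16_t units. The function name demo_utf16_name_size() is hypothetical
and this is not the code from the patch; the point is that the scan is
bounded by the buffer size and ignores whatever size the firmware
reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the size in bytes of the NUL-terminated UTF-16 string in
 * 'name', including the terminator. If no terminator is found within
 * 'buf_size' bytes, returns the number of whole 16-bit units that fit,
 * in bytes, so the result never exceeds the buffer.
 */
size_t demo_utf16_name_size(const uint16_t *name, size_t buf_size)
{
	size_t max_chars = buf_size / sizeof(uint16_t);

	assert(name);

	for (size_t len = 0; len < max_chars; len++) {
		if (name[len] == 0)
			return (len + 1) * sizeof(uint16_t);
	}
	return max_chars * sizeof(uint16_t);
}
```

With a size computed this way, stale bytes after the terminator (such as
the "pport" and "Information" fragments in the report above) would not
be read as part of the name.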

Reported-by: Frederic Crozat <fcrozat@suse.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Michael Schroeder <mls@suse.com>
Cc: Lee, Chun-Yi <jlee@suse.com>
Cc: Lingzhu Xiang <lxiang@redhat.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-03-21 12:43:46 +00:00
Seth Forshee
ec0971ba53 efivars: Add module parameter to disable use as a pstore backend
We know that with some firmware implementations writing too much data to
UEFI variables can lead to bricking machines. Recent changes attempt to
address this issue, but for some it may still be prudent to avoid
writing large amounts of data until the solution has been proven on a
wide variety of hardware.

Crash dumps or other data from pstore can potentially be a large data
source. Add a pstore_module parameter to efivars to allow disabling its
use as a backend for pstore. Also add a config option,
CONFIG_EFI_VARS_PSTORE_DEFAULT_DISABLE, to allow setting the default
value of this parameter to true (i.e. disabled by default).

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-03-21 12:43:46 +00:00
Seth Forshee
ed9dc8ce7a efivars: Allow disabling use as a pstore backend
Add a new option, CONFIG_EFI_VARS_PSTORE, which can be set to N to
avoid using efivars as a backend to pstore, as some users may want to
compile out the code completely.

Set the default to Y to maintain backwards compatibility, since this
feature has always been enabled until now.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Cc: Josh Boyer <jwboyer@redhat.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2013-03-21 12:43:46 +00:00
Paul Bolle
eda81bea89 usb: gadget: net2272: finally convert "CONFIG_USB_GADGET_NET2272_DMA"
The Kconfig symbol USB_GADGET_NET2272_DMA was renamed to USB_NET2272_DMA
in commit 193ab2a607 ("usb: gadget: allow
multiple gadgets to be built"). That commit did not convert the only
occurrence of the corresponding Kconfig macro. Convert that macro now.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Felipe Balbi <balbi@ti.com>
2013-03-21 12:14:05 +02:00
Dave Airlie
b56fb70870 Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
Daniel writes:
Bunch of fixes, all pretty high-priority
- Fix execbuf argument checking (Kees Cook)
- Optionally obfuscate kernel addresses in dumps (Kees Cook)
- Two patches from Takashi Iwai to fix DP link training regressions he's
  seen.
- intel-gfx is no longer subscribers-only (well, just no longer moderated
  in an annoying way for non-subscribers), update MAINTAINERS
- gm45 gmbus irq fallout fix (Jiri Kosina)

* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
  drm/i915: stop using GMBUS IRQs on Gen4 chips
  MAINTAINERS: intel-gfx is no longer subscribers-only
  drm/i915: Use the fixed pixel clock for eDP in intel_dp_set_m_n()
  Revert "drm/i915: try to train DP even harder"
  drm/i915: bounds check execbuffer relocation count
  drm/i915: restrict kernel address leak in debugfs
2013-03-21 10:17:38 +10:00
Julia Lemire
260b3f1291 drm/mgag200: Bug fix: Modified pll algorithm for EH project
While testing the mgag200 kms driver on the HP ProLiant Gen8, a
bug was seen.  Once the bootloader would load the selected kernel,
the screen would go black.  At first it was assumed that the
mgag200 kms driver was hanging.  But after setting up the grub
serial output, it was seen that the driver was being loaded
properly.  After trying several monitors, one finally displayed
the message "Frequency Out of Range".  By comparing the kms pll
algorithm with the previous mgag200 xorg driver pll algorithm,
discrepancies were found.  Once the kms pll algorithm was
modified, the expected pll values were produced.  This fix was
tested on several monitors of varying native resolutions.

Signed-off-by: Julia Lemire <jlemire@matrox.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>
2013-03-21 10:16:58 +10:00
Alan Stern
d714aaf649 USB: EHCI: fix regression in QH unlinking
This patch (as1670) fixes a regression caused by commit
6402c796d3 (USB: EHCI: work around
silicon bug in Intel's EHCI controllers).  The workaround goes through
two IAA cycles for each QH being unlinked.  During the first cycle,
the QH is not added to the async_iaa list (because it isn't fully gone
from the hardware yet), which means that list will be empty.

Unfortunately, I forgot to update the IAA watchdog timer routine.  It
thinks that an empty async_iaa list means the timer expiration was an
error, which isn't true any more.  This problem didn't show up during
initial testing because the controllers being tested all had working
IAA interrupts.  But not all controllers do, and when the watchdog
timer expires, the empty-list check prevents the second IAA cycle from
starting.  As a result, URB unlinks never complete.  The check needs
to be removed.

Among the symptoms of the regression are processes stuck in D wait
states and hangs during system shutdown.

Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Reported-and-tested-by: Stephen Warren <swarren@wwwdotorg.org>
Reported-and-tested-by: Sven Joachim <svenjoac@gmx.de>
Reported-by: Andreas Bombe <aeb@debian.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2013-03-20 16:17:22 -07:00
Masatake YAMATO
73214f5d9f thermal: shorten too long mcast group name
The original name is too long.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 17:56:58 -04:00
Trond Myklebust
cf4ab538f1 NFSv4: Fix the string length returned by the idmapper
Functions like nfs_map_uid_to_name() and nfs_map_gid_to_group() are
expected to return a string without any terminating NUL character.
Regression introduced by commit 57e62324e4
(NFS: Store the legacy idmapper result in the keyring).
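
The length convention can be illustrated with a small sketch. The helper
demo_id_to_name() is hypothetical and only mimics the contract stated
above (the returned length does not count a terminating NUL); it is not
the idmapper code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Formats 'id' as decimal text into 'buf'. Returns the number of
 * characters in the name, excluding the NUL that snprintf() appends,
 * or -1 if the text does not fit in 'buflen' bytes.
 */
int demo_id_to_name(unsigned int id, char *buf, size_t buflen)
{
	int n;

	assert(buf);

	n = snprintf(buf, buflen, "%u", id);
	if (n < 0 || (size_t)n >= buflen)
		return -1;

	/* Callers get the string length only; the NUL is not counted. */
	return n;
}
```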

Reported-by: Dave Chiluk <dave.chiluk@canonical.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Bryan Schumaker <bjschuma@netapp.com>
Cc: stable@vger.kernel.org [>=3.4]
2013-03-20 16:45:16 -04:00
John W. Linville
5470b462c3 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem
2013-03-20 15:24:57 -04:00
stephen hemminger
e76d120b68 chelsio: use netdev_alloc_skb_ip_align
Use netdev_alloc_skb_ip_align in the case where packet is copied.
This handles case where NET_IP_ALIGN == 0 as well as adding required header
padding.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 15:24:40 -04:00
David S. Miller
f379fb991b Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
I present to you another batch of fixes intended for the 3.9 stream...

On the bluetooth bits, Gustavo says:

"I put together 3 fixes intended for 3.9, there are support for two
new devices and a NULL dereference fix in the SCO code."

Amitkumar Karwar fixes a command queueing race in mwifiex.

Bing Zhao provides a pair of mwifiex fixes related to cleaning up before
a shutdown.

Felix Fietkau provides an ath9k fix for a regression caused by an
earlier calibration fix, and another ath9k fix to avoid race conditions
that unnecessarily lead to chip resets.

Jussi Kivilinna prevents an skbuff leak in rtlwifi.

Stanislaw Gruszka corrects a length parameter for a DMA buffer mapping
operation in iwlegacy.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 15:19:32 -04:00
David S. Miller
a6f68034de net: Move selftests to common net/ subdirectory.
Suggested-by: Daniel Baluta <daniel.baluta@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 15:07:56 -04:00
Fabio Estevam
9d73adf431 fec: Fix the build as module
Since commit ff43da86c6 (NET: FEC: dynamtic check DMA desc buff type) the
following build error happens when CONFIG_FEC=m

ERROR: "fec_ptp_init" [drivers/net/ethernet/freescale/fec.ko] undefined!
ERROR: "fec_ptp_ioctl" [drivers/net/ethernet/freescale/fec.ko] undefined!
ERROR: "fec_ptp_start_cyclecounter" [drivers/net/ethernet/freescale/fec.ko] undefined!

Fix it by exporting the required fec_ptp symbols.

Reported-by: Uwe Kleine-Koenig <u.kleine-koenig@pengutronix.de>
Signed-off-by: Fabio Estevam <fabio.estevam@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 14:45:30 -04:00
John W. Linville
b9d5319041 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem
2013-03-20 14:26:37 -04:00
Daniel Baluta
4c1d8d0617 net: fix psock_fanout selftest bind error message
Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:42:41 -04:00
Takashi Iwai
eb49faa6a4 ALSA: hda - Fix abuse of snd_hda_lock_devices() for DSP loader
The current DSP loader code abuses snd_hda_lock_devices() to ensure
that the DSP loader does not conflict with other normal operations.  But
this trick obviously doesn't work for PM resume, since the streams
are kept open there and snd_hda_lock_devices() returns -EBUSY.
That means we need another lock mechanism instead of this abuse.

This patch provides the new lock state to azx_dev.  Theoretically it's
possible that the DSP loader conflicts with the stream that has been
already assigned for another PCM.  If it's running, the DSP loader
should simply fail.  If not -- it's the case for PM resume --, we
should assign this stream temporarily to the DSP loader, and take it
back to the PCM after finishing DSP loading.  If the PCM is operated
during the DSP loading, it should get an error, too.

Reported-and-tested-by: Dylan Reid <dgreid@chromium.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2013-03-20 18:36:06 +01:00
Eric Dumazet
70386d40e1 chelsio: add headroom in RX path
Drivers should reserve some headroom in skb used in receive path,
to avoid future head reallocation.

One possible way to do that is to use dev_alloc_skb() instead
of alloc_skb(), so that NET_SKB_PAD bytes are reserved.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:29:34 -04:00
Chris Metcalf
8fdc929f57 dynticks: avoid flow_cache_flush() interrupting every core
Previously, if you did an "ifconfig down" or similar on one core, and
the kernel had CONFIG_XFRM enabled, every core would be interrupted to
check its percpu flow list for items that could be garbage collected.

With this change, we generate a mask of cores that actually have any
percpu items, and only interrupt those cores.  When we are trying to
isolate a set of cpus from interrupts, this is important to do.
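
The idea above can be illustrated with a minimal userspace sketch. It is not
the kernel code: NR_CPUS, struct percpu_flows, its gc_pending field and
cores_needing_flush() are hypothetical names chosen for illustration, and a
plain 32-bit word stands in for a real cpumask.

```c
#include <stddef.h>
#include <stdint.h>

#define NR_CPUS 8 /* hypothetical core count for this sketch */

/* Hypothetical per-core state: entries waiting to be garbage collected. */
struct percpu_flows {
    unsigned int gc_pending;
};

/*
 * Build a bitmask in which bit i is set only when core i has pending
 * items. Only the cores present in the mask would then be interrupted.
 */
static uint32_t cores_needing_flush(const struct percpu_flows *flows,
                                    size_t ncpus)
{
    uint32_t mask = 0;

    for (size_t cpu = 0; cpu < ncpus && cpu < 32; cpu++) {
        if (flows[cpu].gc_pending > 0)
            mask |= (uint32_t)1 << cpu;
    }
    return mask;
}
```

An empty mask means no core needs to be interrupted at all, which is the case
that matters for cores isolated from interrupts.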

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:28:39 -04:00
Yuval Mintz
7fa6f34081 bnx2x: AER revised
Revised bnx2x implementation of PCI Express Advanced Error Recovery -
stop and free driver resources according to the AER flow (instead of the
currently implemented `hope-for-the-best' release approach), and do not make
any assumptions on the HW state after slot reset.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Ariel Elior <ariele@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:27:28 -04:00
Wei Yongjun
47a5247fdd net: fec: make local function fec_poll_controller() static
fec_poll_controller() was not declared. It should be static.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:37 -04:00
Wei Yongjun
e052a5893b net: ethernet: davinci_emac: make local function emac_poll_controller() static
emac_poll_controller() was not declared. It should be static.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Mugunthan V N <mugunthanvnm@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:37 -04:00
Sachin Kamat
9fad0c941a net: mdio-octeon: Use module_platform_driver()
module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: David Daney <ddaney@caviumnetworks.com>
Acked-by: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:37 -04:00
Sachin Kamat
f8e5fc8c20 net: mdio-gpio: Use module_platform_driver()
module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:37 -04:00
Sachin Kamat
95d158df44 net: au1k_ir: Use module_platform_driver()
module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:36 -04:00
Sachin Kamat
6d7496836d net: s6gmac: Use module_platform_driver()
module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:36 -04:00
Sachin Kamat
18e4a7374c net: ks8695net: Use module_platform_driver()
module_platform_driver macro removes some boilerplate and
simplifies the code.

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:25:36 -04:00
Jesper Derehag
2b5faa4c55 connector: Added coredumping event to the process connector
Process connector can now also detect coredumping events.

The main aim of this patch is to get notified at the start of coredumping,
instead of having to wait for it to finish and then being notified through
the EXIT event.

This could be used, for instance, by process managers that want to be
notified as soon as possible about process failures, rather than
only after the coredump completes, which could take on the
order of minutes depending on the size of the coredump, piping and so on.

Signed-off-by: Jesper Derehag <jderehag@hotmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:23:21 -04:00
Claudiu Manoil
800c644bcd gianfar: Refactor config coalescing calls for all queues
The only place where gfar_configure_coalescing is called
with an actual bitmask (other than 0xff) is in gfar_poll
(on the hot path). So make gfar_configure_coalescing()
static for the buffer processing path, and export
gfar_configure_coalescing_all() for the remaining cases
that need to set coalescing for all the queues at once
(on the slow path).

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:21:53 -04:00
Claudiu Manoil
5d9657d83a gianfar: Remove redundant programming of [rt]xic registers
For Multi Q Multi Group (MQ_MG_MODE) mode, the Rx/Tx coalescing registers [rt]xic
are aliased with the [rt]xic0 registers (coalescing setting regs for Q0). This
patch avoids programming the coalescing registers for the Rx/Tx hw Q0 twice in a row.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:21:52 -04:00
Claudiu Manoil
6be5ed3fef gianfar: Poll only active Rx queues
Split the napi budget fairly among the active queues only, instead
of dividing it by the total number of Rx queues assigned to the
given interrupt group.
Use the h/w indication field RXFi in rstat (receive status register)
to identify the active rx queues from the current interrupt group
(i.e. a receive event occurred on ring i, if ring i is part of the current
interrupt group). This indication field in rstat, RXFi i=0..7,
allows us to find out which queues of the same interrupt group
have incoming traffic once we have entered the polling routine for
the given interrupt group. After servicing ring i, the corresponding
bit RXFi is written with 1 to clear the active queue indication for
that ring.
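
The budget split described above can be sketched in plain C. This is only an
illustration under stated assumptions: the helper names are hypothetical, the
RXFi flags are assumed to be already extracted into an 8-bit value with bit i
standing for ring i (the real register layout may differ), and group_mask is
an assumed bitmap of the rings owned by the interrupt group.

```c
#include <stdint.h>

/*
 * Count the rings that are both flagged active (assumed RXFi bits) and
 * owned by the current interrupt group.
 */
static unsigned int active_rx_queues(uint8_t rxf_bits, uint8_t group_mask)
{
    unsigned int active = (unsigned int)rxf_bits & group_mask;
    unsigned int n = 0;

    /* Clear the lowest set bit on each pass until none remain. */
    for (; active != 0; active &= active - 1)
        n++;
    return n;
}

/*
 * Divide the napi budget among the active rings only. Returns 0 when no
 * ring in the group has pending receive work.
 */
static unsigned int budget_per_queue(unsigned int budget, uint8_t rxf_bits,
                                     uint8_t group_mask)
{
    unsigned int n = active_rx_queues(rxf_bits, group_mask);

    return n ? budget / n : 0;
}
```

With a budget of 64 and two active rings out of four in the group, each active
ring gets 32 instead of the 16 it would get if the budget were divided by the
total number of rings.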

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:21:52 -04:00
Claudiu Manoil
c233cf4074 gianfar: Fix tx napi polling
There are 2 issues with the current napi poll routine, with regard
to tx ring cleanup:
1) For multi-queue devices (MQ_MG_MODE), tx_bit_map may differ from
rx_bit_map; this is possible (and supported in h/w) if the DT property
"fsl,tx-bit-map" holds a different value than rx_bit_map. In that case the
current polling routine services the wrong Tx queues (i.e. the interrupt
group will receive interrupts from tx queues that it will not service).
2) Tx cleanup completion consumes napi budget, whereas the napi budget
should be reserved for Rx work only.

The patch fixes these issues and provides a clean napi polling routine.
Napi poll completion is reached when all the Rx queues have been
serviced and there is no Tx work to do.

Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-20 13:21:52 -04:00
Mike Snitzer
ea2dd8c1ed dm cache: policy ignore hints if generated by different version
When reading the dm cache metadata from disk, ignore the policy hints
unless they were generated by the same major version number of the same
policy module.

The hints are considered to be private data belonging to the specific
module that generated them and there is no requirement for them to make
sense to different versions of the policy that generated them.
Policy modules are all required to work fine if no previous hints are
supplied (or if existing hints are lost).
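
The rule above amounts to a small compatibility check. The following is a
sketch only, not the dm-cache code: struct policy_id and hints_usable() are
hypothetical names, and the version is assumed to be stored as
{major, minor, patchlevel}.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical identity record: policy name plus a three-part version. */
struct policy_id {
    const char *name;
    unsigned int version[3]; /* assumed order: major, minor, patchlevel */
};

/*
 * Hints are trusted only when they come from the same policy module and
 * the same major version; minor and patchlevel are not compared.
 */
static bool hints_usable(const struct policy_id *on_disk,
                         const struct policy_id *running)
{
    return strcmp(on_disk->name, running->name) == 0 &&
           on_disk->version[0] == running->version[0];
}
```

When the check fails, the caller would simply proceed as if no hints had been
saved, which the policies are required to handle.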

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:28 +00:00
Mike Snitzer
4e7f506f64 dm cache: policy change version from string to integer set
Separate dm cache policy version string into 3 unsigned numbers
corresponding to major, minor and patchlevel and store them at the end
of the on-disk metadata so we know which version of the policy generated
the hints in case a future version wants to use them differently.

Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:27 +00:00
Joe Thornber
e2e74d617e dm cache: fix race in writethrough implementation
We have found a race in the optimisation used in the dm cache
writethrough implementation.  Currently, dm core sends the cache target
two bios, one for the origin device and one for the cache device and
these are processed in parallel.  This patch avoids the race by
changing the code back to a simpler (slower) implementation which
processes the two writes in series, one after the other, until we can
develop a complete fix for the problem.

When the cache is in writethrough mode it needs to send WRITE bios to
both the origin and cache devices.

Previously we've been implementing this by having dm core query the
cache target on every write to find out how many copies of the bio it
wants.  The cache will ask for two bios if the block is in the cache,
and one otherwise.

The main problem with this is that it's racy.  At the time this check is
made the bio hasn't yet been submitted and so isn't being taken into
account when quiescing a block for migration (promotion or demotion).
This means a single bio may be submitted when two were needed because
the block has since been promoted to the cache (catastrophic), or two
bios where only one is needed (harmless).

I really don't want to start entering bios into the quiescing system
(deferred_set) in the get_num_write_bios callback.  Instead this patch
simplifies things: only one bio is submitted by the core, and it is
written first to the origin and then to the cache device, in series.
Obviously this will have a latency impact.

deferred_writethrough_bios is introduced to record bios that must be
later issued to the cache device from the worker thread.  This deferred
submission, after the origin bio completes, is required given that we're
in interrupt context (writethrough_endio).
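
The ordering described above can be modelled in a few lines of userspace C.
This is a simplified sketch, not the dm-cache implementation: struct wt_bio,
struct deferred_list, the fixed-size array and both function names are
hypothetical, and locking is omitted entirely.

```c
#include <stddef.h>

#define MAX_DEFERRED 16 /* hypothetical capacity for this sketch */

/* Hypothetical stand-in for a write bio. */
struct wt_bio {
    int id;
    int written_to_origin;
    int written_to_cache;
};

/* Bios whose origin write has completed and that still need a cache write. */
struct deferred_list {
    struct wt_bio *items[MAX_DEFERRED];
    size_t count;
};

/*
 * Models the completion handler for the origin write. It only queues the
 * bio, since issuing I/O from this context is not allowed.
 */
static int origin_write_done(struct deferred_list *d, struct wt_bio *bio)
{
    if (d->count >= MAX_DEFERRED)
        return -1;
    bio->written_to_origin = 1;
    d->items[d->count++] = bio;
    return 0;
}

/* Models the worker: issue every deferred bio to the cache, in order. */
static size_t worker_issue_deferred(struct deferred_list *d)
{
    size_t issued = 0;

    for (size_t i = 0; i < d->count; i++) {
        d->items[i]->written_to_cache = 1;
        issued++;
    }
    d->count = 0;
    return issued;
}
```

The point of the sketch is the sequencing: a bio never reaches the cache
device before its origin write has completed, at the cost of added latency.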

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-03-20 17:21:27 +00:00