Commit graph

4309 commits

Sami Tolvanen
69c5db92e4 ANDROID: dm verity fec: limit error correction recursion
If the verity tree itself is sufficiently corrupted in addition to data
blocks, it's possible for error correction to end up in a deep recursive
error correction loop that eventually causes a kernel panic as follows:

[   14.728962] [<ffffffc0008c1a14>] verity_fec_decode+0xa8/0x138
[   14.734691] [<ffffffc0008c3ee0>] verity_verify_level+0x11c/0x180
[   14.740681] [<ffffffc0008c482c>] verity_hash_for_block+0x88/0xe0
[   14.746671] [<ffffffc0008c1508>] fec_decode_rsb+0x318/0x75c
[   14.752226] [<ffffffc0008c1a14>] verity_fec_decode+0xa8/0x138
[   14.757956] [<ffffffc0008c3ee0>] verity_verify_level+0x11c/0x180
[   14.763944] [<ffffffc0008c482c>] verity_hash_for_block+0x88/0xe0

This change limits the recursion to a reasonable level during a single
I/O operation.

Bug: 28943429
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Change-Id: I0a7ebff331d259c59a5e03c81918cc1613c3a766
(cherry picked from commit f4b9e40597e73942d2286a73463c55f26f61bfa7)
2016-06-16 13:44:10 +05:30
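
For the fix above, a minimal sketch of the recursion cap (illustrative only; the
constant, field and helper names here are assumptions, not the actual patch):

  /* illustrative cap on FEC recursion within a single I/O */
  #define FEC_MAX_RECURSION 4

  struct fec_io {
          unsigned recursion_level;        /* depth within the current I/O */
          /* ... */
  };

  static int fec_decode(struct fec_io *fio)
  {
          int r;

          if (fio->recursion_level >= FEC_MAX_RECURSION)
                  return -EIO;             /* treat as uncorrectable instead of recursing further */

          fio->recursion_level++;
          r = do_fec_decode(fio);          /* may re-enter via hash verification, as in the trace above */
          fio->recursion_level--;

          return r;
  }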
Sami Tolvanen
c4d8e3e8d2 ANDROID: dm verity fec: add missing release from fec_ktype
Add a release function to allow destroying the dm-verity device.

Bug: 27928374
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Change-Id: Ic0f7c17e4889c5580d70b52d9a709a37165a5747
(cherry picked from commit 0039ccf47c8f99888f7b71b2a36a68a027fbe357)
2016-06-07 12:50:09 -07:00
Sami Tolvanen
249d2baf9b ANDROID: dm verity fec: limit error correction recursion
If the verity tree itself is sufficiently corrupted in addition to data
blocks, it's possible for error correction to end up in a deep recursive
error correction loop that eventually causes a kernel panic as follows:

[   14.728962] [<ffffffc0008c1a14>] verity_fec_decode+0xa8/0x138
[   14.734691] [<ffffffc0008c3ee0>] verity_verify_level+0x11c/0x180
[   14.740681] [<ffffffc0008c482c>] verity_hash_for_block+0x88/0xe0
[   14.746671] [<ffffffc0008c1508>] fec_decode_rsb+0x318/0x75c
[   14.752226] [<ffffffc0008c1a14>] verity_fec_decode+0xa8/0x138
[   14.757956] [<ffffffc0008c3ee0>] verity_verify_level+0x11c/0x180
[   14.763944] [<ffffffc0008c482c>] verity_hash_for_block+0x88/0xe0

This change limits the recursion to a reasonable level during a single
I/O operation.

Bug: 28943429
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Change-Id: I0a7ebff331d259c59a5e03c81918cc1613c3a766
(cherry picked from commit f4b9e40597e73942d2286a73463c55f26f61bfa7)
2016-06-06 14:16:11 -07:00
Alex Shi
b3f09bff3f Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-05-12 12:20:40 +08:00
Alex Shi
334ca3ed18 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-05-12 09:27:18 +08:00
Shaohua Li
f3b51a03be MD: make bio mergeable
commit 9c573de3283af007ea11c17bde1e4568d9417328 upstream.

blk_queue_split marks a bio unmergeable, which makes sense for a normal bio.
But when the bio is dispatched to an underlying disk, the blk_queue_split
checks no longer apply, so it's possible for the bio to become mergeable
again.

In the reported bug, this causes a severe performance drop for trim against
raid0:
https://bugzilla.kernel.org/show_bug.cgi?id=117051

Reported-and-tested-by: Park Ju Hyung <qkrwngud825@gmail.com>
Fixes: 6ac45aeb6bca(block: avoid to merge splitted bio)
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Neil Brown <neilb@suse.de>
Reviewed-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-11 11:21:13 +02:00
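
The fix is essentially one line in md's make_request path (a sketch; on v4.4
the request flags live in bio->bi_rw):

  /* the NOMERGE set by blk_queue_split() applies to the original queue;
   * clear it before handing the bio down so the lower layers may merge */
  bio->bi_rw &= ~REQ_NOMERGE;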
Ahmed Samy
be5cbaf31c dm cache metadata: fix cmd_read_lock() acquiring write lock
commit 6545b60baaf880b0cd29a5e89dbe745a06027e89 upstream.

Commit 9567366fefdd ("dm cache metadata: fix READ_LOCK macros and
cleanup WRITE_LOCK macros") uses down_write() instead of down_read() in
cmd_read_lock(), yet up_read() is used to release the lock in
READ_UNLOCK().  Fix it.

Fixes: 9567366fefdd ("dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros")
Signed-off-by: Ahmed Samy <f.fallen45@gmail.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:41 -07:00
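
In essence the bug was a mismatched semaphore pair; a before/after sketch
(the lock field name is shown only to illustrate the pairing):

  /* before: the read path acquired the rwsem for writing ... */
  down_write(&cmd->root_lock);
  /* ... while READ_UNLOCK released it with up_read() */

  /* after: acquire and release the read side consistently */
  down_read(&cmd->root_lock);
  /* ... read-only metadata access ... */
  up_read(&cmd->root_lock);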
Mike Snitzer
9d58f322ee dm cache metadata: fix READ_LOCK macros and cleanup WRITE_LOCK macros
commit 9567366fefddeaea4ed1d713270535d93a3b3c76 upstream.

The READ_LOCK macro was incorrectly returning -EINVAL if
dm_bm_is_read_only() was true -- it will always be true once the cache
metadata transitions to read-only by dm_cache_metadata_set_read_only().

Wrap READ_LOCK and WRITE_LOCK multi-statement macros in do {} while(0).
Also, all accesses of the 'cmd' argument passed to these related macros
are now encapsulated in parentheses.

A follow-up patch can be developed to eliminate the use of macros in
favor of pure C code.  Avoiding that now given that this needs to apply
to stable@.

Reported-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: d14fcf3dd79 ("dm cache: make sure every metadata function checks fail_io")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-05-04 14:48:41 -07:00
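
For reference, the pattern being applied, as a generic example rather than the
actual dm-cache macros (the surrounding if/else and helper names are made up
for illustration): wrap multi-statement macros in do { } while (0) and
parenthesize each use of the argument so the macro expands to a single
statement:

  #define READ_LOCK(cmd)                              \
          do {                                        \
                  if (unlikely((cmd)->fail_io))       \
                          return -EINVAL;             \
                  down_read(&(cmd)->root_lock);       \
          } while (0)

  /* safe even without braces, because the body is one statement: */
  if (need_metadata)
          READ_LOCK(cmd);
  else
          skip_metadata(cmd);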
Andrey Markovytch
03a29a9ee5 md: dm-req-crypt: Increase mempool size for dm-req-crypt data
The DM layer allocates a pool of 256 requests for request-based modules,
while dm-req-crypt internally allocates a pool for a minimum of 16 requests.
Increase the pool size of dm-req-crypt to be consistent with the DM layer.
Also, choose the GFP mask for allocations from the pool depending on
whether the call is made from atomic context or not.

Change-Id: I9dfeb46520e0d1b1fc6f850a007fce35bdc60d35
Signed-off-by: Dinesh K Garg <dineshg@codeaurora.org>
Signed-off-by: Andrey Markovytch <andreym@codeaurora.org>
2016-04-25 17:45:24 -07:00
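
A sketch of the GFP-mask part of the change (variable names are illustrative):
pool allocations must not sleep when the caller is in atomic context:

  gfp_t gfp = in_atomic() ? GFP_ATOMIC : GFP_NOIO;

  req_io = mempool_alloc(req_io_pool, gfp);
  if (!req_io)
          return -ENOMEM;   /* GFP_ATOMIC allocations may fail; handle it */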
Alex Shi
08562bfcb8 Merge branch 'linux-linaro-lsk-v4.4' into linux-linaro-lsk-v4.4-android 2016-04-13 12:02:21 +08:00
Ming Lei
2775a60447 md: multipath: don't hardcopy bio in .make_request path
commit fafcde3ac1a418688a734365203a12483b83907a upstream.

Inside multipath_make_request(), multipath maps the incoming
bio into the low level device's bio, but it is totally wrong to
copy the bio into the mapped bio via '*mapped_bio = *bio'. For
example, .__bi_remaining is kept in the copy, especially if
the incoming bio is chained via bio splitting, so .bi_end_io
can't be called for the mapped bio at all in the completion path
in this kind of situation.

This patch fixes the issue by using clone style.

Reported-and-tested-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:57 -07:00
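
The difference between the broken hard copy and a clone-style mapping, in
sketch form (the clone helper and field assignments are illustrative, not the
verbatim patch):

  /* broken: structure assignment also copies internal completion state
   * such as __bi_remaining, breaking chained/split bios */
  *mapped_bio = *bio;

  /* clone style: a fresh bio sharing the same data pages, with its own state */
  mapped_bio = bio_clone_fast(bio, GFP_NOIO, mddev->bio_set);
  mapped_bio->bi_bdev    = multipath->rdev->bdev;
  mapped_bio->bi_end_io  = multipath_end_request;
  mapped_bio->bi_private = mp_bh;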
NeilBrown
d5e30f2b93 md/raid5: preserve STRIPE_PREREAD_ACTIVE in break_stripe_batch_list
commit 550da24f8d62fe81f3c13e3ec27602d6e44d43dc upstream.

break_stripe_batch_list breaks up a batch and copies some flags from
the batch head to the members, preserving others.

It doesn't preserve or copy STRIPE_PREREAD_ACTIVE.  This is not
normally a problem as STRIPE_PREREAD_ACTIVE is cleared when a
stripe_head is added to a batch, and is not set on stripe_heads
already in a batch.

However there is no locking to ensure one thread doesn't set the flag
after it has just been cleared in another.  This does occasionally happen.

md/raid5 maintains a count of the number of stripe_heads with
STRIPE_PREREAD_ACTIVE set: conf->preread_active_stripes.  When
break_stripe_batch_list inadvertently clears STRIPE_PREREAD_ACTIVE,
this count can become incorrect and will never again return to zero.

md/raid5 delays the handling of some stripe_heads until
preread_active_stripes becomes zero.  So when the above-mentioned race
happens, those stripe_heads become blocked and never progress,
resulting in writes to the array hanging.

So: change break_stripe_batch_list to preserve STRIPE_PREREAD_ACTIVE
in the members of a batch.

URL: https://bugzilla.kernel.org/show_bug.cgi?id=108741
URL: https://bugzilla.redhat.com/show_bug.cgi?id=1258153
URL: http://thread.gmane.org/5649C0E9.2030204@zoner.cz
Reported-by: Martin Svec <martin.svec@zoner.cz> (and others)
Tested-by: Tom Weber <linux@junkyard.4t2.com>
Fixes: 1b956f7a8f ("md/raid5: be more selective about distributing flags across batch.")
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:57 -07:00
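
Conceptually the change is to exclude STRIPE_PREREAD_ACTIVE from the flags
that break_stripe_batch_list wipes on each member (sketch; the mask variable
is made up for illustration):

  /* clear the per-batch flags on each member, but leave the member's own
   * STRIPE_PREREAD_ACTIVE bit alone so conf->preread_active_stripes
   * stays balanced */
  flags_to_clear &= ~(1UL << STRIPE_PREREAD_ACTIVE);
  sh->state &= ~flags_to_clear;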
Shaohua Li
b82ed7dc00 raid10: include bio_end_io_list in nr_queued to prevent freeze_array hang
commit 23ddba80ebe836476bb2fa1f5ef305dd1c63dc0b upstream.

This is the raid10 counterpart of the bug fixed by Nate
(raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang)

Fixes: 95af587e95(md/raid10: ensure device failure recorded before write request returns)
Cc: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:57 -07:00
Shaohua Li
ad072f6065 RAID5: revert e9e4c377e2 to fix a livelock
commit 6ab2a4b806ae21b6c3e47c5ff1285ec06d505325 upstream.

Revert commit
e9e4c377e2f563(md/raid5: per hash value and exclusive wait_for_stripe)

The problem is raid5_get_active_stripe waits on
conf->wait_for_stripe[hash]. Assume hash is 0. My test releases stripes
in this order:
- release all stripes with hash 0
- raid5_get_active_stripe still sleeps since active_stripes >
  max_nr_stripes * 3 / 4
- release all stripes with hash other than 0. active_stripes becomes 0
- raid5_get_active_stripe still sleeps, since nobody wakes up
  wait_for_stripe[0]
The system livelocks. The problem is that active_stripes isn't a per-hash
count. Reverting the patch makes the livelock go away.

Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Cc: NeilBrown <neilb@suse.de>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:57 -07:00
Shaohua Li
8568767fe1 RAID5: check_reshape() shouldn't call mddev_suspend
commit 27a353c026a879a1001e5eac4bda75b16262c44a upstream.

check_reshape() is called from the raid5d thread. The raid5d thread shouldn't
call mddev_suspend(), because mddev_suspend() waits for all IO to finish,
but IO is handled in the raid5d thread, so we could easily deadlock here.

This issue is introduced by
738a273 ("md/raid5: fix allocation of 'scribble' array.")

Reported-and-tested-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:57 -07:00
Jes Sorensen
fad8b6fc04 md/raid5: Compare apples to apples (or sectors to sectors)
commit e7597e69dec59b65c5525db1626b9d34afdfa678 upstream.

'max_discard_sectors' is in sectors, while 'stripe' is in bytes.

This fixes the problem where DISCARD would get disabled on some larger
RAID5 configurations (6 or more drives in my testing), while it worked
as expected with smaller configurations.

Fixes: 620125f2bf ("MD: raid5 trim support")
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:57 -07:00
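
The comparison in question, sketched (the local variables are illustrative;
the point is converting the stripe size from bytes to 512-byte sectors before
comparing):

  unsigned int stripe_sectors = stripe >> 9;   /* bytes -> 512-byte sectors */

  /* compare sectors to sectors, not sectors to bytes */
  discard_supported = mddev->queue->limits.max_discard_sectors >= stripe_sectors;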
Nate Dailey
2b9eb2b223 raid1: include bio_end_io_list in nr_queued to prevent freeze_array hang
commit ccfc7bf1f09d6190ef86693ddc761d5fe3fa47cb upstream.

If raid1d is handling a mix of read and write errors, handle_read_error's
call to freeze_array can get stuck.

This can happen because, though the bio_end_io_list is initially drained,
writes can be added to it via handle_write_finished as the retry_list
is processed. These writes contribute to nr_pending but are not included
in nr_queued.

If a later entry on the retry_list triggers a call to handle_read_error,
freeze array hangs waiting for nr_pending == nr_queued+extra. The writes
on the bio_end_io_list aren't included in nr_queued so the condition will
never be satisfied.

To prevent the hang, include bio_end_io_list writes in nr_queued.

There's probably a better way to handle decrementing nr_queued, but this
seemed like the safest way to avoid breaking surrounding code.

I'm happy to supply the script I used to repro this hang.

Fixes: 55ce74d4bfe1b(md/raid1: ensure device failure recorded before write request returns.)
Signed-off-by: Nate Dailey <nate.dailey@stratus.com>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:56 -07:00
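
The accounting fix, in sketch form (field names follow the raid1 code but are
shown only to illustrate which counter changes):

  /* when parking a failed write for later retry, count it as queued */
  list_add(&r1_bio->retry_list, &conf->bio_end_io_list);
  conf->nr_queued++;
  wake_up(&conf->wait_barrier);

  /* ... so that freeze_array()'s wait condition,
   *     nr_pending == nr_queued + extra,
   * can eventually be satisfied */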
Eric Wheeler
ca75edc440 bcache: fix cache_set_flush() NULL pointer dereference on OOM
commit f8b11260a445169989d01df75d35af0f56178f95 upstream.

When bch_cache_set_alloc() fails to kzalloc the cache_set, the
asynchronous closure handling tries to dereference a cache_set that
hadn't yet been allocated inside of cache_set_flush() which is called
by __cache_set_unregister() during cleanup.  This appears to happen only
during an OOM condition on bcache_register.

Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:53 -07:00
Eric Wheeler
b58e781068 bcache: fix race of writeback thread starting before complete initialization
commit 07cc6ef8edc47f8b4fc1e276d31127a0a5863d4d upstream.

The bch_writeback_thread might BUG_ON in read_dirty() if
dc->sb==BDEV_STATE_DIRTY and bch_sectors_dirty_init has not yet completed
its related initialization.  This patch downs the dc->writeback_lock until
after initialization is complete, thus preventing bch_writeback_thread
from proceeding prematurely.

See this thread:
  http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3453

Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:53 -07:00
Eric Wheeler
32bb118509 bcache: cleaned up error handling around register_cache()
commit 9b299728ed777428b3908ac72ace5f8f84b97789 upstream.

Fix null pointer dereference by changing register_cache() to return an int
instead of being void.  This allows it to return -ENOMEM or -ENODEV and
enables upper layers to handle the OOM case without NULL pointer issues.

See this thread:
  http://thread.gmane.org/gmane.linux.kernel.bcache.devel/3521

Fixes this error:
  gargamel:/sys/block/md5/bcache# echo /dev/sdh2 > /sys/fs/bcache/register

  bcache: register_cache() error opening sdh2: cannot allocate memory
  BUG: unable to handle kernel NULL pointer dereference at 00000000000009b8
  IP: [<ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache]
  PGD 120dff067 PUD 1119a3067 PMD 0
  Oops: 0000 [#1] SMP
  Modules linked in: veth ip6table_filter ip6_tables
  (...)
  CPU: 4 PID: 3371 Comm: kworker/4:3 Not tainted 4.4.2-amd64-i915-volpreempt-20160213bc1 #3
  Hardware name: System manufacturer System Product Name/P8H67-M PRO, BIOS 3904 04/27/2013
  Workqueue: events cache_set_flush [bcache]
  task: ffff88020d5dc280 ti: ffff88020b6f8000 task.ti: ffff88020b6f8000
  RIP: 0010:[<ffffffffc05a7e8d>]  [<ffffffffc05a7e8d>] cache_set_flush+0x102/0x15c [bcache]

Signed-off-by: Eric Wheeler <bcache@linux.ewheeler.net>
Tested-by: Marc MERLIN <marc@merlins.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:53 -07:00
Bryn M. Reeves
8907d8a6fd dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request()
commit 98dbc9c6c61698792e3a66f32f3bf066201d42d7 upstream.

An "old" (.request_fn) DM 'struct request' stores a pointer to the
associated 'struct dm_rq_target_io' in rq->special.

dm_requeue_original_request(), previously named
dm_requeue_unmapped_original_request(), called dm_unprep_request() to
reset rq->special to NULL.  But rq_end_stats() would go on to hit a NULL
pointer dereference because its call to tio_from_request() returned NULL.

Fix this by calling rq_end_stats() _before_ dm_unprep_request().

Signed-off-by: Bryn M. Reeves <bmr@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Fixes: e262f34741 ("dm stats: add support for request-based DM devices")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:40 -07:00
Joe Thornber
7f47aea487 dm cache: make sure every metadata function checks fail_io
commit d14fcf3dd79c0b8a8d0ba469c44a6b04f3a1403b upstream.

Otherwise operations may be attempted that will only ever go on to crash
(since the metadata device is either missing or unreliable if 'fail_io'
is set).

Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:40 -07:00
Joe Thornber
291e2b3900 dm thin metadata: don't issue prefetches if a transaction abort has failed
commit 2eae9e4489b4cf83213fa3bd508b5afca3f01780 upstream.

If a transaction abort has failed then we can no longer use the metadata
device.  Typically this happens if the superblock is unreadable.

This fix addresses a crash seen during metadata device failure testing.

Fixes: 8a01a6af75 ("dm thin: prefetch missing metadata pages")
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:40 -07:00
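
The guard amounts to skipping prefetches once fail_io is set (a sketch of the
shape; the internal helper name is an assumption):

  void dm_pool_issue_prefetches(struct dm_pool_metadata *pmd)
  {
          down_read(&pmd->root_lock);
          if (!pmd->fail_io)            /* metadata device already failed? */
                  __issue_prefetches(pmd);
          up_read(&pmd->root_lock);
  }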
Mike Snitzer
5504a47088 dm: fix excessive dm-mq context switching
commit 6acfe68bac7e6f16dc312157b1fa6e2368985013 upstream.

Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
than if an underlying null_blk device were used directly.  One of the
reasons for this drop in performance is that blk_insert_clone_request()
was calling blk_mq_insert_request() with @async=true.  This forced the
use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
which ushered in ping-ponging between process context (fio in this case)
and kblockd's kworker to submit the cloned request.  The ftrace
function_graph tracer showed:

  kworker-2013  =>   fio-12190
  fio-12190    =>  kworker-2013
  ...
  kworker-2013  =>   fio-12190
  fio-12190    =>  kworker-2013
  ...

Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
_not_ use kblockd to submit the cloned requests isn't enough to
eliminate the observed context switches.

In addition to this dm-mq specific blk-core fix, there are 2 DM core
fixes to dm-mq that (when paired with the blk-core fix) completely
eliminate the observed context switching:

1)  don't blk_mq_run_hw_queues in blk-mq request completion

    Motivated by desire to reduce overhead of dm-mq, punting to kblockd
    just increases context switches.

    In my testing against a really fast null_blk device there was no benefit
    to running blk_mq_run_hw_queues() on completion (and no other blk-mq
    driver does this).  So hopefully this change doesn't induce the need for
    yet another revert like commit 621739b00e !

2)  use blk_mq_complete_request() in dm_complete_request()

    blk_complete_request() doesn't offer the traditional q->mq_ops vs
    .request_fn branching pattern that other historic block interfaces
    do (e.g. blk_get_request).  Using blk_mq_complete_request() for
    blk-mq requests is important for performance.  It should be noted
    that, like blk_complete_request(), blk_mq_complete_request() doesn't
    natively handle partial completions -- but the request-based
    DM-multipath target does provide the required partial completion
    support by dm.c:end_clone_bio() triggering requeueing of the request
    via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.

dm-mq fix #2 is _much_ more important than #1 for eliminating the
context switches.
Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472

With these changes multithreaded async read IOPs improved from ~950K
to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
IOPs of the underlying null_blk device for the same workload is ~1950K.

Fixes: 7fb4898e0 ("block: add blk-mq support to blk_insert_cloned_request()")
Fixes: bfebd1cdb ("dm: add full blk-mq support to request-based DM")
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:40 -07:00
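
Fix #2 boils down to branching on the queue type when completing (sketch;
blk_mq_complete_request()'s exact signature varies across kernel versions):

  if (rq->q->mq_ops)
          blk_mq_complete_request(rq);    /* blk-mq request (some kernels
                                           * also take an error argument) */
  else
          blk_complete_request(rq);       /* legacy .request_fn request */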
DingXiang
15c3af026b dm snapshot: disallow the COW and origin devices from being identical
commit 4df2bf466a9c9c92f40d27c4aa9120f4e8227bfc upstream.

Otherwise loading a "snapshot" table using the same device for the
origin and COW devices, e.g.:

echo "0 20971520 snapshot 253:3 253:3 P 8" | dmsetup create snap

will trigger:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
[ 1958.979934] IP: [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
[ 1958.989655] PGD 0
[ 1958.991903] Oops: 0000 [#1] SMP
...
[ 1959.059647] CPU: 9 PID: 3556 Comm: dmsetup Tainted: G          IO    4.5.0-rc5.snitm+ #150
...
[ 1959.083517] task: ffff8800b9660c80 ti: ffff88032a954000 task.ti: ffff88032a954000
[ 1959.091865] RIP: 0010:[<ffffffffa040efba>]  [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
[ 1959.104295] RSP: 0018:ffff88032a957b30  EFLAGS: 00010246
[ 1959.110219] RAX: 0000000000000000 RBX: 0000000000000008 RCX: 0000000000000001
[ 1959.118180] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff880329334a00
[ 1959.126141] RBP: ffff88032a957b50 R08: 0000000000000000 R09: 0000000000000001
[ 1959.134102] R10: 000000000000000a R11: f000000000000000 R12: ffff880330884d80
[ 1959.142061] R13: 0000000000000008 R14: ffffc90001c13088 R15: ffff880330884d80
[ 1959.150021] FS:  00007f8926ba3840(0000) GS:ffff880333440000(0000) knlGS:0000000000000000
[ 1959.159047] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1959.165456] CR2: 0000000000000098 CR3: 000000032f48b000 CR4: 00000000000006e0
[ 1959.173415] Stack:
[ 1959.175656]  ffffc90001c13040 ffff880329334a00 ffff880330884ed0 ffff88032a957bdc
[ 1959.183946]  ffff88032a957bb8 ffffffffa040f225 ffff880329334a30 ffff880300000000
[ 1959.192233]  ffffffffa04133e0 ffff880329334b30 0000000830884d58 00000000569c58cf
[ 1959.200521] Call Trace:
[ 1959.203248]  [<ffffffffa040f225>] dm_exception_store_create+0x1d5/0x240 [dm_snapshot]
[ 1959.211986]  [<ffffffffa040d310>] snapshot_ctr+0x140/0x630 [dm_snapshot]
[ 1959.219469]  [<ffffffffa0005c44>] ? dm_split_args+0x64/0x150 [dm_mod]
[ 1959.226656]  [<ffffffffa0005ea7>] dm_table_add_target+0x177/0x440 [dm_mod]
[ 1959.234328]  [<ffffffffa0009203>] table_load+0x143/0x370 [dm_mod]
[ 1959.241129]  [<ffffffffa00090c0>] ? retrieve_status+0x1b0/0x1b0 [dm_mod]
[ 1959.248607]  [<ffffffffa0009e35>] ctl_ioctl+0x255/0x4d0 [dm_mod]
[ 1959.255307]  [<ffffffff813304e2>] ? memzero_explicit+0x12/0x20
[ 1959.261816]  [<ffffffffa000a0c3>] dm_ctl_ioctl+0x13/0x20 [dm_mod]
[ 1959.268615]  [<ffffffff81215eb6>] do_vfs_ioctl+0xa6/0x5c0
[ 1959.274637]  [<ffffffff81120d2f>] ? __audit_syscall_entry+0xaf/0x100
[ 1959.281726]  [<ffffffff81003176>] ? do_audit_syscall_entry+0x66/0x70
[ 1959.288814]  [<ffffffff81216449>] SyS_ioctl+0x79/0x90
[ 1959.294450]  [<ffffffff8167e4ae>] entry_SYSCALL_64_fastpath+0x12/0x71
...
[ 1959.323277] RIP  [<ffffffffa040efba>] dm_exception_store_set_chunk_size+0x7a/0x110 [dm_snapshot]
[ 1959.333090]  RSP <ffff88032a957b30>
[ 1959.336978] CR2: 0000000000000098
[ 1959.344121] ---[ end trace b049991ccad1169e ]---

Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1195899
Signed-off-by: Ding Xiang <dingxiang@huawei.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-04-12 09:08:39 -07:00
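
The guard added is essentially a constructor-time check that the two
underlying block devices differ (sketch; the error label and message wording
are illustrative):

  /* in snapshot_ctr(), after both devices have been acquired */
  if (s->origin->bdev == s->cow->bdev) {
          ti->error = "COW device cannot be the same as origin device";
          r = -EINVAL;
          goto bad_cow;
  }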
Sami Tolvanen
73a20ae313 ANDROID: dm verity fec: add sysfs attribute fec/corrected
Add a sysfs entry that allows user space to determine whether dm-verity
has come across correctable errors on the underlying block device.

Bug: 22655252
Bug: 27928374
Change-Id: I80547a2aa944af2fb9ffde002650482877ade31b
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit 7911fad5f0a2cf5afc2215657219a21e6630e001)
2016-04-07 16:50:07 +05:30
Sami Tolvanen
4cd7f1ecf7 UPSTREAM: dm verity: add ignore_zero_blocks feature
If ignore_zero_blocks is enabled dm-verity will return zeroes for blocks
matching a zero hash without validating the content.

Change-Id: I728fa4b2586b29f2793ea5cb014289892819d249
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 0cc37c2df4fa0aa702f9662edce4b7ce12c86b7a)
2016-04-07 16:50:06 +05:30
Sami Tolvanen
d5fe9722ab UPSTREAM: dm verity: add support for forward error correction
Add support for correcting corrupted blocks using Reed-Solomon.

This code uses RS(255, N) interleaved across data and hash
blocks. Each error-correcting block covers N bytes evenly
distributed across the combined total data, so that each byte is a
maximum distance away from the others. This makes it possible to
recover from several consecutive corrupted blocks with relatively
small space overhead.

In addition, using verity hashes to locate erasures nearly doubles
the effectiveness of error correction. Being able to detect
corrupted blocks also improves performance, because only corrupted
blocks need to be corrected.

For a 2 GiB partition, RS(255, 253) (two parity bytes for each
253-byte block) can correct up to 16 MiB of consecutive corrupted
blocks if erasures can be located, and 8 MiB if they cannot, with
16 MiB space overhead.

Change-Id: Ife4f8889f7fbf0974bf3ed4be6d3322ae9b4cb0e
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit a739ff3f543afbb4a041c16cd0182c8e8d366e70)
2016-04-07 16:50:06 +05:30
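
A quick back-of-the-envelope check of the numbers above (ignoring interleaving
details):

  parity overhead : 2 GiB x 2/253              ~ 16 MiB of parity
  with erasures   : 2 correctable bytes per 255-byte codeword
                    -> ~2/255 of 2 GiB         ~ 16 MiB of corruption
  without erasures: floor(2/2) = 1 byte per codeword
                    -> ~1/255 of 2 GiB         ~  8 MiB of corruption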
Sami Tolvanen
c81331d2f4 UPSTREAM: dm verity: factor out verity_for_bv_block()
verity_for_bv_block() will be re-used by an optional dm-verity object.

Change-Id: I80e0f8e7c9f234fce3fbdf21cb05aba3041d7f98
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit bb4d73ac5e4f0a6c4853f35824f6cb2d396a2f9c)
2016-04-07 16:50:06 +05:30
Sami Tolvanen
0be563d613 UPSTREAM: dm verity: factor out structures and functions useful to separate object
Prepare for an optional verity object to make use of existing dm-verity
structures and functions.

Change-Id: Ib14c3834bfed222b33e068908fb5f71a53e1187b
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit ffa393807cd69656d5b6bc9d9622e205071cbab8)
2016-04-07 16:50:05 +05:30
Sami Tolvanen
612ce5278f UPSTREAM: dm verity: move dm-verity.c to dm-verity-target.c
Prepare for extending dm-verity with an optional object.  Follows the
naming convention used by other DM targets (e.g. dm-cache and dm-era).

Change-Id: If6d2f27b290adf14fa77f3745fdc13aaa417c8dc
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 03045cbafa2d663ad8d0a583ac219d202d824344)
2016-04-07 16:50:05 +05:30
Sami Tolvanen
e6d1b9a713 UPSTREAM: dm verity: separate function for parsing opt args
Move optional argument parsing into a separate function to make it
easier to add more of them without making verity_ctr even longer.

Change-Id: I9cd9df41c3326824f8cca5764075501987e78a52
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 753c1fd02807cb43a1c5d01d75d454054d46bdad)
2016-04-07 16:50:05 +05:30
Sami Tolvanen
00e28c2ab8 UPSTREAM: dm verity: clean up duplicate hashing code
Handle dm-verity salting in one place to simplify the code.

Change-Id: If923a01dc63ae5123af13ba1b0863b73e33ddf46
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 6dbeda3469ced777bc3138ed5918f7ae79670b7b)
2016-04-07 16:50:05 +05:30
Mikulas Patocka
47733418c9 UPSTREAM: dm: don't save and restore bi_private
Device mapper used the field bi_private to point to dm_target_io. However,
since kernel 3.15, the bi_private field is unused, and so the targets do
not need to save and restore this field.

This patch removes code that saves and restores bi_private from dm-cache,
dm-snapshot and dm-verity.

Change-Id: Ic72905ccb6d58ff94eafaa47ba54b2688d92d3d1
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit fe3265b180d6282648f03bc6ac3958c733df01c2)
2016-04-07 16:50:04 +05:30
Tim Murray
e9eb108900 ANDROID: dm-crypt: run in a WQ_HIGHPRI workqueue
(cherry pick from commit ad3ac5180979e5dd1f84e4a807f76fb9fb19f814)

Running dm-crypt in a standard workqueue results in IO competing for CPU
time with standard user apps, which can lead to pipeline bubbles and
seriously degraded performance. Move to a WQ_HIGHPRI workqueue to
protect against that.

Signed-off-by: Tim Murray <timmurray@google.com>
Bug: 25392275
Change-Id: I2828587c754a7c2cafdd78b3323b9896cb8cd4e7
2016-04-07 16:49:58 +05:30
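
The change amounts to adding WQ_HIGHPRI where the crypt workqueue is allocated
(sketch; the other flags are shown as they commonly appear in dm-crypt, but
treat the exact set as illustrative):

  cc->crypt_queue = alloc_workqueue("kcryptd",
                                    WQ_HIGHPRI | WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM,
                                    num_online_cpus());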
Sami Tolvanen
e7d40d2bf8 ANDROID: dm verity fec: add sysfs attribute fec/corrected
Add a sysfs entry that allows user space to determine whether dm-verity
has come across correctable errors on the underlying block device.

Bug: 22655252
Bug: 27928374
Change-Id: I80547a2aa944af2fb9ffde002650482877ade31b
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit 7911fad5f0a2cf5afc2215657219a21e6630e001)
2016-03-31 09:13:28 -07:00
Sami Tolvanen
f75998fe84 UPSTREAM: dm verity: add ignore_zero_blocks feature
If ignore_zero_blocks is enabled dm-verity will return zeroes for blocks
matching a zero hash without validating the content.

Change-Id: I728fa4b2586b29f2793ea5cb014289892819d249
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 0cc37c2df4fa0aa702f9662edce4b7ce12c86b7a)
2016-03-31 09:12:15 -07:00
Sami Tolvanen
83c1827e58 UPSTREAM: dm verity: add support for forward error correction
Add support for correcting corrupted blocks using Reed-Solomon.

This code uses RS(255, N) interleaved across data and hash
blocks. Each error-correcting block covers N bytes evenly
distributed across the combined total data, so that each byte is a
maximum distance away from the others. This makes it possible to
recover from several consecutive corrupted blocks with relatively
small space overhead.

In addition, using verity hashes to locate erasures nearly doubles
the effectiveness of error correction. Being able to detect
corrupted blocks also improves performance, because only corrupted
blocks need to be corrected.

For a 2 GiB partition, RS(255, 253) (two parity bytes for each
253-byte block) can correct up to 16 MiB of consecutive corrupted
blocks if erasures can be located, and 8 MiB if they cannot, with
16 MiB space overhead.

Change-Id: Ife4f8889f7fbf0974bf3ed4be6d3322ae9b4cb0e
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit a739ff3f543afbb4a041c16cd0182c8e8d366e70)
2016-03-31 09:11:43 -07:00
Sami Tolvanen
890b7865c5 UPSTREAM: dm verity: factor out verity_for_bv_block()
verity_for_bv_block() will be re-used by an optional dm-verity object.

Change-Id: I80e0f8e7c9f234fce3fbdf21cb05aba3041d7f98
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit bb4d73ac5e4f0a6c4853f35824f6cb2d396a2f9c)
2016-03-31 09:10:55 -07:00
Sami Tolvanen
3ec912f8ef UPSTREAM: dm verity: factor out structures and functions useful to separate object
Prepare for an optional verity object to make use of existing dm-verity
structures and functions.

Change-Id: Ib14c3834bfed222b33e068908fb5f71a53e1187b
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit ffa393807cd69656d5b6bc9d9622e205071cbab8)
2016-03-31 09:10:15 -07:00
Sami Tolvanen
6558fe3cbe UPSTREAM: dm verity: move dm-verity.c to dm-verity-target.c
Prepare for extending dm-verity with an optional object.  Follows the
naming convention used by other DM targets (e.g. dm-cache and dm-era).

Change-Id: If6d2f27b290adf14fa77f3745fdc13aaa417c8dc
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 03045cbafa2d663ad8d0a583ac219d202d824344)
2016-03-31 09:09:40 -07:00
Sami Tolvanen
7475611cec UPSTREAM: dm verity: separate function for parsing opt args
Move optional argument parsing into a separate function to make it
easier to add more of them without making verity_ctr even longer.

Change-Id: I9cd9df41c3326824f8cca5764075501987e78a52
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 753c1fd02807cb43a1c5d01d75d454054d46bdad)
2016-03-31 09:09:02 -07:00
Sami Tolvanen
ab14138e10 UPSTREAM: dm verity: clean up duplicate hashing code
Handle dm-verity salting in one place to simplify the code.

Change-Id: If923a01dc63ae5123af13ba1b0863b73e33ddf46
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit 6dbeda3469ced777bc3138ed5918f7ae79670b7b)
2016-03-31 09:08:26 -07:00
Mikulas Patocka
a9c7ddcf03 UPSTREAM: dm: don't save and restore bi_private
Device mapper used the field bi_private to point to dm_target_io. However,
since kernel 3.15, the bi_private field is unused, and so the targets do
not need to save and restore this field.

This patch removes code that saves and restores bi_private from dm-cache,
dm-snapshot and dm-verity.

Change-Id: Ic72905ccb6d58ff94eafaa47ba54b2688d92d3d1
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
(cherry picked from commit fe3265b180d6282648f03bc6ac3958c733df01c2)
2016-03-31 09:07:30 -07:00
Gilad Broner
c3b854ad6c dm: add snapshot of dm-req-crypt
This snapshot is taken as of msm-3.18 commit:
5684450d70 ("Promotion of kernel.lnx.3.18-151201").
dm-req-crypt is necessary for full disk encryption.

Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-23 21:24:15 -07:00
Tim Murray
dc87462304 ANDROID: dm-crypt: run in a WQ_HIGHPRI workqueue
(cherry pick from commit ad3ac5180979e5dd1f84e4a807f76fb9fb19f814)

Running dm-crypt in a standard workqueue results in IO competing for CPU
time with standard user apps, which can lead to pipeline bubbles and
seriously degraded performance. Move to a WQ_HIGHPRI workqueue to
protect against that.

Signed-off-by: Tim Murray <timmurray@google.com>
Bug: 25392275
Change-Id: I2828587c754a7c2cafdd78b3323b9896cb8cd4e7
2016-03-08 15:45:34 -08:00
Mark Salyzyn
9f5ef4b3ad Revert "ANDROID: dm-crypt: run in a WQ_HIGHPRI workqueue"
This reverts commit 46050a93ff.

Change-Id: I3e37cf37c9ea0270737608dcd92ab5c311ea5ab8
2016-03-08 21:19:52 +00:00
Tim Murray
46050a93ff ANDROID: dm-crypt: run in a WQ_HIGHPRI workqueue
(cherry pick from commit ad3ac5180979e5dd1f84e4a807f76fb9fb19f814)

Running dm-crypt in a standard workqueue results in IO competing for CPU
time with standard user apps, which can lead to pipeline bubbles and
seriously degraded performance. Move to a WQ_HIGHPRI workqueue to
protect against that.

Signed-off-by: Tim Murray <timmurray@google.com>
Bug: 25392275
Change-Id: I589149a31c7b5d322fe2ed5b2476b1f6e3d5ee6f
2016-03-08 20:42:29 +00:00
Mike Snitzer
5c6f666742 dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq paths
commit 4328daa2e79ed904a42ce00a9f38b9c36b44b21a upstream.

Using request-based DM mpath configured with the following stacking
(.request_fn DM mpath ontop of scsi-mq paths):

echo Y > /sys/module/scsi_mod/parameters/use_blk_mq
echo N > /sys/module/dm_mod/parameters/use_blk_mq

'struct dm_rq_target_io' would leak if a request is requeued before a
blk-mq clone is allocated (or fails to allocate).  free_rq_tio()
wasn't being called.

kmemleak reported:

unreferenced object 0xffff8800b90b98c0 (size 112):
  comm "kworker/7:1H", pid 5692, jiffies 4295056109 (age 78.589s)
  hex dump (first 32 bytes):
    00 d0 5c 2c 03 88 ff ff 40 00 bf 01 00 c9 ff ff  ..\,....@.......
    e0 d9 b1 34 00 88 ff ff 00 00 00 00 00 00 00 00  ...4............
  backtrace:
    [<ffffffff81672b6e>] kmemleak_alloc+0x4e/0xb0
    [<ffffffff811dbb63>] kmem_cache_alloc+0xc3/0x1e0
    [<ffffffff8117eae5>] mempool_alloc_slab+0x15/0x20
    [<ffffffff8117ec1e>] mempool_alloc+0x6e/0x170
    [<ffffffffa00029ac>] dm_old_prep_fn+0x3c/0x180 [dm_mod]
    [<ffffffff812fbd78>] blk_peek_request+0x168/0x290
    [<ffffffffa0003e62>] dm_request_fn+0xb2/0x1b0 [dm_mod]
    [<ffffffff812f66e3>] __blk_run_queue+0x33/0x40
    [<ffffffff812f9585>] blk_delay_work+0x25/0x40
    [<ffffffff81096fff>] process_one_work+0x14f/0x3d0
    [<ffffffff81097715>] worker_thread+0x125/0x4b0
    [<ffffffff8109ce88>] kthread+0xd8/0xf0
    [<ffffffff8167cb8f>] ret_from_fork+0x3f/0x70
    [<ffffffffffffffff>] 0xffffffffffffffff

crash> struct -o dm_rq_target_io
struct dm_rq_target_io {
    ...
}
SIZE: 112

Fixes: e5863d9ad7 ("dm: allocate requests in target when stacking on blk-mq devices")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:07:14 -08:00
Mikulas Patocka
1264fbd72f dm snapshot: fix hung bios when copy error occurs
commit 385277bfb57faac44e92497104ba542cdd82d5fe upstream.

When there is an error copying a chunk dm-snapshot can incorrectly hold
associated bios indefinitely, resulting in hung IO.

The function copy_callback sets pe->error if there was error copying the
chunk, and then calls complete_exception.  complete_exception calls
pending_complete on error, otherwise it calls commit_exception with
commit_callback (and commit_callback calls complete_exception).

The persistent exception store (dm-snap-persistent.c) assumes that calls
to prepare_exception and commit_exception are paired.
persistent_prepare_exception increases ps->pending_count and
persistent_commit_exception decreases it.

If there is a copy error, persistent_prepare_exception is called but
persistent_commit_exception is not.  This results in the variable
ps->pending_count never returning to zero and that causes some pending
exceptions (and their associated bios) to be held forever.

Fix this by unconditionally calling commit_exception regardless of
whether the copy was successful.  A new "valid" parameter is added to
commit_exception -- when the copy fails this parameter is set to zero so
that the chunk that failed to copy (and all following chunks) is not
recorded in the snapshot store.  Also, remove commit_callback now that
it is merely a wrapper around pending_complete.

Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-03-03 15:07:14 -08:00
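
A sketch of the reshaped interface and call described above (argument order
and the callback signature are assumptions, not the verbatim patch):

  /* exception store interface gains a 'valid' argument */
  void (*commit_exception)(struct dm_exception_store *store,
                           struct dm_exception *e, int valid,
                           void (*callback)(void *context, int success),
                           void *callback_context);

  /* commit unconditionally after the copy; a failed copy commits with
   * valid == 0 so the chunk is never recorded in the snapshot store */
  s->store->type->commit_exception(s->store, &pe->e, !pe->copy_error,
                                   pending_complete, pe);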