Commit graph

2706 commits

Author SHA1 Message Date
Venkat Gopalakrishnan
1686d75c12 block: test-iosched: Fix compilation error in end_test_bio
The function signature of bio_end_io_t is changed in 4.4 kernel and
the error value is assigned in bi_error field of the bio struct, so
just free the bio after bio completion.

Change-Id: I08f64d8d51ae401fa608351b90b1120d8b84605f
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:02:03 -07:00
Subhash Jadavani
09ed2d9d3c block: don't allow nr_pending to go negative
nr_pending can go negative if we attempt to decrement it without matching
increment call earlier. If nr_pending goes negative, LLD's runtime suspend
might race with the ongoing request. This change allows decrementing
nr_pending only if it is non-zero.

Change-Id: I5f1e93ab0e0f950307e2e3c4f95c7cb01e83ffdd
Signed-off-by: Subhash Jadavani <subhashj@codeaurora.org>
2016-03-22 11:02:02 -07:00
Venkat Gopalakrishnan
a627c73479 block/fs: make tracking dirty task debug only
Adding a new element "tsk_dirty" to struct page increases the size
of mem_map/vmemmap, restrict this to a debug only functionality to
save few MB of memory.

Considering a system with 1G of RAM, there will be nearly 262144
pages and thus that many number of page structures in mem_map/vmemmap.
With pointer size of 8 bytes on a 64 bit system, adding this
pointer to "struct page" means an increase of "2MB" for mem_map.

CRs-Fixed: 738692
Change-Id: Idf3217dcbe17cf1ab4d462d2aa8d39da1ffd8b13
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
[venkatg@codeaurora.org: Fixed trivial merge conflict]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:02:01 -07:00
Venkat Gopalakrishnan
014929f975 block/fs: keep track of the task that dirtied the page
Background writes happen in the context of a background thread.
It is very useful to identify the actual task that generated the
request instead of background task that submited the request.
Hence keep track of the task when a page gets dirtied and dump
this task info while tracing. Not all the pages in the bio are
dirtied by the same task but most likely it will be, since the
sectors accessed on the device must be adjacent.

Change-Id: I6afba85a2063dd3350a0141ba87cf8440ce9f777
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
[venkatg@codeaurora.org: Fixed trivial merge conflicts]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:02:00 -07:00
Gilad Broner
a27d5a2f83 block: test-iosched: fix spinlock recursion
spin_lock_irq() / spin_unlock_irq() is used so interrupts are
enabled after unlocking the spinlock. However, it is not guaranteed
they were enabled before.
This change uses the proper irqsave / irqrestore variants instead.
Without it, a spinlock recursion on the scsi request completion path
is possible if completion interrupt occurs when used for UFS testing.

Change-Id: I25a9bf6faaa2bbfedc807111fbcb32276cccea2f
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-22 11:01:59 -07:00
Lee Susman
b5bec249a9 block: test-iosched: expose sector_range variable to user
Expose "sector_range", which will indicate to the low-level driver
unit-tests the size (in sectors, starting from "start_sector") of the
address space in which they can perform I/O operations. This user-defined
variable can be used to change the address space size from the default
512MiB.

Change-Id: I515a6849eb39b78e653f4018993a2c8e64e2a77f
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
2016-03-22 11:01:58 -07:00
Gilad Broner
f3664f905d block: test-iosched: fix bio allocation for test requests
Unit tests submit large requests of 512KB made of 128 bios.
Current allocation was done via kmalloc which may not be able
to allocate such a large buffer which is also physically contiguous.
Using kmalloc to allocate each bio separately is also problematic as
it might not be page aligned. Some bio may end up using more than a
single memory segment which will fail the mapping of the bios to
the request which supports up to 128 physical segments.
To avoid such failure, allocate a separate page for each bio
(bio size is single page size).

Change-Id: Id0da394d458942e093d924bc7aa23aa3231cdca7
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
[venkatg@codeaurora.org: Drop changes to mmc_block_test.c]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:58 -07:00
Gilad Broner
60f3461f17 block: test-iosched: enable running of simultaneous tests
Current test-iosched design enables running only a single test
for a single block device.
This change modifies the test-iosched framework to allow running
several tests on several block devices.

Change-Id: I051d842733873488b64e89053d9c4e30e1249870
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
[merez@codeaurora.org: fix conflicts due to removal of BKOPs UT]
Signed-off-by: Maya Erez <merez@codeaurora.org>
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
[venkatg@codeaurora.org: Drop changes to ufs_test.c and
mmc_block_test.c]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:57 -07:00
Gilad Broner
ddb20cd6f0 block: test-iosched: fix spinlock recursion
blk_run_queue() takes the queue spinlock and disabled irqs.
Consider the following callstack:

blk_run_queue
 ->__blk_run_queue
  -> scsi_request_fn
   -> blk_peek_request
    -> __elv_next_request
     -> elevator_dispatch_fn
      -> test_dispatch_requests
       -> test_dispatch_from

test_dispatch_from() will release the test-iosched spinlock
using spin_unlock_irq which will enable interrupts, however,
caller is assuming interrupts are disabled.
An interrupt can occur now and scsi soft-irq may be scheduled
with the following call stack:

scsi_softirq_done
 -> scsi_finish_command
  -> scsi_device_unbusy

scsi_device_unbusy() tries to lock the queue spinlock which was
previously locked when blk_run_queue was called, resulting in a
spinlock recursion.

Change test_dispatch_from() to use the spinlock irq save/restore variants
to prevent enabling the irq in case they were previously disabled.

Change-Id: Icaea4f9ba54771edb0302c6005047fcc5478ce8d
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-22 11:01:56 -07:00
Gilad Broner
2e5c505a50 block: test-iosched: remove test timeout timer
When running a test, a timer was set to detect test timeout
and to unblock the wait_event() function which is waiting for the
test to finish. This is redundant as wait_event timeout variant
gives the same functionality without the overhead of managing a
timer for this purpose and improve code readability.

Change-Id: Icbd3cb0f3fcb5854673f4506b102b0c80e97d6bb
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
2016-03-22 11:01:55 -07:00
Maya Erez
e355904bbf block: test-iosched: expose APIs to allow compiling ufs_test as a module
The UFS tests are used for testing the functionality and performance
of the UFS driver. Some of the tests call compare_buffer_to_pattern
for data integrity checking. This function should be exposed in order
to allow compilation of ufs_test as a module.

Change-Id: I2279b0ae9dbdf4ecad073fab2b15116be2ea1713
Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
Signed-off-by: Maya Erez <merez@codeaurora.org>
2016-03-22 11:01:54 -07:00
Dolev Raviv
949aad284e scsi: ufs: mixed long sequential
The test will verify correctness of sequential data pattern
written to the device while new data (with same pattern) is
written simultaneously.
First this test will run a long sequential write scenario.
This first stage will write the pattern that will be read
later. Second, sequential read requests will read and
compare the same data. The second stage reads, will issue in
Parallel to write requests with the same LBA and size.
NOTE: The test requires a long timeout.

The purpose of this test is to mix read and write requests on the same
LBA while checking for the read data correctness.

Change-Id: I6a437ce689b66233af3055d07a7f62f1e7b40765
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
[venkatg@codeaurora.org: Changes to ufs_test.c are already
present as part of earlier commit, hence drop them here]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:53 -07:00
Dolev Raviv
0c46efdd45 scsi: ufs: add support for test specific completion check
Introduce a new callback 'check_test_completion_fn' to test-iosched
framework. This callback is necessary to determine if a test has
completed or not in situation where the request queue is empty, but the
test was not completed.

Change-Id: I60bd8cccffacab11a5a7cba78caccf53fea3e1d8
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
[venkatg@codeaurora.org: Changes to ufs_test.c are already
present as part of earlier commit, hence drop them here]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:52 -07:00
Lee Susman
d4670e5bb0 scsi: ufs: long sequential read/write tests
This test adds the ability to test the UFS task management feature
in the driver. It loads the queue with requests in order to allow
the task management to operate in full capacity.

Modify test-iosched infrastructure to support the new tests:
- expose  check_test_completion()

Note: we submit 16-bio requests since the current HW is very slow
and we don't want to exceed the timeout duration.

Change-Id: I8ee752cba3c6838d8edc05747fa0288c4b347ef6
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
[merez@codeaurora.org: fix trivial conflicts in ufs_test.c]
Signed-off-by: Maya Erez <merez@codeaurora.org>
[venkatg@codeaurora.org: Changes to ufs_test.c are already
present as part of earlier commit, hence drop them here]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:51 -07:00
Maya Erez
2dcf7d92a9 block: test-iosched: fix test-iosched compilation
Fix test-iosched compilation issues due to differences in data
structures in kernel-3.14.

Signed-off-by: Maya Erez <merez@codeaurora.org>
2016-03-22 11:01:51 -07:00
Lee Susman
fcb6ce0633 block: add test bio size define to test-iosched
Add a define for the test bio size (which is the size of a page),
this is used for allocating the right sized buffer for the bio during
test request creation.

Change-Id: I9505c85c4352009bdee442172eb8ae8f4254cfb0
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
2016-03-22 11:01:50 -07:00
Maya Erez
819ba4dd19 mmc: Unit test fix for logging
Update logging with:
- prefix with module name
- add '\n' in the end
- test_pr_* removed

Change-Id: I465c9809def9d294dcbb3f7cf7f474c189f5fdbf
Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
[merez@codeaurora.org: fix conflicts due to removal of bkops tests]
Signed-off-by: Maya Erez <merez@codeaurora.org>
[venkatg@codeaurora.org: Drop changes to mmc_block_test.c]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:49 -07:00
Dolev Raviv
dbe45fb01e block: test-iosched: disable statistic flag on request
The flag REQ_IO_STAT is enabled by default this assumes statistics are
initialized and might cause NULL references in the kernel. To avoid it
this flag is cleared in the request and stats are not updated.

Change-Id: I6a1890dde51dfa8ffdd376b13f4466c9db0ae05b
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
2016-03-22 11:01:48 -07:00
Lee Susman
ef94ce156f mmc: card: change long_sequential_test time measurements to ktime
Change time measurements in long_sequential_test from jiffies to ktime,
and make the relevant change in test-iosched infrastructure.

In long_sequential_test we measure throughput, and the jiffies resolution
is not sensitive enough for this calculation.

Change-Id: If7c9a03c687f61996609c014e056bcd7132b9012
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
[venkatg@codeaurora.org: Drop changes to mmc_block_test.c]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:47 -07:00
Dolev Raviv
2bf9babb4d block: fix test crashing due to synchronization issue
The __blk_run_queue function is called from several contexts. The fix is
replacing it with blk_run_queue function, this function is guarded with
a lock, thus making it thread safe and prevents the crashing.

Change-Id: I3e12fa9c8b9e161375fffa3570abfa46b223a60b
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
2016-03-22 11:01:46 -07:00
Lee Susman
da2d513f18 mmc: enhance long_sequential_test for higher throughput
Change the test design so that requests are dynamically created
and freed. This enables running tests with more than 128 requests,
therefore more than 50MiB can be written/read and makes it possible
to measure driver write/read throughput more accurately.

Change-Id: I56c9d6c1afba5c91a0621a16d97feafd4689521d
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
[merez@codeaurora.org: fix conflicts due to BKOPS tests removal]
Signed-off-by: Maya Erez <merez@codeaurora.org>
[venkatg@codeaurora.org: Drop changes to mmc_block_test.c]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:45 -07:00
Dolev Raviv
a3c942633c block: test-iosched: Add support for setting rq_disk
Some block devices requires the rq_disk field to be assigned.
This patch exposes a new API to the block device test utility for
getting the rq_disk assigned, in the created request.

Change-Id: I61dc4dad50eb7600728156a6cd08bb1ee134df0d
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
2016-03-22 11:01:45 -07:00
Lee Susman
28e42664db mmc: new request notification unit-test
The new request notification test checks the following scenario:
A new request arrives after a NULL request was sent to the mmc_queue,
which is waiting for completion of a former request.

Change-Id: I05db0959ded400e292eb5e84e1ecfc579b78ee62
Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
[merez@codeaurora.org: fixed conflicts due to removal of BKOPS tests]
Signed-off-by: Maya Erez <merez@codeaurora.org>
[venkatg@codeaurora.org: Drop changes to mmc_block_test.c]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:44 -07:00
Lee Susman
ed448b7a78 block: test-iosched infrastructure enhancement
Add functionality to test-iosched so that it could simulate the
ROW scheduler behaviour. The main additions are:
- 3 distinct requests queue with counters
- support for urgent request pending
- reinsert request implementation (callback + dispatch behavior)

Change-Id: I83b5d9e3d2b8cd9a2353afa6a3e6a4cbc83b0cd4
Signed-off-by: Konstantin Dorfman <kdorfman@codeaurora.org>
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
[merez@codeaurora.org: fixed conflicts due to bkops tests removal]
Signed-off-by: Maya Erez <merez@codeaurora.org>
[venkatg@codeaurora.org: Dropping elevator is_urgent_fn and
reinsert_req_fn ops fn as they are not present in 3.18 kernel]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:43 -07:00
Maya Erez
613fc2f667 mmc: Enable eMMC unit-tests
Enable the compilation of eMMC4.5 unit-tests, required by APT team.
This will allow the APT team to test the storage activity on released
builds.
The storage tests are disabled in normal operation and in order to
activate them a test I/O scheduler should be chosen and the test should
be triggered via debugfs. Therefore they have no effect on normal eMMC
driver operation.

Change-Id: I179c567f67cc8fab9ed1edab8246483de18bc76a
Signed-off-by: Maya Erez <merez@codeaurora.org>
[venkatg@codeaurora.org: Fixed trivial merge conflict]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:41 -07:00
Lee Susman
4f84e1c313 mmc: improve mmc_block_test printouts
Change the printout format to be more readable.
Specifically, add quotes around the test case name
strings.

Change-Id: I51b0c1b94389e4b51af84c5e993207b18efc2226
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
[merez@codeaurora.org: fix conflicts as BKOPS tests were removed]
Signed-off-by: Maya Erez <merez@codeaurora.org>
[venkatg@codeaurora.org: Drop changes to mmc_block_test.c]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:41 -07:00
Lee Susman
0162f826f3 mmc: card: Add long sequential read test to test-iosched
Long sequential read test measures read throughput
at the driver level by reading large requests sequentially.

Change-Id: I3b6d685930e1d0faceabbc7d20489111734cc9d4
Signed-off-by: Lee Susman <lsusman@codeaurora.org>
[merez@codeaurora.org: Fix conflicts as BKOPS tests were removed]
Signed-off-by: Maya Erez <merez@codeaurora.org>
[venkatg@codeaurora.org: Drop changes to mmc_block_test.c]
Signed-off-by: Venkat Gopalakrishnan <venkatg@codeaurora.org>
2016-03-22 11:01:40 -07:00
Tatyana Brokhman
a98628c1c4 block: test-iosched: Sleep before each test
In order to be sure that the packing statistics collected after the test
reflect *only* requests issued by the test (and not real request from
FS) - sleep before each test in order to give an already dispatched
requests time to complete.

Change-Id: If2f40efad1d79084a8ea85afe93cce58e49ff698
CRs-Fixed: 453712
Signed-off-by: Tatyana Brokhman <tlinder@codeaurora.org>
2016-03-22 11:01:39 -07:00
Maya Erez
ae83bbb322 block: test-iosched error handling fixes
- Fix test-iosched crash when running multiple tests
- Free the BIOs memory when a request is not completed

Change-Id: I1baa916c04ae73c809dee7e67ec63f4546dc71aa
Signed-off-by: Maya Erez <merez@codeaurora.org>
2016-03-22 11:01:38 -07:00
Maya Erez
395df99674 block: Add test-iosched scheduler
The test scheduler allows testing a block device by dispatching
specific requests according to the test case and declare PASS/FAIL
according to the requests completion error code

Change-Id: Ief91f9fed6e3c3c75627d27264d5252ea14f10ad
Signed-off-by: Maya Erez <merez@codeaurora.org>
2016-03-22 11:01:37 -07:00
Dolev Raviv
5904916689 block: blk-flush: Add support for Barrier flag
A barrier request is used to control ordering of write requests without
clearing the device's cache. LLD support for barrier is optional. If LLD
doesn't support barrier, flush will be issued instead to insure logical
correctness.
To maintain this fallback flush s/w path and flags are appended.
This patch implements the necessary requests marking in order to support
the barrier feature in the block layer.

This patch implements two major changes required for the barrier support.
(1) A new flush execution-policy is added to support "ordered" requests
    and a fallback , in case barrier is not supported by LLD.
(2) If there is a flush pending in the flush-queue, the received barrier
    is ignored, in order not to miss a demand for an actual flush.

Change-Id: I6072d759e5c3bd983105852d81732e949da3d448
Signed-off-by: Dolev Raviv <draviv@codeaurora.org>
2016-03-22 11:01:36 -07:00
San Mehat
00074131ac block: genhd: Add disk/partition specific uevent callbacks for partition info
For disk devices, a new uevent parameter 'NPARTS' specifies the number
of partitions detected by the kernel. Partition devices get 'PARTN' which
specifies the partitions index in the table, and 'PARTNAME', which
specifies PARTNAME specifices the partition name of a partition device

Signed-off-by: Dima Zavin <dima@android.com>
2016-02-16 13:51:12 -08:00
Jens Axboe
6126eb2483 Revert "block: Split bios on chunk boundaries"
This reverts commit d380561113.

If we end up splitting on the first segment, we don't adjust
the sector count. That results in hitting a BUG() with attempting
to split 0 sectors.

As this is just a performance issue and not a regression since
4.3 release, let's just rever this change. That gives us more
time to test a real fix for 4.5, which would be marked for
stable anyway.
2016-01-08 09:00:29 -07:00
Jens Axboe
21491412f2 block: add blk_start_queue_async()
We currently only have an inline/sync helper to restart a stopped
queue. If drivers need an async version, they have to roll their
own. Add a generic helper instead.

Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-28 13:07:07 -07:00
Keith Busch
d380561113 block: Split bios on chunk boundaries
For h/w that advertise their block storage's underlying chunk size, it's
a big performance win to not submit commands that cross them. This patch
uses that criteria if it is provided. If it is not provided, this patch
uses the max sectors as before.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 17:19:25 -07:00
Linus Torvalds
24bc3ea5df Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "Three small fixes for 4.4 final. Specifically:

   - The segment issue fix from Junichi, where the old IO path does a
     bio limit split before potentially bouncing the pages.  We need to
     do that in the right order, to ensure that limitations are met.

   - A NVMe surprise removal IO hang fix from Keith.

   - A use-after-free in null_blk, introduced by a previous patch in
     this series.  From Mike Krinkin"

* 'for-linus' of git://git.kernel.dk/linux-block:
  null_blk: fix use-after-free error
  block: ensure to split after potentially bouncing a bio
  NVMe: IO ending fixes on surprise removal
2015-12-22 16:00:25 -08:00
Junichi Nomura
23688bf4f8 block: ensure to split after potentially bouncing a bio
blk_queue_bio() does split then bounce, which makes the segment
counting based on pages before bouncing and could go wrong. Move
the split to after bouncing, like we do for blk-mq, and the we
fix the issue of having the bio count for segments be wrong.

Fixes: 54efd50bfd ("block: make generic_make_request handle arbitrarily sized bios")
Cc: stable@vger.kernel.org
Tested-by: Artem S. Tashkinov <t.artem@lycos.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-22 10:26:53 -07:00
Linus Torvalds
7807563183 Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "A set of fixes for the current series.  This contains:

   - A bunch of fixes for lightnvm, should be the last round for this
     series.  From Matias and Wenwei.

   - A writeback detach inode fix from Ilya, also marked for stable.

   - A block (though it says SCSI) fix for an OOPS in SCSI runtime power
     management.

   - Module init error path fixes for null_blk from Minfei"

* 'for-linus' of git://git.kernel.dk/linux-block:
  null_blk: Fix error path in module initialization
  lightnvm: do not compile in debugging by default
  lightnvm: prevent gennvm module unload on use
  lightnvm: fix media mgr registration
  lightnvm: replace req queue with nvmdev for lld
  lightnvm: comments on constants
  lightnvm: check mm before use
  lightnvm: refactor spin_unlock in gennvm_get_blk
  lightnvm: put blks when luns configure failed
  lightnvm: use flags in rrpc_get_blk
  block: detach bdev inode from its wb in __blkdev_put()
  SCSI: Fix NULL pointer dereference in runtime PM
2015-12-12 10:24:00 -08:00
Tejun Heo
0b98f0c042 Merge branch 'master' into for-4.4-fixes
The following commit which went into mainline through networking tree

  3b13758f51 ("cgroups: Allow dynamically changing net_classid")

conflicts in net/core/netclassid_cgroup.c with the following pending
fix in cgroup/for-4.4-fixes.

  1f7dd3e5a6 ("cgroup: fix handling of multi-destination migration from subtree_control enabling")

The former separates out update_classid() from cgrp_attach() and
updates it to walk all fds of all tasks in the target css so that it
can be used from both migration and config change paths.  The latter
drops @css from cgrp_attach().

Resolve the conflict by making cgrp_attach() call update_classid()
with the css from the first task.  We can revive @tset walking in
cgrp_attach() but given that net_cls is v1 only where there always is
only one target css during migration, this is fine.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Nina Schiff <ninasc@fb.com>
2015-12-07 10:09:03 -05:00
Linus Torvalds
19190f5ea9 SCSI fixes on 20151205
This is quite a bumper crop of fixes: Three from Arnd correcting various build
 issues in some configurations, a lock recursion in qla2xxx.  Two potentially
 exploitable issues in hpsa and mvsas, a potential null deref in st, A revert
 of a bdi registration fix that turned out to cause even more problems, a set
 of fixes to allow people who only defined MPT2SAS to still work after the
 mpt2/mpt3sas merger and a couple of fixes for issues turned up by the hyper-v
 storvsc driver.
 
 Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2
 
 iQEcBAABAgAGBQJWY8fDAAoJEDeqqVYsXL0MPhwH/Rs05LUU2vnlaZDdDpH5zY56
 YQgKh5duF+ZH+Y4NxX5kkLLo05wpE6xD5xp2yzzmnjTA0Uf/yLVHNdb5D6tRZgSo
 mZjAX+/wGDb/ErwvwTk/K2mhEvB0iZJJVMyWcG3F9dKgciRCF/p1Gn5EarGmc+vM
 w/9xrs1j24Pw7ipHgBj9zU13w+SPMI7LunR0oYL9CJg24jgXG9sAbrwLkox5kHLo
 FFBCrZhev1mzFKa1C+Ln3s0iSf0yEQMd4khzPJAUElkw812PZ7I6r4bCP0ZPKDed
 JR8zex9jo77RyWZwA7fIathA0/ujv0AeIRXgvzb0/io1Yk577r98vt+S3koQVK8=
 =ptgb
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is quite a bumper crop of fixes: three from Arnd correcting
  various build issues in some configurations, a lock recursion in
  qla2xxx.  Two potentially exploitable issues in hpsa and mvsas, a
  potential null deref in st, a revert of a bdi registration fix that
  turned out to cause even more problems, a set of fixes to allow people
  who only defined MPT2SAS to still work after the mpt2/mpt3sas merger
  and a couple of fixes for issues turned up by the hyper-v storvsc
  driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  mpt3sas: fix Kconfig dependency problem for mpt2sas back compatibility
  Revert "scsi: Fix a bdi reregistration race"
  mpt3sas: Add dummy Kconfig option for backwards compatibility
  Fix a memory leak in scsi_host_dev_release()
  block/sd: Fix device-imposed transfer length limits
  scsi_debug: fix prevent_allow+verify regressions
  MAINTAINERS: Add myself as co-maintainer of the SCSI subsystem.
  sd: Make discard granularity match logical block size when LBPRZ=1
  scsi: hpsa: select CONFIG_SCSI_SAS_ATTR
  scsi: advansys needs ISA dma api for ISA support
  scsi_sysfs: protect against double execution of __scsi_remove_device()
  st: fix potential null pointer dereference.
  scsi: report 'INQUIRY result too short' once per host
  advansys: fix big-endian builds
  qla2xxx: Fix rwlock recursion
  hpsa: logical vs bitwise AND typo
  mvsas: don't allow negative timeouts
  mpt3sas: Fix use sas_is_tlr_enabled API before enabling MPI2_SCSIIO_CONTROL_TLR_ON flag
2015-12-06 08:02:25 -08:00
Ken Xue
4fd41a8552 SCSI: Fix NULL pointer dereference in runtime PM
The routines in scsi_pm.c assume that if a runtime-PM callback is
invoked for a SCSI device, it can only mean that the device's driver
has asked the block layer to handle the runtime power management (by
calling blk_pm_runtime_init(), which among other things sets q->dev).

However, this assumption turns out to be wrong for things like the ses
driver.  Normally ses devices are not allowed to do runtime PM, but
userspace can override this setting.  If this happens, the kernel gets
a NULL pointer dereference when blk_post_runtime_resume() tries to use
the uninitialized q->dev pointer.

This patch fixes the problem by checking q->dev in block layer before
handle runtime PM. Since ses doesn't define any PM callbacks and call
blk_pm_runtime_init(), the crash won't occur.

This fixes Bugzilla #101371.
https://bugzilla.kernel.org/show_bug.cgi?id=101371

More discussion can be found from below link.
http://marc.info/?l=linux-scsi&m=144163730531875&w=2

Signed-off-by: Ken Xue <Ken.Xue@amd.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Cc: Xiangliang Yu <Xiangliang.Yu@amd.com>
Cc: James E.J. Bottomley <JBottomley@odin.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Michael Terry <Michael.terry@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-12-03 20:35:02 -07:00
James Bottomley
be9e2f775f Merge branch 'mkp-fixes' into fixes 2015-12-03 09:32:33 -08:00
Tejun Heo
1f7dd3e5a6 cgroup: fix handling of multi-destination migration from subtree_control enabling
Consider the following v2 hierarchy.

  P0 (+memory) --- P1 (-memory) --- A
                                 \- B
       
P0 has memory enabled in its subtree_control while P1 doesn't.  If
both A and B contain processes, they would belong to the memory css of
P1.  Now if memory is enabled on P1's subtree_control, memory csses
should be created on both A and B and A's processes should be moved to
the former and B's processes the latter.  IOW, enabling controllers
can cause atomic migrations into different csses.

The core cgroup migration logic has been updated accordingly but the
controller migration methods haven't and still assume that all tasks
migrate to a single target css; furthermore, the methods were fed the
css in which subtree_control was updated which is the parent of the
target csses.  pids controller depends on the migration methods to
move charges and this made the controller attribute charges to the
wrong csses often triggering the following warning by driving a
counter negative.

 WARNING: CPU: 1 PID: 1 at kernel/cgroup_pids.c:97 pids_cancel.constprop.6+0x31/0x40()
 Modules linked in:
 CPU: 1 PID: 1 Comm: systemd Not tainted 4.4.0-rc1+ #29
 ...
  ffffffff81f65382 ffff88007c043b90 ffffffff81551ffc 0000000000000000
  ffff88007c043bc8 ffffffff810de202 ffff88007a752000 ffff88007a29ab00
  ffff88007c043c80 ffff88007a1d8400 0000000000000001 ffff88007c043bd8
 Call Trace:
  [<ffffffff81551ffc>] dump_stack+0x4e/0x82
  [<ffffffff810de202>] warn_slowpath_common+0x82/0xc0
  [<ffffffff810de2fa>] warn_slowpath_null+0x1a/0x20
  [<ffffffff8118e031>] pids_cancel.constprop.6+0x31/0x40
  [<ffffffff8118e0fd>] pids_can_attach+0x6d/0xf0
  [<ffffffff81188a4c>] cgroup_taskset_migrate+0x6c/0x330
  [<ffffffff81188e05>] cgroup_migrate+0xf5/0x190
  [<ffffffff81189016>] cgroup_attach_task+0x176/0x200
  [<ffffffff8118949d>] __cgroup_procs_write+0x2ad/0x460
  [<ffffffff81189684>] cgroup_procs_write+0x14/0x20
  [<ffffffff811854e5>] cgroup_file_write+0x35/0x1c0
  [<ffffffff812e26f1>] kernfs_fop_write+0x141/0x190
  [<ffffffff81265f88>] __vfs_write+0x28/0xe0
  [<ffffffff812666fc>] vfs_write+0xac/0x1a0
  [<ffffffff81267019>] SyS_write+0x49/0xb0
  [<ffffffff81bcef32>] entry_SYSCALL_64_fastpath+0x12/0x76

This patch fixes the bug by removing @css parameter from the three
migration methods, ->can_attach, ->cancel_attach() and ->attach() and
updating cgroup_taskset iteration helpers also return the destination
css in addition to the task being migrated.  All controllers are
updated accordingly.

* Controllers which don't care whether there are one or multiple
  target csses can be converted trivially.  cpu, io, freezer, perf,
  netclassid and netprio fall in this category.

* cpuset's current implementation assumes that there's single source
  and destination and thus doesn't support v2 hierarchy already.  The
  only change made by this patchset is how that single destination css
  is obtained.

* memory migration path already doesn't do anything on v2.  How the
  single destination css is obtained is updated and the prep stage of
  mem_cgroup_can_attach() is reordered to accomodate the change.

* pids is the only controller which was affected by this bug.  It now
  correctly handles multi-destination migrations and no longer causes
  counter underflow from incorrect accounting.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-and-tested-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Aleksa Sarai <cyphar@cyphar.com>
2015-12-03 10:18:21 -05:00
Ming Lei
a88d32af18 blk-merge: fix computing bio->bi_seg_front_size in case of single segment
When bio has only one physical segment, we should set bio's
bi_seg_front_size as the real(final) size of the single segment.

Fixes: 02e707424c2ea(blk-merge: fix blk_bio_segment_split)
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Tested-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-30 13:02:36 -07:00
Hannes Reinecke
bf4e6b4e75 block: Always check queue limits for cloned requests
When a cloned request is retried on other queues it always needs
to be checked against the queue limits of that queue.
Otherwise the calculations for nr_phys_segments might be wrong,
leading to a crash in scsi_init_sgtable().

To clarify this the patch renames blk_rq_check_limits()
to blk_cloned_rq_check_limits() and removes the symbol
export, as the new function should only be used for
cloned requests and never exported.

Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Ewan Milne <emilne@redhat.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Fixes: e2a60da74 ("block: Clean up special command handling logic")
Cc: stable@vger.kernel.org # 3.7+
Acked-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-29 14:37:27 -07:00
Eric Sandeen
77032ca66f Return EBUSY from BLKRRPART for mounted whole-dev fs
Today, blockdev --rereadpt /dev/sda will fail with EBUSY if any
partition of sda is mounted (and will fail with EINVAL if pointed
at a partition).  But it will pass if the entire block device is
formatted with a filesystem and mounted.  I don't think this makes
sense; partitioning should surely not ever change out from under
a mounted device.

So check for bdev->bd_super, and fail that with -EBUSY as well.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-25 20:49:24 -07:00
Martin K. Petersen
ca369d51b3 block/sd: Fix device-imposed transfer length limits
Commit 4f258a4634 ("sd: Fix maximum I/O size for BLOCK_PC requests")
had the unfortunate side-effect of removing an implicit clamp to
BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer
code. This caused problems for some SMR drives.

Debugging this issue revealed a few problems with the existing
infrastructure since the block layer didn't know how to deal with
device-imposed limits, only limits set by the I/O controller.

 - Introduce a new queue limit, max_dev_sectors, which is used by the
   ULD to signal the maximum sectors for a REQ_TYPE_FS request.

 - Ensure that max_dev_sectors is correctly stacked and taken into
   account when overriding max_sectors through sysfs.

 - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer
   values for later processing.

 - In sd_revalidate() set the queue's max_dev_sectors based on the
   MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value
   is not reported, fall back to a cap based on the CDB TRANSFER LENGTH
   field size.

 - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits
   VPD--if reported and sane--to signal the preferred device transfer
   size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS.

 - blk_limits_max_hw_sectors() is no longer used and can be removed.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: sweeneygj@gmx.com
Tested-by: Arzeets <anatol.pomozov@gmail.com>
Tested-by: David Eisner <david.eisner@oriel.oxon.org>
Tested-by: Mario Kicherer <dev@kicherer.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2015-11-25 21:38:58 -05:00
Jens Axboe
dcd8376c36 Revert "blk-flush: Queue through IO scheduler when flush not required"
This reverts commit 1b2ff19e6a.

Jan writes:

--

Thanks for report! After some investigation I found out we allocate
elevator specific data in __get_request() only for non-flush requests. And
this is actually required since the flush machinery uses the space in
struct request for something else. Doh. So my patch is just wrong and not
easy to fix since at the time __get_request() is called we are not sure
whether the flush machinery will be used in the end. Jens, please revert
1b2ff19e6a. Thanks!

I'm somewhat surprised that you can reliably hit the race where flushing
gets disabled for the device just while the request is in flight. But I
guess during boot it makes some sense.

--

So let's just revert it, we can fix the queue run manually after the
fact. This race is rare enough that it didn't trigger in testing, it
requires the specific disable-while-in-flight scenario to trigger.
2015-11-25 10:12:54 -07:00
Christoph Hellwig
55ce0da1da block: fix blk_abort_request for blk-mq drivers
We only added the request to the request list for the !blk-mq case,
so we should only delete it in that case as well.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
2015-11-24 15:24:10 -07:00
Linus Torvalds
4ce01c518e Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
 "A round of fixes/updates for the current series.

  This looks a little bigger than it is, but that's mainly because we
  pushed the lightnvm enabled null_blk change out of the merge window so
  it could be updated a bit.  The rest of the volume is also mostly
  lightnvm.  In particular:

   - Lightnvm.  Various fixes, additions, updates from Matias and
     Javier, as well as from Wenwei Tao.

   - NVMe:
        - Fix for potential arithmetic overflow from Keith.
        - Also from Keith, ensure that we reap pending completions from
          a completion queue before deleting it.  Fixes kernel crashes
          when resetting a device with IO pending.
        - Various little lightnvm related tweaks from Matias.

   - Fixup flushes to go through the IO scheduler, for the cases where a
     flush is not required.  Fixes a case in CFQ where we would be
     idling and not see this request, hence not break the idling.  From
     Jan Kara.

   - Use list_{first,prev,next} in elevator.c for cleaner code.  From
     Gelian Tang.

   - Fix for a warning trigger on btrfs and raid on single queue blk-mq
     devices, where we would flush plug callbacks with preemption
     disabled.  From me.

   - A mac partition validation fix from Kees Cook.

   - Two merge fixes from Ming, marked stable.  A third part is adding a
     new warning so we'll notice this quicker in the future, if we screw
     up the accounting.

   - Cleanup of thread name/creation in mtip32xx from Rasmus Villemoes"

* 'for-linus' of git://git.kernel.dk/linux-block: (32 commits)
  blk-merge: warn if figured out segment number is bigger than nr_phys_segments
  blk-merge: fix blk_bio_segment_split
  block: fix segment split
  blk-mq: fix calling unplug callbacks with preempt disabled
  mac: validate mac_partition is within sector
  mtip32xx: use formatting capability of kthread_create_on_node
  NVMe: reap completion entries when deleting queue
  lightnvm: add free and bad lun info to show luns
  lightnvm: keep track of block counts
  nvme: lightnvm: use admin queues for admin cmds
  lightnvm: missing free on init error
  lightnvm: wrong return value and redundant free
  null_blk: do not del gendisk with lightnvm
  null_blk: use device addressing mode
  null_blk: use ppa_cache pool
  NVMe: Fix possible arithmetic overflow for max segments
  blk-flush: Queue through IO scheduler when flush not required
  null_blk: register as a LightNVM device
  elevator: use list_{first,prev,next}_entry
  lightnvm: cleanup queue before target removal
  ...
2015-11-24 10:26:30 -08:00