android_kernel_oneplus_msm8998/block
Gabriel Krisman Bertazi f4522e36ed blk-mq: Avoid memory reclaim when remapping queues
commit 36e1f3d107867b25c616c2fd294f5a1c9d4e5d09 upstream.

While stressing memory and IO at the same time we changed SMT settings,
we were able to consistently trigger deadlocks in the mm system, which
froze the entire machine.

I think that under memory stress conditions, the large allocations
performed by blk_mq_init_rq_map may trigger a reclaim, which stalls
waiting on the block layer remmaping completion, thus deadlocking the
system.  The trace below was collected after the machine stalled,
waiting for the hotplug event completion.

The simplest fix for this is to make allocations in this path
non-reclaimable, with GFP_NOIO.  With this patch, We couldn't hit the
issue anymore.

This should apply on top of Jens's for-next branch cleanly.

Changes since v1:
  - Use GFP_NOIO instead of GFP_NOWAIT.

 Call Trace:
[c000000f0160aaf0] [c000000f0160ab50] 0xc000000f0160ab50 (unreliable)
[c000000f0160acc0] [c000000000016624] __switch_to+0x2e4/0x430
[c000000f0160ad20] [c000000000b1a880] __schedule+0x310/0x9b0
[c000000f0160ae00] [c000000000b1af68] schedule+0x48/0xc0
[c000000f0160ae30] [c000000000b1b4b0] schedule_preempt_disabled+0x20/0x30
[c000000f0160ae50] [c000000000b1d4fc] __mutex_lock_slowpath+0xec/0x1f0
[c000000f0160aed0] [c000000000b1d678] mutex_lock+0x78/0xa0
[c000000f0160af00] [d000000019413cac] xfs_reclaim_inodes_ag+0x33c/0x380 [xfs]
[c000000f0160b0b0] [d000000019415164] xfs_reclaim_inodes_nr+0x54/0x70 [xfs]
[c000000f0160b0f0] [d0000000194297f8] xfs_fs_free_cached_objects+0x38/0x60 [xfs]
[c000000f0160b120] [c0000000003172c8] super_cache_scan+0x1f8/0x210
[c000000f0160b190] [c00000000026301c] shrink_slab.part.13+0x21c/0x4c0
[c000000f0160b2d0] [c000000000268088] shrink_zone+0x2d8/0x3c0
[c000000f0160b380] [c00000000026834c] do_try_to_free_pages+0x1dc/0x520
[c000000f0160b450] [c00000000026876c] try_to_free_pages+0xdc/0x250
[c000000f0160b4e0] [c000000000251978] __alloc_pages_nodemask+0x868/0x10d0
[c000000f0160b6f0] [c000000000567030] blk_mq_init_rq_map+0x160/0x380
[c000000f0160b7a0] [c00000000056758c] blk_mq_map_swqueue+0x33c/0x360
[c000000f0160b820] [c000000000567904] blk_mq_queue_reinit+0x64/0xb0
[c000000f0160b850] [c00000000056a16c] blk_mq_queue_reinit_notify+0x19c/0x250
[c000000f0160b8a0] [c0000000000f5d38] notifier_call_chain+0x98/0x100
[c000000f0160b8f0] [c0000000000c5fb0] __cpu_notify+0x70/0xe0
[c000000f0160b930] [c0000000000c63c4] notify_prepare+0x44/0xb0
[c000000f0160b9b0] [c0000000000c52f4] cpuhp_invoke_callback+0x84/0x250
[c000000f0160ba10] [c0000000000c570c] cpuhp_up_callbacks+0x5c/0x120
[c000000f0160ba60] [c0000000000c7cb8] _cpu_up+0xf8/0x1d0
[c000000f0160bac0] [c0000000000c7eb0] do_cpu_up+0x120/0x150
[c000000f0160bb40] [c0000000006fe024] cpu_subsys_online+0x64/0xe0
[c000000f0160bb90] [c0000000006f5124] device_online+0xb4/0x120
[c000000f0160bbd0] [c0000000006f5244] online_store+0xb4/0xc0
[c000000f0160bc20] [c0000000006f0a68] dev_attr_store+0x68/0xa0
[c000000f0160bc60] [c0000000003ccc30] sysfs_kf_write+0x80/0xb0
[c000000f0160bca0] [c0000000003cbabc] kernfs_fop_write+0x17c/0x250
[c000000f0160bcf0] [c00000000030fe6c] __vfs_write+0x6c/0x1e0
[c000000f0160bd90] [c000000000311490] vfs_write+0xd0/0x270
[c000000f0160bde0] [c0000000003131fc] SyS_write+0x6c/0x110
[c000000f0160be30] [c000000000009204] system_call+0x38/0xec

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Douglas Miller <dougmill@linux.vnet.ibm.com>
Cc: linux-block@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-04-18 07:14:37 +02:00
..
partitions mac: validate mac_partition is within sector 2015-11-20 08:49:28 -07:00
bio-integrity.c block: blk_flush_integrity() for bio-based drivers 2015-10-21 14:43:44 -06:00
bio.c blk: Ensure users for current->bio_list can see the full list. 2017-04-08 09:53:32 +02:00
blk-cgroup.c blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL 2016-10-28 03:01:33 -04:00
blk-core.c blk: Ensure users for current->bio_list can see the full list. 2017-04-08 09:53:32 +02:00
blk-exec.c block: move PM request support to IDE 2015-05-05 13:40:42 -06:00
blk-flush.c Revert "blk-flush: Queue through IO scheduler when flush not required" 2015-11-25 10:12:54 -07:00
blk-integrity.c block, libnvdimm, nvme: provide a built-in blk_integrity nop profile 2015-10-21 14:43:45 -06:00
blk-ioc.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
blk-iopoll.c Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-03 12:57:53 -07:00
blk-lib.c block: re-add discard_granularity and alignment checks 2015-10-28 09:12:58 +09:00
blk-map.c Don't feed anything but regular iovec's to blk_rq_map_user_iov 2016-12-10 19:07:26 +01:00
blk-merge.c block: make sure a big bio is split into at most 256 bvecs 2016-09-15 08:27:51 +02:00
blk-mq-cpu.c blk-mq: add file comments and update copyright notices 2014-05-28 10:15:41 -06:00
blk-mq-cpumap.c blk-mq: avoid inserting requests before establishing new mapping 2015-09-29 11:32:50 -06:00
blk-mq-sysfs.c block: add block polling support 2015-11-07 10:40:47 -07:00
blk-mq-tag.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
blk-mq-tag.h blk-mq: factor out a helper to iterate all tags for a request_queue 2015-10-01 10:10:57 +02:00
blk-mq.c blk-mq: Avoid memory reclaim when remapping queues 2017-04-18 07:14:37 +02:00
blk-mq.h blk-mq: mark __blk_mq_complete_request() static 2015-11-11 09:36:56 -07:00
blk-settings.c block: Initialize max_dev_sectors to 0 2016-03-09 15:34:49 -08:00
blk-softirq.c block: fix regression with block enabled tagging 2014-04-09 21:54:06 -06:00
blk-sysfs.c Merge branch 'mkp-fixes' into fixes 2015-12-03 09:32:33 -08:00
blk-tag.c block: support different tag allocation policy 2015-01-23 14:15:46 -07:00
blk-throttle.c cgroup: replace cgroup_on_dfl() tests in controllers with cgroup_subsys_on_dfl() 2015-09-18 11:56:28 -04:00
blk-timeout.c block: fix blk_abort_request for blk-mq drivers 2015-11-24 15:24:10 -07:00
blk.h block: protect rw_page against device teardown 2015-11-19 13:47:10 -08:00
bounce.c Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2015-09-19 18:57:09 -07:00
bsg-lib.c bsg: Remove unused function bsg_goose_queue() 2012-12-06 14:33:02 +01:00
bsg.c sg_write()/bsg_write() is not fit to be called under KERNEL_DS 2017-01-09 08:07:53 +01:00
cfq-iosched.c block: cfq_cpd_alloc() should use @gfp 2017-01-19 20:17:22 +01:00
cmdline-parser.c block: remove unrelated header files and export symbol 2014-01-21 20:18:26 -08:00
compat_ioctl.c block, bdi: an active gendisk always has a request_queue associated with it 2014-09-08 10:00:35 -06:00
deadline-iosched.c block: Stop abusing csd.list for fifo_time 2014-02-24 14:46:32 -08:00
elevator.c block: check bio_mergeable() early before merging 2015-10-21 15:00:54 -06:00
genhd.c block: fix bdi vs gendisk lifetime mismatch 2016-08-20 18:09:24 +02:00
ioctl.c block: add an API for Persistent Reservations 2015-10-21 14:46:56 -06:00
ioprio.c block: fix use-after-free in sys_ioprio_get() 2016-08-10 11:49:28 +02:00
Kconfig block: Add T10 Protection Information functions 2014-09-27 09:14:59 -06:00
Kconfig.iosched blkcg: make CONFIG_BLK_CGROUP bool 2012-03-06 21:27:21 +01:00
Makefile block: Add T10 Protection Information functions 2014-09-27 09:14:59 -06:00
noop-iosched.c elevator: use list_{first,prev,next}_entry 2015-11-16 15:21:48 -07:00
partition-generic.c block: partition: initialize percpuref before sending out KOBJ_ADD 2016-05-04 14:48:39 -07:00
scsi_ioctl.c block: allow WRITE_SAME commands with the SG_IO ioctl 2017-03-30 09:35:20 +02:00
t10-pi.c block: Consolidate static integrity profile properties 2015-10-21 14:42:38 -06:00