With zcache using zbud, strange locking scenarios are
observed. The first problem seen is:
Core 2 waiting on mapping->tree_lock which is taken by core 6
do_raw_spin_lock
raw_spin_lock_irq
atomic_cmpxchg
page_freeze_refs
__remove_mapping
shrink_page_list
Core 6 after taking mapping->tree_lock is waiting on zbud pool lock
which is held by core 5
zbud_alloc
zcache_store_page
__cleancache_put_page
cleancache_put_page
__delete_from_page_cache
spin_unlock_irq
__remove_mapping
shrink_page_list
shrink_inactive_list
Core 5 after taking zbud pool lock from zbud_free received an IRQ, and
after IRQ exit, softirqs were scheduled and end_page_writeback tried to
lock on mapping->tree_lock which is already held by Core 6. Deadlock.
do_raw_spin_lock
raw_spin_lock_irqsave
test_clear_page_writeba
end_page_writeback
ext4_finish_bio
ext4_end_bio
bio_endio
blk_update_request
end_clone_bio
bio_endio
blk_update_request
blk_update_bidi_request
blk_end_bidi_request
blk_end_request
mmc_blk_cmdq_complete_r
mmc_cmdq_softirq_done
blk_done_softirq
static_key_count
static_key_false
trace_softirq_exit
__do_softirq()
tick_irq_exit
irq_exit()
set_irq_regs
__handle_domain_irq
gic_handle_irq
el1_irq
exception
__list_del_entry
list_del
zbud_free
zcache_load_page
__cleancache_get_page(?
This shows that allowing softirqs while holding zbud pool lock
can result in deadlocks. To fix this, 'commit 6a1fdaa36272
("mm: zbud: prevent softirq during zbud alloc, free and reclaim")'
decided to take spin_lock_bh during zbud_free, zbud_alloc and
zbud_reclaim. But this resulted in another deadlock.
spin_bug()
do_raw_spin_lock()
_raw_spin_lock_irqsave()
test_clear_page_writeback()
end_page_writeback()
ext4_finish_bio()
ext4_end_bio()
bio_endio()
blk_update_request()
end_clone_bio()
bio_endio()
blk_update_request()
blk_update_bidi_request()
blk_end_request()
mmc_blk_cmdq_complete_rq()
mmc_cmdq_softirq_done()
blk_done_softirq()
__do_softirq()
do_softirq()
__local_bh_enable_ip()
_raw_spin_unlock_bh()
zbud_alloc()
zcache_store_page()
__cleancache_put_page()
__delete_from_page_cache()
__remove_mapping()
shrink_page_list()
Here, the spin_unlock_bh resulted in explicit invocation of do_sofirq,
which resulted in the acquisition of mapping->tree_lock which was already
taken by __remove_mapping.
The new fix considers the following facts.
1) zcache_store_page is always be called from __delete_from_page_cache
with mapping->tree_lock held and interrupts disabled. Thus zbud_alloc
which is called only from zcache_store_page is always called with
interrupts disabled.
2) zbud_free and zbud_reclaim_page can be called from zcache with or
without interrupts disabled. So an interrupt while holding zbud pool
lock can result in do_softirq and acquisition of mapping->tree_lock.
(1) implies zbud_alloc need not explicitly disable bh. But disable
interrupts to make sure zbud_alloc is safe with zcache, irrespective
of future changes. This will fix the second scenario.
(2) implies zbud_free and zbud_reclaim_page should use spin_lock_irqsave,
so that interrupts will not be triggered and inturn softirqs.
spin_lock_bh can't be used because a spin_unlock_bh can triger a softirq
even in interrupt context. This will fix the first scenario.
Change-Id: Ibc810525dddf97614db41643642fec7472bd6a2c
Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org>