Commit graph

579939 commits

Author SHA1 Message Date
Minchan Kim
3730d2be4d mm: prevent double decrease of nr_reserved_highatomic
There is a race between page freeing and unreserving a highatomic pageblock.

 CPU 0				    CPU 1

    free_hot_cold_page
      mt = get_pfnblock_migratetype
      set_pcppage_migratetype(page, mt)
    				    unreserve_highatomic_pageblock
    				    spin_lock_irqsave(&zone->lock)
    				    move_freepages_block
    				    set_pageblock_migratetype(page)
    				    spin_unlock_irqrestore(&zone->lock)
      free_pcppages_bulk
        __free_one_page(mt) <- mt is stale

Because of the race above, a page on CPU 0 could go to a
non-highorderatomic free list, since the pageblock's type has changed.
As a result, the highorderatomic unreserve logic can decrease the
reserved count for the same pageblock several times, which leaves
nr_reserved_highatomic out of step with the number of reserved
pageblocks.

So, this patch checks whether the pageblock is highatomic and decreases
the count only if it is.
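The accounting rule described above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the upstream diff: `struct model_zone`, `model_unreserve()`, the `MODEL_*` names, and the 512-page pageblock size are all hypothetical stand-ins. It shows the counter being decremented only while the block is still marked highatomic, so finding two free pages from the same pageblock does not subtract twice.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified model of the accounting; these are not the
 * kernel's struct zone or migratetype definitions.
 */
enum model_mt { MODEL_MOVABLE, MODEL_HIGHATOMIC };

#define MODEL_PAGEBLOCK_NR_PAGES 512UL /* assumption: 512-page pageblocks */
#define MODEL_NR_BLOCKS 4

struct model_zone {
	unsigned long nr_reserved_highatomic;    /* in pages */
	enum model_mt block_mt[MODEL_NR_BLOCKS]; /* type of each pageblock */
};

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Unreserve the pageblock holding a free page found on a highatomic
 * free list. Because of the race, free pages of the SAME pageblock can
 * be found more than once, so the counter is only decreased while the
 * block is still marked highatomic; min_ul() also guards against
 * unsigned underflow.
 */
static void model_unreserve(struct model_zone *z, size_t block)
{
	assert(block < MODEL_NR_BLOCKS);
	if (z->block_mt[block] == MODEL_HIGHATOMIC)
		z->nr_reserved_highatomic -= min_ul(MODEL_PAGEBLOCK_NR_PAGES,
						    z->nr_reserved_highatomic);
	/* Stands in for changing the pageblock's migratetype. */
	z->block_mt[block] = MODEL_MOVABLE;
}
```

Without the `block_mt[block] == MODEL_HIGHATOMIC` check, two calls on the same block would subtract 1024 pages for a single reserved pageblock.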

Change-Id: Ieb4b6c0c98d1797339a94dd4b8033048552c9aad
Link: http://lkml.kernel.org/r/1476259429-18279-3-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 4855e4a7f29d6d10b0b9c84e189c770c9a94e91e
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2017-01-23 13:58:53 -08:00
Minchan Kim
9242902d60 mm: don't steal highatomic pageblock
Patch series "use up highorder free pages before OOM", v3.

I got an OOM report from the production team with a v4.4 kernel.  It had
enough free memory but failed to allocate a GFP_KERNEL order-0 page and
finally hit an OOM kill.  It occurred during a QA process that launches
several apps, switches between them, and so on.  It happened rarely.
IOW, in normal situations it was not a problem, but if we are unlucky
and several apps use peak memory at the same time, it can happen.  If
we manage to get past that phase, the system keeps working well.

I could reproduce it easily with my test (memory spike).  See below.

The reason is that the free pages (19M) of the DMA32 zone are reserved
for HIGHORDERATOMIC and are not unreserved before the OOM.

  balloon invoked oom-killer: gfp_mask=0x24280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), order=0, oom_score_adj=0
  balloon cpuset=/ mems_allowed=0
  CPU: 1 PID: 8473 Comm: balloon Tainted: G        W  OE   4.8.0-rc7-00219-g3f74c9559583-dirty #3161
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
    dump_stack+0x63/0x90
    dump_header+0x5c/0x1ce
    oom_kill_process+0x22e/0x400
    out_of_memory+0x1ac/0x210
    __alloc_pages_nodemask+0x101e/0x1040
    handle_mm_fault+0xa0a/0xbf0
    __do_page_fault+0x1dd/0x4d0
    trace_do_page_fault+0x43/0x130
    do_async_page_fault+0x1a/0xa0
    async_page_fault+0x28/0x30
  Mem-Info:
  active_anon:383949 inactive_anon:106724 isolated_anon:0
   active_file:15 inactive_file:44 isolated_file:0
   unevictable:0 dirty:0 writeback:24 unstable:0
   slab_reclaimable:2483 slab_unreclaimable:3326
   mapped:0 shmem:0 pagetables:1906 bounce:0
   free:6898 free_pcp:291 free_cma:0
  Node 0 active_anon:1535796kB inactive_anon:426896kB active_file:60kB inactive_file:176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:0kB dirty:0kB writeback:96kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:1418 all_unreclaimable? no
  DMA free:8188kB min:44kB low:56kB high:68kB active_anon:7648kB inactive_anon:0kB active_file:0kB inactive_file:4kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:20kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 1952 1952 1952
  DMA32 free:19404kB min:5628kB low:7624kB high:9620kB active_anon:1528148kB inactive_anon:426896kB active_file:60kB inactive_file:420kB unevictable:0kB writepending:96kB present:2080640kB managed:2030092kB mlocked:0kB slab_reclaimable:9932kB slab_unreclaimable:13284kB kernel_stack:2496kB pagetables:7624kB bounce:0kB free_pcp:900kB local_pcp:112kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 0*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 2*4096kB (H) = 8192kB
  DMA32: 7*4kB (H) 8*8kB (H) 30*16kB (H) 31*32kB (H) 14*64kB (H) 9*128kB (H) 2*256kB (H) 2*512kB (H) 4*1024kB (H) 5*2048kB (H) 0*4096kB = 19484kB
  51131 total pagecache pages
  50795 pages in swap cache
  Swap cache stats: add 3532405601, delete 3532354806, find 124289150/1822712228
  Free swap  = 8kB
  Total swap = 255996kB
  524158 pages RAM
  0 pages HighMem/MovableOnly
  12658 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned
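The per-zone buddy lines in the report above can be checked by hand: each `N*SkB` entry is N free blocks of S kB, and the letters in parentheses name the migratetypes that have free blocks at that order (`H` is MIGRATE_HIGHATOMIC). The helper below is a hypothetical sketch, not a kernel or procps API. It sums a line and the part tagged `H`; for mixed groups such as `(UH)` the `H` figure is only an upper bound, because the log does not split the count per type.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: sum one buddy-info line from an OOM report, e.g.
 * "DMA32: 7*4kB (H) 8*8kB (H) ... 0*4096kB = 19484kB".
 * Returns the total free kB; *highatomic_kb receives the kB of entries
 * whose flag group contains 'H' (an upper bound for mixed groups).
 */
static unsigned long sum_free_kb(const char *line, unsigned long *highatomic_kb)
{
	unsigned long total = 0;
	const char *p;

	assert(line != NULL && highatomic_kb != NULL);
	*highatomic_kb = 0;

	p = strchr(line, ':');            /* skip the "DMA32:" zone label */
	p = p ? p + 1 : line;

	for (;;) {
		unsigned long count, size_kb, kb;
		int consumed = 0;

		while (*p == ' ')
			p++;
		/* Stops at the trailing "= NNNkB" or at end of string. */
		if (sscanf(p, "%lu*%lukB%n", &count, &size_kb, &consumed) != 2 ||
		    consumed == 0)
			break;
		p += consumed;

		kb = count * size_kb;
		total += kb;

		/* Optional migratetype flag group such as " (H)" or " (UE)". */
		if (p[0] == ' ' && p[1] == '(') {
			const char *rparen = strchr(p, ')');

			if (rparen) {
				if (memchr(p + 2, 'H', (size_t)(rparen - (p + 2))))
					*highatomic_kb += kb;
				p = rparen + 1;
			}
		}
	}
	return total;
}
```

Applied to the DMA32 line of the first report, it returns 19484 kB, all of it tagged `(H)`, which matches the printed total.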

Another example, in which the reservation exceeded the limit because of
the race:

  in:imklog: page allocation failure: order:0, mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
  CPU: 0 PID: 476 Comm: in:imklog Tainted: G            E   4.8.0-rc7-00217-g266ef83c51e5-dirty #3135
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
  Call Trace:
    dump_stack+0x63/0x90
    warn_alloc_failed+0xdb/0x130
    __alloc_pages_nodemask+0x4d6/0xdb0
    new_slab+0x339/0x490
    ___slab_alloc.constprop.74+0x367/0x480
    __slab_alloc.constprop.73+0x20/0x40
    __kmalloc+0x1a4/0x1e0
    alloc_indirect.isra.14+0x1d/0x50
    virtqueue_add_sgs+0x1c4/0x470
    __virtblk_add_req+0xae/0x1f0
    virtio_queue_rq+0x12d/0x290
    __blk_mq_run_hw_queue+0x239/0x370
    blk_mq_run_hw_queue+0x8f/0xb0
    blk_mq_insert_requests+0x18c/0x1a0
    blk_mq_flush_plug_list+0x125/0x140
    blk_flush_plug_list+0xc7/0x220
    blk_finish_plug+0x2c/0x40
    __do_page_cache_readahead+0x196/0x230
    filemap_fault+0x448/0x4f0
    ext4_filemap_fault+0x36/0x50
    __do_fault+0x75/0x140
    handle_mm_fault+0x84d/0xbe0
    __do_page_fault+0x1dd/0x4d0
    trace_do_page_fault+0x43/0x130
    do_async_page_fault+0x1a/0xa0
    async_page_fault+0x28/0x30
  Mem-Info:
  active_anon:363826 inactive_anon:121283 isolated_anon:32
   active_file:65 inactive_file:152 isolated_file:0
   unevictable:0 dirty:0 writeback:46 unstable:0
   slab_reclaimable:2778 slab_unreclaimable:3070
   mapped:112 shmem:0 pagetables:1822 bounce:0
   free:9469 free_pcp:231 free_cma:0
  Node 0 active_anon:1455304kB inactive_anon:485132kB active_file:260kB inactive_file:608kB unevictable:0kB isolated(anon):128kB isolated(file):0kB mapped:448kB dirty:0kB writeback:184kB shmem:0kB writeback_tmp:0kB unstable:0kB pages_scanned:13641 all_unreclaimable? no
  DMA free:7748kB min:44kB low:56kB high:68kB active_anon:7944kB inactive_anon:104kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:108kB kernel_stack:0kB pagetables:4kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
  lowmem_reserve[]: 0 1952 1952 1952
  DMA32 free:30128kB min:5628kB low:7624kB high:9620kB active_anon:1447360kB inactive_anon:485028kB active_file:260kB inactive_file:608kB unevictable:0kB writepending:184kB present:2080640kB managed:2030132kB mlocked:0kB slab_reclaimable:11112kB slab_unreclaimable:12172kB kernel_stack:2400kB pagetables:7284kB bounce:0kB free_pcp:924kB local_pcp:72kB free_cma:0kB
  lowmem_reserve[]: 0 0 0 0
  DMA: 7*4kB (UE) 3*8kB (UH) 1*16kB (M) 0*32kB 2*64kB (U) 1*128kB (M) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 1*4096kB (H) = 7748kB
  DMA32: 10*4kB (H) 3*8kB (H) 47*16kB (H) 38*32kB (H) 5*64kB (H) 1*128kB (H) 2*256kB (H) 3*512kB (H) 3*1024kB (H) 3*2048kB (H) 4*4096kB (H) = 30128kB
  2775 total pagecache pages
  2536 pages in swap cache
  Swap cache stats: add 206786828, delete 206784292, find 7323106/106686077
  Free swap  = 108744kB
  Total swap = 255996kB
  524158 pages RAM
  0 pages HighMem/MovableOnly
  12648 pages reserved
  0 pages cma reserved
  0 pages hwpoisoned

During the investigation, I found some problems with highatomic, so
this patch aims to solve them; the final goal is to unreserve every
highatomic free page before the OOM kill.

This patch (of 4):

In the page freeing path, the migratetype is racy, so a highorderatomic
page could be freed into a non-highorderatomic free list.  If that page
is then allocated, the VM can change the pageblock from highorderatomic
to some other type.  In that case, the highatomic pageblock accounting
breaks and the reservation stops working (e.g., the VM can no longer
reserve highorderatomic pageblocks even though it has not reached the
1% limit).
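The "1% limit" can be made concrete with a little arithmetic. The sketch below assumes the cap is computed as `managed_pages / 100 + pageblock_nr_pages`, as in `reserve_highatomic_pageblock()` of kernels from this era, together with 4 KiB pages and 512-page pageblocks (typical x86-64 values). `max_highatomic_pages()` and the `MODEL_*` constants are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Assumptions: 4 KiB pages and 512-page (2 MiB) pageblocks, as on x86-64. */
#define MODEL_PAGE_KB            4UL
#define MODEL_PAGEBLOCK_NR_PAGES 512UL

/*
 * Hypothetical helper: upper bound, in pages, on highatomic reservations
 * for a zone with managed_kb of managed memory (roughly 1% of the zone
 * plus one pageblock, using integer division as the kernel does).
 */
static unsigned long max_highatomic_pages(unsigned long managed_kb)
{
	unsigned long managed_pages = managed_kb / MODEL_PAGE_KB;

	return managed_pages / 100 + MODEL_PAGEBLOCK_NR_PAGES;
}
```

For the DMA32 zone in the reports above (managed:2030092kB), this gives 507523 / 100 + 512 = 5587 pages, or 22348 kB.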

So, this patch prohibits changing a pageblock from highatomic to another
type.  This is not a problem because MIGRATE_HIGHATOMIC is not listed in
the fallback array, so stealing will only happen due to unexpected
races, which are really rare.  Also, the prohibition keeps highatomic
pageblocks around longer, which should benefit highorderatomic page
allocation.
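The rule can be modelled as a guard in a fallback "steal" step. This is a simplified userspace sketch of the idea, not the upstream change: `struct model_pageblock`, `model_steal_pageblock()`, and the `MODEL_*` migratetypes are hypothetical. When an allocation takes a page from another type's free list, it may claim the whole pageblock, except when that block is highatomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified migratetypes; not the kernel's enum migratetype. */
enum model_mt { MODEL_UNMOVABLE, MODEL_MOVABLE, MODEL_HIGHATOMIC };

struct model_pageblock {
	enum model_mt mt;
};

/*
 * Fallback path: an allocation of start_mt took a page from another
 * type's free list and may claim the whole pageblock for start_mt.
 * Following the idea described above, a highatomic pageblock is never
 * claimed, so its type (and the highatomic reservation count) stays
 * consistent. Returns true if the pageblock's type was changed.
 */
static bool model_steal_pageblock(struct model_pageblock *pb,
				  enum model_mt start_mt, bool can_steal)
{
	assert(pb != 0);
	if (!can_steal || pb->mt == MODEL_HIGHATOMIC)
		return false;
	pb->mt = start_mt;
	return true;
}
```

In this model the allocation still gets its page; only the pageblock keeps its highatomic type, so it can later be unreserved and accounted for exactly once.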

Change-Id: I15c2f91965eb4c35a2a53dc43f9acb8945922198
Link: http://lkml.kernel.org/r/1476259429-18279-2-git-send-email-minchan@kernel.org
Signed-off-by: Minchan Kim <minchan@kernel.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Sangseok Lee <sangseok.lee@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Git-commit: 88ed365ea227aa28841a8d6e196c9a261c76fffd
Git-repo: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Signed-off-by: Liam Mark <lmark@codeaurora.org>
2017-01-23 13:57:38 -08:00
Linux Build Service Account
e76c8c8e73 Merge "qcom-charger: Add ship mode support" 2016-12-20 23:45:23 -08:00
Linux Build Service Account
14e5892a7e Merge "power: power_supply: Add property for ship mode" 2016-12-20 23:45:22 -08:00
Linux Build Service Account
27e623187a Merge "ARM: dts: msm: Add initial device tree for APQ FALCON" 2016-12-20 23:45:20 -08:00
Linux Build Service Account
451146c26b Merge "ARM: dts: msm: Add msmfalcon device tree files for internal codec" 2016-12-20 23:45:20 -08:00
Linux Build Service Account
dba91385de Merge "sched: Avoid packing tasks with low sleep time" 2016-12-20 23:45:19 -08:00
Linux Build Service Account
7d4b9fcecb Merge "ARM: dts: msm: Add compute context banks for msmfalcon" 2016-12-20 23:45:18 -08:00
Linux Build Service Account
aa94ab5384 Merge "ARM: dts: msm: enable SSC based sensors for msmfalcon MTP/CDP" 2016-12-20 23:45:17 -08:00
Linux Build Service Account
86fb073705 Merge "msm: kgsl: Get pages from the system incase mempool is not configured" 2016-12-20 23:45:15 -08:00
Linux Build Service Account
bcf71a62b0 Merge "usb: phy: qusb: Keep LDOs ON during disconnect if PMI voted for it" 2016-12-20 23:45:14 -08:00
Linux Build Service Account
cc4405e27b Merge "msm: cpp: Use the micro reset binding to decide if micro can be reset" 2016-12-20 23:45:13 -08:00
Linux Build Service Account
70e6c2b913 Merge "ARM: dts: msm: Add cpp micro reset flag for 8998 and 8996" 2016-12-20 23:45:13 -08:00
Linux Build Service Account
4a3a2bdbcd Merge "ARM: dts: msm: Update the QoS settings for msmfalcon" 2016-12-20 23:45:11 -08:00
Linux Build Service Account
ab81637373 Merge "msm: msm_bus: Add new bus master id for pimem" 2016-12-20 23:45:10 -08:00
Linux Build Service Account
18f836e908 Merge "thermal: tsens: Add support to parse critical interrupt properties" 2016-12-20 23:45:10 -08:00
Linux Build Service Account
11418bd51e Merge "ARM: dts: msm: add pinctrl configuration for Touchscreen GPIOs on MSMFALCON" 2016-12-20 23:45:09 -08:00
Linux Build Service Account
2137d589df Merge "ARM: dts: msm: Disable lpm sleep modes for msmtriton" 2016-12-20 23:45:07 -08:00
Linux Build Service Account
2943e1c110 Merge "ARM: dts: msm: add iommu test device nodes for msmfalcon" 2016-12-20 23:45:06 -08:00
Linux Build Service Account
865922fc2b Merge "usb: phy: qusb2: Enable phy auto-resume" 2016-12-20 23:45:05 -08:00
Linux Build Service Account
c8ea5eb005 Merge "usb: pd: Register power_supply notifier after completing init" 2016-12-20 23:45:01 -08:00
Linux Build Service Account
9bbfc2736c Merge "ASoC: msm: qdsp6v2: Modify wait event and cmd state check" 2016-12-20 23:45:00 -08:00
Linux Build Service Account
e188618fe0 Merge "ARM: dts: msm: Add audio nodes for msmfalcon internal codec" 2016-12-20 23:44:59 -08:00
Linux Build Service Account
878cd8b31c Merge "ASoC: msm: migrate to cdc pinctrl functions" 2016-12-20 23:44:58 -08:00
Linux Build Service Account
ac56f609e6 Merge "ASoC: codecs: Update internal codec as split codecs" 2016-12-20 23:44:57 -08:00
Linux Build Service Account
dc22d4ffc3 Merge "diag: Initialize spin lock once per memory device channel" 2016-12-20 23:44:56 -08:00
Linux Build Service Account
fd1d20a988 Merge "hwmon: qpnp-adc: Initialize variables in get_devicetree function" 2016-12-20 23:44:55 -08:00
Linux Build Service Account
186d081d94 Merge "diag: dci: Protect the client list and command entries" 2016-12-20 23:44:55 -08:00
Linux Build Service Account
02ea5fbed2 Merge "time: sched_clock: record cycle count in suspend and resume" 2016-12-20 23:44:54 -08:00
Linux Build Service Account
2af3bd09b0 Merge "arm: Move topology_init to postcore" 2016-12-20 23:44:53 -08:00
Linux Build Service Account
f322ad9eea Merge "cfg80211: add checks for beacon rate, extend to mesh" 2016-12-20 23:44:52 -08:00
Linux Build Service Account
bd21566fda Merge "scsi: ufs-qcom: skip err message for optional clk" 2016-12-20 23:44:51 -08:00
Linux Build Service Account
668b4dc9cf Merge "msm: mdss: Fix null pointer dereference and unintialisation of variables" 2016-12-20 23:44:51 -08:00
Linux Build Service Account
e68fd6cb62 Merge "ARM: dts: msm: add a new panel driver to enable display for QVR8998" 2016-12-20 23:44:50 -08:00
Linux Build Service Account
1924d2d6cc Merge "msm: sde: return success if no callback function in r1 ctl" 2016-12-20 23:44:49 -08:00
Linux Build Service Account
388069770c Merge "ASoC: msm: Fix memory leakage in dts eagle" 2016-12-20 23:44:48 -08:00
Linux Build Service Account
ab12376e5f Merge "msm: camera: add logic to support sensor compatibility" 2016-12-20 23:44:47 -08:00
Linux Build Service Account
4802720f7b Merge "spi: spi_qsd: Improve latencies for small transfers" 2016-12-20 23:44:47 -08:00
Linux Build Service Account
4367ff18d1 Merge "icnss: Remove hardware reset sequence" 2016-12-20 23:44:46 -08:00
Linux Build Service Account
f6b6eb31d6 Merge "ARM: dts: msm: Update smem id of CDSP PIL for MSMFALCON" 2016-12-20 23:44:45 -08:00
Linux Build Service Account
f316b881fb Merge "USB: dwc3: gadget: Don't queue endless req through generic ep_queue" 2016-12-20 23:44:44 -08:00
Linux Build Service Account
f21063458a Merge "diag: Add support for CDSP" 2016-12-20 23:44:43 -08:00
Linux Build Service Account
9660a520b9 Merge "USB: dwc3-msm: Perform DBM config/unconfig under spinlock protection" 2016-12-20 23:44:42 -08:00
Linux Build Service Account
81cd19901e Merge "USB: dwc3-msm: Disable DBM endpoint in msm_ep_unconfig if no req queued" 2016-12-20 23:44:42 -08:00
Linux Build Service Account
e3acdfc559 Merge "spcom: check size before calling copy_to_user()" 2016-12-20 23:44:41 -08:00
Linux Build Service Account
8473f55faa Merge "scsi: ufs: fixed DUN size for ICE encryption to be 4k" 2016-12-20 23:44:40 -08:00
Linux Build Service Account
243d79cf40 Merge "USB: dwc3-msm: Perform HW reinitialization on HC died error" 2016-12-20 14:05:07 -08:00
Linux Build Service Account
b9d5c739ad Merge "defconfig: msm: enable remote debugger driver" 2016-12-20 14:05:05 -08:00
Linux Build Service Account
67e606e70e Merge "msm-3.18: drivers : added validation of input/output buffer sizes" 2016-12-20 14:05:05 -08:00
Linux Build Service Account
7636267d38 Merge "tty: serial: msm: Add suspend resume support" 2016-12-20 14:05:02 -08:00