android_kernel_oneplus_msm8998/arch
Toshi Kani 6deb0ec93d x86/mm: Fix vmalloc_fault() to handle large pages properly
commit f4eafd8bcd5229e998aa252627703b8462c3b90f upstream.

A kernel page fault oops with the callstack below was observed
when a read syscall was made to a pmem device after a huge amount
(>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64
system:

     BUG: unable to handle kernel paging request at ffff880840000ff8
     IP: vmalloc_fault+0x1be/0x300
     PGD c7f03a067 PUD 0
     Oops: 0000 [#1] SM
     Call Trace:
        __do_page_fault+0x285/0x3e0
        do_page_fault+0x2f/0x80
        ? put_prev_entity+0x35/0x7a0
        page_fault+0x28/0x30
        ? memcpy_erms+0x6/0x10
        ? schedule+0x35/0x80
        ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
        ? schedule_timeout+0x183/0x240
        btt_log_read+0x63/0x140 [nd_btt]
         :
        ? __symbol_put+0x60/0x60
        ? kernel_read+0x50/0x80
        SyS_finit_module+0xb9/0xf0
        entry_SYSCALL_64_fastpath+0x1a/0xa4

Since v4.1, ioremap() supports large page (pud/pmd) mappings in
x86_64 and PAE.  vmalloc_fault() however assumes that the vmalloc
range is limited to pte mappings.

vmalloc faults do not normally happen in ioremap'd ranges since
ioremap() sets up the kernel page tables, which are shared by
user processes.  pgd_ctor() sets the kernel's PGD entries to
user's during fork().  When allocation of the vmalloc ranges
crosses a 512GB boundary, ioremap() allocates a new pud table
and updates the kernel PGD entry to point it.  If user process's
PGD entry does not have this update yet, a read/write syscall
to the range will cause a vmalloc fault, which hits the Oops
above as it does not handle a large page properly.

Following changes are made to vmalloc_fault().

64-bit:

 - No change for the PGD sync operation as it handles large
   pages already.
 - Add pud_huge() and pmd_huge() to the validation code to
   handle large pages.
 - Change pud_page_vaddr() to pud_pfn() since an ioremap range
   is not directly mapped (while the if-statement still works
   with a bogus addr).
 - Change pmd_page() to pmd_pfn() since an ioremap range is not
   backed by struct page (while the if-statement still works
   with a bogus addr).

32-bit:
 - No change for the sync operation since the index3 PGD entry
   covers the entire vmalloc range, which is always valid.
   (A separate change to sync PGD entry is necessary if this
    memory layout is changed regardless of the page size.)
 - Add pmd_huge() to the validation code to handle large pages.
   This is for completeness since vmalloc_fault() won't happen
   in ioremap'd ranges as its PGD entry is always valid.

Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2016-02-25 12:01:13 -08:00
..
alpha mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage 2015-11-05 19:34:48 -08:00
arc ARC: dw2 unwind: Catch Dwarf SNAFUs early 2015-12-21 14:01:49 +05:30
arm ARM: SoC fixes for 4.4-rc 2016-01-08 16:11:05 -08:00
arm64 arm64: kernel: fix architected PMU registers unconditional access 2016-01-31 11:29:04 -08:00
avr32 dmaengine updates for 4.4-rc1 2015-11-10 10:05:17 -08:00
blackfin treewide: Remove old email address 2015-11-23 09:44:58 +01:00
c6x irqdomain: Use irq_domain_get_of_node() instead of direct field access 2015-10-13 19:01:23 +02:00
cris cris: Drop reference to get_cmos_time() 2015-11-02 20:03:05 +01:00
frv kmap_atomic_to_page() has no users, remove it 2015-11-09 15:11:24 -08:00
h8300 h8300 update for v4.4 2015-11-12 15:26:39 -08:00
hexagon
ia64 [IA64] Enable mlock2 syscall for ia64 2015-12-14 10:30:02 -08:00
m32r m32r: add io*_rep helpers 2015-12-29 17:45:49 -08:00
m68k m68k: Wire up mlock2 2015-11-22 11:35:26 +01:00
metag Metag architecture changes for v4.4 2015-11-10 16:24:25 -08:00
microblaze Revert "scatterlist: use sg_phys()" 2015-12-15 12:54:06 -08:00
mips Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2016-01-06 16:15:03 -08:00
mn10300 mn10300: Select CONFIG_HAVE_UID16 to fix build failure 2015-11-30 07:01:40 -08:00
nios2 nios2: fix cache coherency 2015-11-26 22:25:58 +08:00
openrisc
parisc parisc: Fix __ARCH_SI_PREAMBLE_SIZE 2016-02-17 12:30:57 -08:00
powerpc powerpc/module: Handle R_PPC64_ENTRY relocations 2016-01-31 11:29:03 -08:00
s390 s390/dis: Fix handling of format specifiers 2015-12-18 14:43:21 +01:00
score
sh sh64: fix __NR_fgetxattr 2015-12-12 10:15:34 -08:00
sparc net: filter: make JITs zero A for SKF_AD_ALU_XOR_X 2016-01-06 00:43:52 -05:00
tile tile: provide CONFIG_PAGE_SIZE_64KB etc for tilepro 2016-01-05 08:16:09 -05:00
um um: fix returns without va_end 2015-12-08 22:26:00 +01:00
unicore32 pwm: Changes for v4.4-rc1 2015-11-11 09:16:10 -08:00
x86 x86/mm: Fix vmalloc_fault() to handle large pages properly 2016-02-25 12:01:13 -08:00
xtensa Merge branch 'for-4.4/io-poll' of git://git.kernel.dk/linux-block 2015-11-10 17:23:49 -08:00
.gitignore
Kconfig