Commit graph

2211 commits

Author SHA1 Message Date
Greg Kroah-Hartman
5cc8c2ec61 This is the 4.4.110 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpPj0wACgkQONu9yGCS
 aT5QOhAAu3PoT3472I7zuWDUG0KQo5r0wdUO+YPW31VIHrxQ2H3sxR44rSHc5jW/
 tTg2TIYNBkNoj4jJDJ9J7f6PSnN1vGFglFW4GzxE3cr2+W7u5M5ex8yCYMcBIY9U
 56hbyqX5lf5KjGWJiQThwYsMBokrBJW2igAFN3cW39nNABhl0W39kiysGA9vbNrV
 +QMA4+ZADA2EeIRcdJmj8uc/cez/7sGAfrSktvATkI+HFamnTs0mrx9cl0eQKvjm
 y5PCxYUCbi4kqD4WM+UCYO3zpUD+r4iMDXwXBwLWkFvbumY4mVTItP+gq5M4Fb1g
 MSauGUGH7BDsT9gspricCMcAmjcTn6hth7/7/ZhlNq3NZv89pOquhpE0JOSAmYbA
 P4WaIRRWwpVrRt+THU7vZpAQWpFSwGmtE7tBfPMt2J7zqY3lMYmO3DoA+gejw3CV
 igbvmV0UY2uYSFnjawUUJ+k+ggYfGyRkUl2DfcllPhZFqE1XEi3NyjI0wi8vtXTd
 UlrU55TqsldCw1bjXH3lWrpoNybWvqUD2a249ZVs/h06Q5NKwNL8mTye+2BBQtCP
 QzAqHYbkBKv/f8M6Kg+HtTzgqUbWxVCeQTWFXHMAPVo4bCwGvVGrXbGJIj15lBuQ
 GWqc3dt69zxpn1tlcRHKH0P3KnkC67dARtY+8F8+D+HAHVY71Bg=
 =Kpwd
 -----END PGP SIGNATURE-----

Merge 4.4.110 into android-4.4

Changes in 4.4.110
	x86/boot: Add early cmdline parsing for options with arguments
	KAISER: Kernel Address Isolation
	kaiser: merged update
	kaiser: do not set _PAGE_NX on pgd_none
	kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
	kaiser: fix build and FIXME in alloc_ldt_struct()
	kaiser: KAISER depends on SMP
	kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
	kaiser: fix perf crashes
	kaiser: ENOMEM if kaiser_pagetable_walk() NULL
	kaiser: tidied up asm/kaiser.h somewhat
	kaiser: tidied up kaiser_add/remove_mapping slightly
	kaiser: kaiser_remove_mapping() move along the pgd
	kaiser: cleanups while trying for gold link
	kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
	kaiser: delete KAISER_REAL_SWITCH option
	kaiser: vmstat show NR_KAISERTABLE as nr_overhead
	kaiser: enhanced by kernel and user PCIDs
	kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
	kaiser: PCID 0 for kernel and 128 for user
	kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
	kaiser: paranoid_entry pass cr3 need to paranoid_exit
	kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
	kaiser: fix unlikely error in alloc_ldt_struct()
	kaiser: add "nokaiser" boot option, using ALTERNATIVE
	x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
	x86/kaiser: Check boottime cmdline params
	kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
	kaiser: drop is_atomic arg to kaiser_pagetable_walk()
	kaiser: asm/tlbflush.h handle noPGE at lower level
	kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
	x86/paravirt: Dont patch flush_tlb_single
	x86/kaiser: Reenable PARAVIRT
	kaiser: disabled on Xen PV
	x86/kaiser: Move feature detection up
	KPTI: Rename to PAGE_TABLE_ISOLATION
	KPTI: Report when enabled
	x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
	x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
	x86/kasan: Clear kasan_zero_page after TLB flush
	kaiser: Set _PAGE_NX only if supported
	Linux 4.4.110

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-06 10:53:18 +01:00
Guenter Roeck
b33c3c64c4 kaiser: Set _PAGE_NX only if supported
This resolves a crash if loaded under qemu + haxm under windows.
See https://www.spinics.net/lists/kernel/msg2689835.html for details.
Here is a boot log (the log is from chromeos-4.4, but Tao Wu says that
the same log is also seen with vanilla v4.4.110-rc1).

[    0.712750] Freeing unused kernel memory: 552K
[    0.721821] init: Corrupted page table at address 57b029b332e0
[    0.722761] PGD 80000000bb238067 PUD bc36a067 PMD bc369067 PTE 45d2067
[    0.722761] Bad pagetable: 000b [#1] PREEMPT SMP 
[    0.722761] Modules linked in:
[    0.722761] CPU: 1 PID: 1 Comm: init Not tainted 4.4.96 #31
[    0.722761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[    0.722761] task: ffff8800bc290000 ti: ffff8800bc28c000 task.ti: ffff8800bc28c000
[    0.722761] RIP: 0010:[<ffffffff83f4129e>]  [<ffffffff83f4129e>] __clear_user+0x42/0x67
[    0.722761] RSP: 0000:ffff8800bc28fcf8  EFLAGS: 00010202
[    0.722761] RAX: 0000000000000000 RBX: 00000000000001a4 RCX: 00000000000001a4
[    0.722761] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000057b029b332e0
[    0.722761] RBP: ffff8800bc28fd08 R08: ffff8800bc290000 R09: ffff8800bb2f4000
[    0.722761] R10: ffff8800bc290000 R11: ffff8800bb2f4000 R12: 000057b029b332e0
[    0.722761] R13: 0000000000000000 R14: 000057b029b33340 R15: ffff8800bb1e2a00
[    0.722761] FS:  0000000000000000(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
[    0.722761] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[    0.722761] CR2: 000057b029b332e0 CR3: 00000000bb2f8000 CR4: 00000000000006e0
[    0.722761] Stack:
[    0.722761]  000057b029b332e0 ffff8800bb95fa80 ffff8800bc28fd18 ffffffff83f4120c
[    0.722761]  ffff8800bc28fe18 ffffffff83e9e7a1 ffff8800bc28fd68 0000000000000000
[    0.722761]  ffff8800bc290000 ffff8800bc290000 ffff8800bc290000 ffff8800bc290000
[    0.722761] Call Trace:
[    0.722761]  [<ffffffff83f4120c>] clear_user+0x2e/0x30
[    0.722761]  [<ffffffff83e9e7a1>] load_elf_binary+0xa7f/0x18f7
[    0.722761]  [<ffffffff83de2088>] search_binary_handler+0x86/0x19c
[    0.722761]  [<ffffffff83de389e>] do_execveat_common.isra.26+0x909/0xf98
[    0.722761]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.722761]  [<ffffffff83de40be>] do_execve+0x23/0x25
[    0.722761]  [<ffffffff83c002e3>] run_init_process+0x2b/0x2d
[    0.722761]  [<ffffffff844fec4d>] kernel_init+0x6d/0xda
[    0.722761]  [<ffffffff84505b2f>] ret_from_fork+0x3f/0x70
[    0.722761]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.722761] Code: 86 84 be 12 00 00 00 e8 87 0d e8 ff 66 66 90 48 89 d8 48 c1
eb 03 4c 89 e7 83 e0 07 48 89 d9 be 08 00 00 00 31 d2 48 85 c9 74 0a <48> 89 17
48 01 f7 ff c9 75 f6 48 89 c1 85 c9 74 09 88 17 48 ff 
[    0.722761] RIP  [<ffffffff83f4129e>] __clear_user+0x42/0x67
[    0.722761]  RSP <ffff8800bc28fcf8>
[    0.722761] ---[ end trace def703879b4ff090 ]---
[    0.722761] BUG: sleeping function called from invalid context at /mnt/host/source/src/third_party/kernel/v4.4/kernel/locking/rwsem.c:21
[    0.722761] in_atomic(): 0, irqs_disabled(): 1, pid: 1, name: init
[    0.722761] CPU: 1 PID: 1 Comm: init Tainted: G      D         4.4.96 #31
[    0.722761] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
[    0.722761]  0000000000000086 dcb5d76098c89836 ffff8800bc28fa30 ffffffff83f34004
[    0.722761]  ffffffff84839dc2 0000000000000015 ffff8800bc28fa40 ffffffff83d57dc9
[    0.722761]  ffff8800bc28fa68 ffffffff83d57e6a ffffffff84a53640 0000000000000000
[    0.722761] Call Trace:
[    0.722761]  [<ffffffff83f34004>] dump_stack+0x4d/0x63
[    0.722761]  [<ffffffff83d57dc9>] ___might_sleep+0x13a/0x13c
[    0.722761]  [<ffffffff83d57e6a>] __might_sleep+0x9f/0xa6
[    0.722761]  [<ffffffff84502788>] down_read+0x20/0x31
[    0.722761]  [<ffffffff83cc5d9b>] __blocking_notifier_call_chain+0x35/0x63
[    0.722761]  [<ffffffff83cc5ddd>] blocking_notifier_call_chain+0x14/0x16
[    0.800374] usb 1-1: new full-speed USB device number 2 using uhci_hcd
[    0.722761]  [<ffffffff83cefe97>] profile_task_exit+0x1a/0x1c
[    0.802309]  [<ffffffff83cac84e>] do_exit+0x39/0xe7f
[    0.802309]  [<ffffffff83ce5938>] ? vprintk_default+0x1d/0x1f
[    0.802309]  [<ffffffff83d7bb95>] ? printk+0x57/0x73
[    0.802309]  [<ffffffff83c46e25>] oops_end+0x80/0x85
[    0.802309]  [<ffffffff83c7b747>] pgtable_bad+0x8a/0x95
[    0.802309]  [<ffffffff83ca7f4a>] __do_page_fault+0x8c/0x352
[    0.802309]  [<ffffffff83eefba5>] ? file_has_perm+0xc4/0xe5
[    0.802309]  [<ffffffff83ca821c>] do_page_fault+0xc/0xe
[    0.802309]  [<ffffffff84507682>] page_fault+0x22/0x30
[    0.802309]  [<ffffffff83f4129e>] ? __clear_user+0x42/0x67
[    0.802309]  [<ffffffff83f4127f>] ? __clear_user+0x23/0x67
[    0.802309]  [<ffffffff83f4120c>] clear_user+0x2e/0x30
[    0.802309]  [<ffffffff83e9e7a1>] load_elf_binary+0xa7f/0x18f7
[    0.802309]  [<ffffffff83de2088>] search_binary_handler+0x86/0x19c
[    0.802309]  [<ffffffff83de389e>] do_execveat_common.isra.26+0x909/0xf98
[    0.802309]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.802309]  [<ffffffff83de40be>] do_execve+0x23/0x25
[    0.802309]  [<ffffffff83c002e3>] run_init_process+0x2b/0x2d
[    0.802309]  [<ffffffff844fec4d>] kernel_init+0x6d/0xda
[    0.802309]  [<ffffffff84505b2f>] ret_from_fork+0x3f/0x70
[    0.802309]  [<ffffffff844febe0>] ? rest_init+0x87/0x87
[    0.830559] Kernel panic - not syncing: Attempted to kill init!  exitcode=0x00000009
[    0.830559] 
[    0.831305] Kernel Offset: 0x2c00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[    0.831305] ---[ end Kernel panic - not syncing: Attempted to kill init!  exitcode=0x00000009

The crash part of this problem may be solved with the following patch
(thanks to Hugh for the hint). There is still another problem, though -
with this patch applied, the qemu session aborts with "VCPU Shutdown
request", whatever that means.

Cc: lepton <ytht.net@gmail.com>
Signed-off-by: Guenter Roeck <groeck@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:27 +01:00
Andrey Ryabinin
2b24fe5c57 x86/kasan: Clear kasan_zero_page after TLB flush
commit 69e0210fd01ff157d332102219aaf5c26ca8069b upstream.

Currently we clear kasan_zero_page before __flush_tlb_all(). This
works with current implementation of native_flush_tlb[_global]()
because it doesn't cause do any writes to kasan shadow memory.
But any subtle change made in native_flush_tlb*() could break this.
Also current code seems doesn't work for paravirt guests (lguest).

Only after the TLB flush we can be sure that kasan_zero_page is not
used as early shadow anymore (instrumented code will not write to it).
So it should cleared it only after the TLB flush.

Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1452516679-32040-2-git-send-email-aryabinin@virtuozzo.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jamie Iles <jamie.iles@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:27 +01:00
Kees Cook
bfd51a4d71 KPTI: Report when enabled
Make sure dmesg reports when KPTI is enabled.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:26 +01:00
Kees Cook
3e1457d6bf KPTI: Rename to PAGE_TABLE_ISOLATION
This renames CONFIG_KAISER to CONFIG_PAGE_TABLE_ISOLATION.

Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:26 +01:00
Borislav Petkov
7f79599df9 x86/kaiser: Move feature detection up
... before the first use of kaiser_enabled as otherwise funky
things happen:

  about to get started...
  (XEN) d0v0 Unhandled page fault fault/trap [#14, ec=0000]
  (XEN) Pagetable walk from ffff88022a449090:
  (XEN)  L4[0x110] = 0000000229e0e067 0000000000001e0e
  (XEN)  L3[0x008] = 0000000000000000 ffffffffffffffff
  (XEN) domain_crash_sync called from entry.S: fault at ffff82d08033fd08
  entry.o#create_bounce_frame+0x135/0x14d
  (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
  (XEN) ----[ Xen-4.9.1_02-3.21  x86_64  debug=n   Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e033:[<ffffffff81007460>]
  (XEN) RFLAGS: 0000000000000286   EM: 1   CONTEXT: pv guest (d0v0)

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:26 +01:00
Jiri Kosina
e4ba212ec6 kaiser: disabled on Xen PV
Kaiser cannot be used on paravirtualized MMUs (namely reading and writing CR3).
This does not work with KAISER as the CR3 switch from and to user space PGD
would require to map the whole XEN_PV machinery into both.

More importantly, enabling KAISER on Xen PV doesn't make too much sense, as PV
guests use distinct %cr3 values for kernel and user already.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:26 +01:00
Hugh Dickins
8eaca4c7d9 kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
Let kaiser_flush_tlb_on_return_to_user() do the X86_FEATURE_PCID
check, instead of each caller doing it inline first: nobody needs
to optimize for the noPCID case, it's clearer this way, and better
suits later changes.  Replace those no-op X86_CR3_PCID_KERN_FLUSH lines
by a BUILD_BUG_ON() in load_new_mm_cr3(), in case something changes.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:26 +01:00
Hugh Dickins
28c6de5441 kaiser: drop is_atomic arg to kaiser_pagetable_walk()
I have not observed a might_sleep() warning from setup_fixmap_gdt()'s
use of kaiser_add_mapping() in our tree (why not?), but like upstream
we have not provided a way for that to pass is_atomic true down to
kaiser_pagetable_walk(), and at startup it's far from a likely source
of trouble: so just delete the walk's is_atomic arg and might_sleep().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:26 +01:00
Hugh Dickins
2dff99eb03 kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
Now that we're playing the ALTERNATIVE game, use that more efficient
method: instead of user-mapping an extra page, and reading an extra
cacheline each time for x86_cr3_pcid_noflush.

Neel has found that __stringify(bts $X86_CR3_PCID_NOFLUSH_BIT, %rax)
is a working substitute for the "bts $63, %rax" in these ALTERNATIVEs;
but the one line with $63 in looks clearer, so let's stick with that.

Worried about what happens with an ALTERNATIVE between the jump and
jump label in another ALTERNATIVE?  I was, but have checked the
combinations in SWITCH_KERNEL_CR3_NO_STACK at entry_SYSCALL_64,
and it does a good job.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:26 +01:00
Borislav Petkov
e405a064bd x86/kaiser: Check boottime cmdline params
AMD (and possibly other vendors) are not affected by the leak
KAISER is protecting against.

Keep the "nopti" for traditional reasons and add pti=<on|off|auto>
like upstream.

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:25 +01:00
Borislav Petkov
dea9aa9ffa x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
Concentrate it in arch/x86/mm/kaiser.c and use the upstream string "nopti".

Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:25 +01:00
Hugh Dickins
e345dcc948 kaiser: add "nokaiser" boot option, using ALTERNATIVE
Added "nokaiser" boot option: an early param like "noinvpcid".
Most places now check int kaiser_enabled (#defined 0 when not
CONFIG_KAISER) instead of #ifdef CONFIG_KAISER; but entry_64.S
and entry_64_compat.S are using the ALTERNATIVE technique, which
patches in the preferred instructions at runtime.  That technique
is tied to x86 cpu features, so X86_FEATURE_KAISER is fabricated.

Prior to "nokaiser", Kaiser #defined _PAGE_GLOBAL 0: revert that,
but be careful with both _PAGE_GLOBAL and CR4.PGE: setting them when
nokaiser like when !CONFIG_KAISER, but not setting either when kaiser -
neither matters on its own, but it's hard to be sure that _PAGE_GLOBAL
won't get set in some obscure corner, or something add PGE into CR4.
By omitting _PAGE_GLOBAL from __supported_pte_mask when kaiser_enabled,
all page table setup which uses pte_pfn() masks it out of the ptes.

It's slightly shameful that the same declaration versus definition of
kaiser_enabled appears in not one, not two, but in three header files
(asm/kaiser.h, asm/pgtable.h, asm/tlbflush.h).  I felt safer that way,
than with #including any of those in any of the others; and did not
feel it worth an asm/kaiser_enabled.h - kernel/cpu/common.c includes
them all, so we shall hear about it if they get out of synch.

Cleanups while in the area: removed the silly #ifdef CONFIG_KAISER
from kaiser.c; removed the unused native_get_normal_pgd(); removed
the spurious reg clutter from SWITCH_*_CR3 macro stubs; corrected some
comments.  But more interestingly, set CR4.PSE in secondary_startup_64:
the manual is clear that it does not matter whether it's 0 or 1 when
4-level-pts are enabled, but I was distracted to find cr4 different on
BSP and auxiliaries - BSP alone was adding PSE, in probe_page_size_mask().

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:25 +01:00
Hugh Dickins
d41f46f778 kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
Synthetic filesystem mempressure testing has shown softlockups, with
hour-long page allocation stalls, and pgd_alloc() trying for order:1
with __GFP_REPEAT in one of the backtraces each time.

That's _pgd_alloc() going for a Kaiser double-pgd, using the __GFP_REPEAT
common to all page table allocations, but actually having no effect on
order:0 (see should_alloc_oom() and should_continue_reclaim() in this
tree, but beware that ports to another tree might behave differently).

Order:1 stack allocation has been working satisfactorily without
__GFP_REPEAT forever, and page table allocation only asks __GFP_REPEAT
for awkward occasions in a long-running process: it's not appropriate
at fork or exec time, and seems to be doing much more harm than good:
getting those contiguous pages under very heavy mempressure can be
hard (though even without it, Kaiser does generate more mempressure).

Mask out that __GFP_REPEAT inside _pgd_alloc().  Why not take it out
of the PGALLOG_GFP altogether, as v4.7 commit a3a9a59d2067 ("x86: get
rid of superfluous __GFP_REPEAT") did?  Because I think that might
make a difference to our page table memcg charging, which I'd prefer
not to interfere with at this time.

hughd adds: __alloc_pages_slowpath() in the 4.4.89-stable tree handles
__GFP_REPEAT a little differently than in prod kernel or 3.18.72-stable,
so it may not always be exactly a no-op on order:0 pages, as said above;
but I think still appropriate to omit it from Kaiser or non-Kaiser pgd.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:25 +01:00
Hugh Dickins
20268a10ff kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
Mostly this commit is just unshouting X86_CR3_PCID_KERN_VAR and
X86_CR3_PCID_USER_VAR: we usually name variables in lower-case.

But why does x86_cr3_pcid_noflush need to be __aligned(PAGE_SIZE)?
Ah, it's a leftover from when kaiser_add_user_map() once complained
about mapping the same page twice.  Make it __read_mostly instead.
(I'm a little uneasy about all the unrelated data which shares its
page getting user-mapped too, but that was so before, and not a big
deal: though we call it user-mapped, it's not mapped with _PAGE_USER.)

And there is a little change around the two calls to do_nmi().
Previously they set the NOFLUSH bit (if PCID supported) when
forcing to kernel context before do_nmi(); now they also have the
NOFLUSH bit set (if PCID supported) when restoring context after:
nothing done in do_nmi() should require a TLB to be flushed here.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:25 +01:00
Hugh Dickins
3b4ce0e1a1 kaiser: PCID 0 for kernel and 128 for user
Why was 4 chosen for kernel PCID and 6 for user PCID?
No good reason in a backport where PCIDs are only used for Kaiser.

If we continue with those, then we shall need to add Andy Lutomirski's
4.13 commit 6c690ee1039b ("x86/mm: Split read_cr3() into read_cr3_pa()
and __read_cr3()"), which deals with the problem of read_cr3() callers
finding stray bits in the cr3 that they expected to be page-aligned;
and for hibernation, his 4.14 commit f34902c5c6c0 ("x86/hibernate/64:
Mask off CR3's PCID bits in the saved CR3").

But if 0 is used for kernel PCID, then there's no need to add in those
commits - whenever the kernel looks, it sees 0 in the lower bits; and
0 for kernel seems an obvious choice.

And I naughtily propose 128 for user PCID.  Because there's a place
in _SWITCH_TO_USER_CR3 where it takes note of the need for TLB FLUSH,
but needs to reset that to NOFLUSH for the next occasion.  Currently
it does so with a "movb $(0x80)" into the high byte of the per-cpu
quadword, but that will cause a machine without PCID support to crash.
Now, if %al just happened to have 0x80 in it at that point, on a
machine with PCID support, but 0 on a machine without PCID support...

(That will go badly wrong once the pgd can be at a physical address
above 2^56, but even with 5-level paging, physical goes up to 2^52.)

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:25 +01:00
Hugh Dickins
0731188fc7 kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
We have many machines (Westmere, Sandybridge, Ivybridge) supporting
PCID but not INVPCID: on these load_new_mm_cr3() simply crashed.

Flushing user context inside load_new_mm_cr3() without the use of
invpcid is difficult: momentarily switch from kernel to user context
and back to do so?  I'm not sure whether that can be safely done at
all, and would risk polluting user context with kernel internals,
and kernel context with stale user externals.

Instead, follow the hint in the comment that was there: change
X86_CR3_PCID_USER_VAR to be a per-cpu variable, then load_new_mm_cr3()
can leave a note in it, for SWITCH_USER_CR3 on return to userspace to
flush user context TLB, instead of default X86_CR3_PCID_USER_NOFLUSH.

Which works well enough that there's no need to do it this way only
when invpcid is unsupported: it's a good alternative to invpcid here.
But there's a couple of inlines in asm/tlbflush.h that need to do the
same trick, so it's best to localize all this per-cpu business in
mm/kaiser.c: moving that part of the initialization from setup_pcid()
to kaiser_setup_pcid(); with kaiser_flush_tlb_on_return_to_user() the
function for noting an X86_CR3_PCID_USER_FLUSH.  And let's keep a
KAISER_SHADOW_PGD_OFFSET in there, to avoid the extra OR on exit.

I did try to make the feature tests in asm/tlbflush.h more consistent
with each other: there seem to be far too many ways of performing such
tests, and I don't have a good grasp of their differences.  At first
I converted them all to be static_cpu_has(): but that proved to be a
mistake, as the comment in __native_flush_tlb_single() hints; so then
I reversed and made them all this_cpu_has().  Probably all gratuitous
change, but that's the way it's working at present.

I am slightly bothered by the way non-per-cpu X86_CR3_PCID_KERN_VAR
gets re-initialized by each cpu (before and after these changes):
no problem when (as usual) all cpus on a machine have the same
features, but in principle incorrect.  However, my experiment
to per-cpu-ify that one did not end well...

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:25 +01:00
Dave Hansen
eb82151d0b kaiser: enhanced by kernel and user PCIDs
Merged performance improvements to Kaiser, using distinct kernel
and user Process Context Identifiers to minimize the TLB flushing.

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:25 +01:00
Hugh Dickins
3e3d38fd98 kaiser: vmstat show NR_KAISERTABLE as nr_overhead
The kaiser update made an interesting choice, never to free any shadow
page tables.  Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing.  Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).

But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion
(but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).

Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).

In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too).  But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:24 +01:00
Hugh Dickins
f127705d26 kaiser: kaiser_remove_mapping() move along the pgd
When removing the bogus comment from kaiser_remove_mapping(),
I really ought to have checked the extent of its bogosity: as
Neel points out, there is nothing to stop unmap_pud_range_nofree()
from continuing beyond the end of a pud (and starting in the wrong
position on the next).

Fix kaiser_remove_mapping() to constrain the extent and advance pgd
pointer correctly: use pgd_addr_end() macro as used throughout base
mm (but don't assume page-rounded start and size in this case).

But this bug was very unlikely to trigger in this backport: since
any buddy allocation is contained within a single pud extent, and
we are not using vmapped stacks (and are only mapping one page of
stack anyway): the only way to hit this bug here would be when
freeing a large modified ldt.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:24 +01:00
Hugh Dickins
0c68228f7b kaiser: tidied up kaiser_add/remove_mapping slightly
Yes, unmap_pud_range_nofree()'s declaration ought to be in a
header file really, but I'm not sure we want to use it anyway:
so for now just declare it inside kaiser_remove_mapping().
And there doesn't seem to be such a thing as unmap_p4d_range(),
even in a 5-level paging tree.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:24 +01:00
Hugh Dickins
407c3ff6a2 kaiser: ENOMEM if kaiser_pagetable_walk() NULL
kaiser_add_user_map() took no notice when kaiser_pagetable_walk() failed.
And avoid its might_sleep() when atomic (though atomic at present unused).

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:24 +01:00
Hugh Dickins
edde73205b kaiser: do not set _PAGE_NX on pgd_none
native_pgd_clear() uses native_set_pgd(), so native_set_pgd() must
avoid setting the _PAGE_NX bit on an otherwise pgd_none() entry:
usually that just generated a warning on exit, but sometimes
more mysterious and damaging failures (our production machines
could not complete booting).

The original fix to this just avoided adding _PAGE_NX to
an empty entry; but eventually more problems surfaced with kexec,
and EFI mapping expected to be a problem too.  So now instead
change native_set_pgd() to update shadow only if _PAGE_USER:

A few places (kernel/machine_kexec_64.c, platform/efi/efi_64.c for sure)
use set_pgd() to set up a temporary internal virtual address space, with
physical pages remapped at what Kaiser regards as userspace addresses:
Kaiser then assumes a shadow pgd follows, which it will try to corrupt.

This appears to be responsible for the recent kexec and kdump failures;
though it's unclear how those did not manifest as a problem before.
Ah, the shadow pgd will only be assumed to "follow" if the requested
pgd is on an even-numbered page: so I suppose it was going wrong 50%
of the time all along.

What we need is a flag to set_pgd(), to tell it we're dealing with
userspace.  Er, isn't that what the pgd's _PAGE_USER bit is saying?
Add a test for that.  But we cannot do the same for pgd_clear()
(which may be called to clear corrupted entries - set aside the
question of "corrupt in which pgd?" until later), so there just
rely on pgd_clear() not being called in the problematic cases -
with a WARN_ON_ONCE() which should fire half the time if it is.

But this is getting too big for an inline function: move it into
arch/x86/mm/kaiser.c (which then demands a boot/compressed mod);
and de-void and de-space native_get_shadow/normal_pgd() while here.

Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Dave Hansen
bed9bb7f3e kaiser: merged update
Merged fixes and cleanups, rebased to 4.4.89 tree (no 5-level paging).

Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Richard Fellner
8a43ddfb93 KAISER: Kernel Address Isolation
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.

More information about the patch can be found on:

        https://github.com/IAIK/KAISER

From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5

To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>

After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).

With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.

If there are any questions we would love to answer them.
We also appreciate any comments!

Cheers,
Daniel (+ the KAISER team from Graz University of Technology)

[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf

[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]

Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-05 15:44:23 +01:00
Greg Kroah-Hartman
8cbe01c651 This is the 4.4.109 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpL3okACgkQONu9yGCS
 aT6p5g/8CAG9NU/fLu7IMcIlyqfVvdOhzxn44oHCxq08eycqoggdnb3TZXxBUBgY
 +w8uZk8yxNdjXR39GjkMSUy06WRvl2XDSrd36sDGRCBP62Fi8l5scmlRaNEnI/E8
 ltBSB93P16SmnpKa/3Zscz+7LcaoXHpU5Xhs8Zmf4I69qmzOFX2qSKsUyzVT+gNI
 ZoSN/mYuXf7+dzrcKhVdYzm4ZdMRvxdT0WefeoeZMekfAtU9D8zaFOA9jTIAMHSZ
 adNn18s7UKmaipZf/01mW9srvZce4nPKiUC8WVGstiyl27ws+IDleKVmDnqFALjy
 2LIxDvjDth/x8jfqTb7F6bFh6dVtMJjwUmd3KL7hgPuTddoQQe/GfKnjSHkbNxyR
 qNxNtbOgQ2EVOf59fejxWshCP/fButNo8uvCI1ERdm4axGXcf9hiucdlwzCYezHs
 UN0xrxAXprhqTq4hQFB9E4C49e8nMPNsyXTMZwSZRPe2z53spD53JR/0sl5Z2RWe
 ueO21tBZ6ev9jPNi+lJrCVw1oBO+PKOmdNPAaSynUVm96grRnW6grUI3mX9FqMXb
 r62UWG3YCWWBgxA3iQQrMxf/3S2YZXz59TBbp9GU8xOYJZLhKL29/iB7Rv4ANtkR
 aMDrABjWqrCZpIazqkZ5uwbsNl6Q51e3Mji3EfwkBaMqjc41++I=
 =B52+
 -----END PGP SIGNATURE-----

Merge 4.4.109 into android-4.4

Changes in 4.4.109
	ACPI: APEI / ERST: Fix missing error handling in erst_reader()
	crypto: mcryptd - protect the per-CPU queue with a lock
	mfd: cros ec: spi: Don't send first message too soon
	mfd: twl4030-audio: Fix sibling-node lookup
	mfd: twl6040: Fix child-node lookup
	ALSA: rawmidi: Avoid racy info ioctl via ctl device
	ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
	PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
	parisc: Hide Diva-built-in serial aux and graphics card
	spi: xilinx: Detect stall with Unknown commands
	KVM: X86: Fix load RFLAGS w/o the fixed bit
	kvm: x86: fix RSM when PCID is non-zero
	powerpc/perf: Dereference BHRB entries safely
	net: mvneta: clear interface link status on port disable
	tracing: Remove extra zeroing out of the ring buffer page
	tracing: Fix possible double free on failure of allocating trace buffer
	tracing: Fix crash when it fails to alloc ring buffer
	ring-buffer: Mask out the info bits when returning buffer page length
	iw_cxgb4: Only validate the MSN for successful completions
	ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
	ASoC: twl4030: fix child-node lookup
	ALSA: hda: Drop useless WARN_ON()
	ALSA: hda - fix headset mic detection issue on a Dell machine
	x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
	x86/mm: Remove flush_tlb() and flush_tlb_current_task()
	x86/mm: Make flush_tlb_mm_range() more predictable
	x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
	x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code
	x86/mm: Disable PCID on 32-bit kernels
	x86/mm: Add the 'nopcid' boot option to turn off PCID
	x86/mm: Enable CR4.PCIDE on supported systems
	x86/mm/64: Fix reboot interaction with CR4.PCIDE
	kbuild: add '-fno-stack-check' to kernel build options
	ipv4: igmp: guard against silly MTU values
	ipv6: mcast: better catch silly mtu values
	net: igmp: Use correct source address on IGMPv3 reports
	netlink: Add netns check on taps
	net: qmi_wwan: add Sierra EM7565 1199:9091
	net: reevalulate autoflowlabel setting after sysctl setting
	tcp md5sig: Use skb's saddr when replying to an incoming segment
	tg3: Fix rx hang on MTU change with 5717/5719
	net: ipv4: fix for a race condition in raw_sendmsg
	net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case
	sctp: Replace use of sockets_allocated with specified macro.
	ipv4: Fix use-after-free when flushing FIB tables
	net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks
	net: Fix double free and memory corruption in get_net_ns_by_id()
	net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround
	sock: free skb in skb_complete_tx_timestamp on error
	usbip: fix usbip bind writing random string after command in match_busid
	usbip: stub: stop printing kernel pointer addresses in messages
	usbip: vhci: stop printing kernel pointer addresses in messages
	USB: serial: ftdi_sio: add id for Airbus DS P8GR
	USB: serial: qcserial: add Sierra Wireless EM7565
	USB: serial: option: add support for Telit ME910 PID 0x1101
	USB: serial: option: adding support for YUGA CLM920-NC5
	usb: Add device quirk for Logitech HD Pro Webcam C925e
	usb: add RESET_RESUME for ELSA MicroLink 56K
	USB: Fix off by one in type-specific length check of BOS SSP capability
	usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
	nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
	x86/smpboot: Remove stale TLB flush invocations
	n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
	mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
	Linux 4.4.109

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2018-01-02 20:58:26 +01:00
Andy Lutomirski
b2e24274d5 x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code
commit ce4a4e565f5264909a18c733b864c3f74467f69e upstream.

The UP asm/tlbflush.h generates somewhat nicer code than the SMP version.
Aside from that, it's fallen quite a bit behind the SMP code:

 - flush_tlb_mm_range() didn't flush individual pages if the range
   was small.

 - The lazy TLB code was much weaker.  This usually wouldn't matter,
   but, if a kernel thread flushed its lazy "active_mm" more than
   once (due to reclaim or similar), it wouldn't be unlazied and
   would instead pointlessly flush repeatedly.

 - Tracepoints were missing.

Aside from that, simply having the UP code around was a maintanence
burden, since it means that any change to the TLB flush code had to
make sure not to break it.

Simplify everything by deleting the UP code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Andy Lutomirski
3efba6062a x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
commit ca6c99c0794875c6d1db6e22f246699691ab7e6b upstream.

flush_tlb_page() was very similar to flush_tlb_mm_range() except that
it had a couple of issues:

 - It was missing an smp_mb() in the case where
   current->active_mm != mm.  (This is a longstanding bug reported by Nadav Amit)

 - It was missing tracepoints and vm counter updates.

The only reason that I can see for keeping it at as a separate
function is that it could avoid a few branches that
flush_tlb_mm_range() needs to decide to flush just one page.  This
hardly seems worthwhile.  If we decide we want to get rid of those
branches again, a better way would be to introduce an
__flush_tlb_mm_range() helper and make both flush_tlb_page() and
flush_tlb_mm_range() use it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/3cc3847cf888d8907577569b8bac3f01992ef8f9.1495492063.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Andy Lutomirski
9f4d1ba1d4 x86/mm: Make flush_tlb_mm_range() more predictable
commit ce27374fabf553153c3f53efcaa9bfab9216bd8c upstream.

I'm about to rewrite the function almost completely, but first I
want to get a functional change out of the way.  Currently, if
flush_tlb_mm_range() does not flush the local TLB at all, it will
never do individual page flushes on remote CPUs.  This seems to be
an accident, and preserving it will be awkward.  Let's change it
first so that any regressions in the rewrite will be easier to
bisect and so that the rewrite can attempt to change no visible
behavior at all.

The fix is simple: we can simply avoid short-circuiting the
calculation of base_pages_to_flush.

As a side effect, this also eliminates a potential corner case: if
tlb_single_page_flush_ceiling == TLB_FLUSH_ALL, flush_tlb_mm_range()
could have ended up flushing the entire address space one page at a
time.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Acked-by: Dave Hansen <dave.hansen@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/4b29b771d9975aad7154c314534fec235618175a.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Andy Lutomirski
227d6f0e79 x86/mm: Remove flush_tlb() and flush_tlb_current_task()
commit 29961b59a51f8c6838a26a45e871a7ed6771809b upstream.

I was trying to figure out what how flush_tlb_current_task() would
possibly work correctly if current->mm != current->active_mm, but I
realized I could spare myself the effort: it has no callers except
the unused flush_tlb() macro.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/e52d64c11690f85e9f1d69d7b48cc2269cd2e94b.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-01-02 20:33:23 +01:00
Greg Kroah-Hartman
55b3b8c2b5 This is the 4.4.108 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpA+4sACgkQONu9yGCS
 aT4RphAAkoRI16GF7c2U1aLVGcYA5zezxnYEjtFXjoM7sLwmfEBeSTTR8OqCZGaa
 hLvhrqCbF6qjgu0dKAKaLasnoUk+/ZEpxSE0JAlQ0ZdmCP9YH+Sd2PqjD48QGJ80
 O2JktsYT3MjaNKFHNeSElf2ffk3mDzRpeeHEtSduJE3/Pqvoch2qV36qJWstOpLR
 QKEQCKVP9Xa71V5Zu5tDAMFI0IQ09HRjBjyjlAxnJL+/wgYM/NTraZ1/3Ju8nGDU
 Ohamr80SdIZ0+36Gl9mFTDQRl5yuh2SllqaD4Hq4P41HPIWaXuTuKQwA0LD7U8U6
 N7I5byN7QVmOszERc0jzkQPd4aN1FWfDqAR5i6S+fDbianDiHvBNsybihayWqzay
 MgX3VnzhFrnkh5UcijKOjsRHp/CkDIAfUseb5dFekAjuctWtNK7qkE+bGySZJ9ET
 XO2b8xlRz/nYuWodsAXT6FhVaVd45ba+VR/nDwSzijf+7SY2Ub9/lwChePbI8VJj
 vSiBURRtZhEgwXXJYH3JT4L+MSqfKA+1Jd+G7BqvZaSQ6S8RLvts/JZAX1nGs0JK
 B2a84CD0lXd5RcMBRFXkfzCEZDA1hB/oLpmVVXxNROseSlSc/ExQ3xZ9UPXlWCnN
 dB7XCV5GoV9Fqx5FX0lg6eEkNzUrkFV/My/FKaJtZR8U0TJB1nk=
 =uslW
 -----END PGP SIGNATURE-----

Merge 4.4.108 into android-4.4

Changes in 4.4.108
	arm64: Initialise high_memory global variable earlier
	cxl: Check if vphb exists before iterating over AFU devices
	x86/mm: Add INVPCID helpers
	x86/mm: Fix INVPCID asm constraint
	x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
	x86/mm: If INVPCID is available, use it to flush global mappings
	mm/rmap: batched invalidations should use existing api
	mm/mmu_context, sched/core: Fix mmu_context.h assumption
	sched/core: Add switch_mm_irqs_off() and use it in the scheduler
	x86/mm: Build arch/x86/mm/tlb.c even on !SMP
	x86/mm, sched/core: Uninline switch_mm()
	x86/mm, sched/core: Turn off IRQs in switch_mm()
	ARM: Hide finish_arch_post_lock_switch() from modules
	sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()
	x86/irq: Do not substract irq_tlb_count from irq_call_count
	ALSA: hda - add support for docking station for HP 820 G2
	ALSA: hda - add support for docking station for HP 840 G3
	arm: kprobes: Fix the return address of multiple kretprobes
	arm: kprobes: Align stack to 8-bytes in test code
	cpuidle: Validate cpu_dev in cpuidle_add_sysfs()
	r8152: fix the list rx_done may be used without initialization
	crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
	sch_dsmark: fix invalid skb_cow() usage
	bna: integer overflow bug in debugfs
	net: qmi_wwan: Add USB IDs for MDM6600 modem on Motorola Droid 4
	usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
	usb: gadget: udc: remove pointer dereference after free
	netfilter: nfnl_cthelper: fix runtime expectation policy updates
	netfilter: nfnl_cthelper: Fix memory leak
	inet: frag: release spinlock before calling icmp_send()
	pinctrl: st: add irq_request/release_resources callbacks
	scsi: lpfc: Fix PT2PT PRLI reject
	KVM: x86: correct async page present tracepoint
	KVM: VMX: Fix enable VPID conditions
	ARM: dts: ti: fix PCI bus dtc warnings
	hwmon: (asus_atk0110) fix uninitialized data access
	HID: xinmo: fix for out of range for THT 2P arcade controller.
	r8152: prevent the driver from transmitting packets with carrier off
	s390/qeth: no ETH header for outbound AF_IUCV
	bna: avoid writing uninitialized data into hw registers
	net: Do not allow negative values for busy_read and busy_poll sysctl interfaces
	i40e: Do not enable NAPI on q_vectors that have no rings
	RDMA/iser: Fix possible mr leak on device removal event
	irda: vlsi_ir: fix check for DMA mapping errors
	netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table
	netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register
	ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
	KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
	isdn: kcapi: avoid uninitialized data
	xhci: plat: Register shutdown for xhci_plat
	netfilter: nfnetlink_queue: fix secctx memory leak
	ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
	cpuidle: powernv: Pass correct drv->cpumask for registration
	bnxt_en: Fix NULL pointer dereference in reopen failure path
	backlight: pwm_bl: Fix overflow condition
	crypto: crypto4xx - increase context and scatter ring buffer elements
	rtc: pl031: make interrupt optional
	net: phy: at803x: Change error to EINVAL for invalid MAC
	PCI: Avoid bus reset if bridge itself is broken
	scsi: cxgb4i: fix Tx skb leak
	scsi: mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive
	PCI: Create SR-IOV virtfn/physfn links before attaching driver
	igb: check memory allocation failure
	ixgbe: fix use of uninitialized padding
	PCI/AER: Report non-fatal errors only to the affected endpoint
	scsi: lpfc: Fix secure firmware updates
	scsi: lpfc: PLOGI failures during NPIV testing
	fm10k: ensure we process SM mbx when processing VF mbx
	tcp: fix under-evaluated ssthresh in TCP Vegas
	rtc: set the alarm to the next expiring timer
	cpuidle: fix broadcast control when broadcast can not be entered
	thermal: hisilicon: Handle return value of clk_prepare_enable
	MIPS: math-emu: Fix final emulation phase for certain instructions
	Revert "Bluetooth: btusb: driver to enable the usb-wakeup feature"
	ALSA: hda - Clear the leftover component assignment at snd_hdac_i915_exit()
	ALSA: hda - Degrade i915 binding failure message
	ALSA: hda - Fix yet another i915 pointer leftover in error path
	alpha: fix build failures
	Linux 4.4.108

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-27 13:36:00 +01:00
Andy Lutomirski
4ead44fd25 x86/mm, sched/core: Turn off IRQs in switch_mm()
commit 078194f8e9fe3cf54c8fd8bded48a1db5bd8eb8a upstream.

Potential races between switch_mm() and TLB-flush or LDT-flush IPIs
could be very messy.  AFAICT the code is currently okay, whether by
accident or by careful design, but enabling PCID will make it
considerably more complicated and will no longer be obviously safe.

Fix it with a big hammer: run switch_mm() with IRQs off.

To avoid a performance hit in the scheduler, we take advantage of
our knowledge that the scheduler already has IRQs disabled when it
calls switch_mm().

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f19baf759693c9dcae64bbff76189db77cb13398.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:22:09 +01:00
Andy Lutomirski
70a39c7fd1 x86/mm, sched/core: Uninline switch_mm()
commit 69c0319aabba45bcf33178916a2f06967b4adede upstream.

It's fairly large and it has quite a few callers.  This may also
help untangle some headers down the road.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/54f3367803e7f80b2be62c8a21879aa74b1a5f57.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:22:09 +01:00
Andy Lutomirski
83cc4b50e3 x86/mm: Build arch/x86/mm/tlb.c even on !SMP
commit e1074888c326038340a1ada9129d679e661f2ea6 upstream.

Currently all of the functions that live in tlb.c are inlined on
!SMP builds.  One can debate whether this is a good idea (in many
respects the code in tlb.c is better than the inlined UP code).

Regardless, I want to add code that needs to be built on UP and SMP
kernels and relates to tlb flushing, so arrange for tlb.c to be
compiled unconditionally.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/f0d778f0d828fc46e5d1946bca80f0aaf9abf032.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:22:09 +01:00
Nadav Amit
8d5ee51a6b mm/rmap: batched invalidations should use existing api
commit 858eaaa711700ce4595e039441e239e56d7b9514 upstream.

The recently introduced batched invalidations mechanism uses its own
mechanism for shootdown.  However, it does wrong accounting of
interrupts (e.g., inc_irq_stat is called for local invalidations),
trace-points (e.g., TLB_REMOTE_SHOOTDOWN for local invalidations) and
may break some platforms as it bypasses the invalidation mechanisms of
Xen and SGI UV.

This patch reuses the existing TLB flushing mechnaisms instead.  We use
NULL as mm to indicate a global invalidation is required.

Fixes 72b252aed5 ("mm: send one IPI per CPU to TLB flush all entries after unmapping pages")
Signed-off-by: Nadav Amit <namit@vmware.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-25 14:22:09 +01:00
Dmitry Vyukov
9b83f370dc BACKPORT: kernel: add kcov code coverage
kcov provides code coverage collection for coverage-guided fuzzing
(randomized testing).  Coverage-guided fuzzing is a testing technique
that uses coverage feedback to determine new interesting inputs to a
system.  A notable user-space example is AFL
(http://lcamtuf.coredump.cx/afl/).  However, this technique is not
widely used for kernel testing due to missing compiler and kernel
support.

kcov does not aim to collect as much coverage as possible.  It aims to
collect more or less stable coverage that is function of syscall inputs.
To achieve this goal it does not collect coverage in soft/hard
interrupts and instrumentation of some inherently non-deterministic or
non-interesting parts of kernel is disbled (e.g.  scheduler, locking).

Currently there is a single coverage collection mode (tracing), but the
API anticipates additional collection modes.  Initially I also
implemented a second mode which exposes coverage in a fixed-size hash
table of counters (what Quentin used in his original patch).  I've
dropped the second mode for simplicity.

This patch adds the necessary support on kernel side.  The complimentary
compiler support was added in gcc revision 231296.

We've used this support to build syzkaller system call fuzzer, which has
found 90 kernel bugs in just 2 months:

  https://github.com/google/syzkaller/wiki/Found-Bugs

We've also found 30+ bugs in our internal systems with syzkaller.
Another (yet unexplored) direction where kcov coverage would greatly
help is more traditional "blob mutation".  For example, mounting a
random blob as a filesystem, or receiving a random blob over wire.

Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
typical coverage can be just a dozen of basic blocks (e.g.  an invalid
input).  In such context gcov becomes prohibitively expensive as
reset/collect coverage steps depend on total number of basic
blocks/edges in program (in case of kernel it is about 2M).  Cost of
kcov depends only on number of executed basic blocks/edges.  On top of
that, kernel requires per-thread coverage because there are always
background threads and unrelated processes that also produce coverage.
With inlined gcov instrumentation per-thread coverage is not possible.

kcov exposes kernel PCs and control flow to user-space which is
insecure.  But debugfs should not be mapped as user accessible.

Based on a patch by Quentin Casasnovas.

[akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
[akpm@linux-foundation.org: unbreak allmodconfig]
[akpm@linux-foundation.org: follow x86 Makefile layout standards]
Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Tavis Ormandy <taviso@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Kees Cook <keescook@google.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: David Drysdale <drysdale@google.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Bug: 64145065
(cherry-picked from 5c9a8750a6409c63a0f01d51a9024861022f6593)
Change-Id: I17b5e04f6e89b241924e78ec32ead79c38b860ce
Signed-off-by: Paul Lawrence <paullawrence@google.com>
2017-12-18 09:41:57 -08:00
Greg Kroah-Hartman
2fea0397a8 This is the 4.4.106 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlo06IYACgkQONu9yGCS
 aT6M4hAAhACzW/fsu/NDmfsx8qroVSfugMaZd2kWd1Hne6lx4SXK/Fy61UFRLC04
 oImmBfzkkDekMg3wserA+pQmUaB1ZZl3wowh7J1M9wgfNdaNvPe5mN/9tU+LRGKH
 wOjZT1UWZ9Vf4a2JavsyujIL+H7QiOrsvZMaOKdUjD+chg3wexIQFoYg3NdE+wPZ
 /Rhztxvuj+yBG6zZl3Ws9y55suq2NATcltpiW4bbVZf5i2cMA3en/ugsGpWuB/UO
 IF2cnqzgernOpkkzVGFbXd0ePH8MhLxEiMMm+cVoE5xDGM0M7HMCePiPc66yOyYy
 4axU5KiVRRe1y0a0QDWGOO9MNPX1q0AE2Gy6B6p3nlOVvA5LO9mW1mI9gGY1yH5/
 Cfr9GqE9N/SmHQdLVGq8SFMKDdrOfxqyaFTOdTzMxa3TQX3qNYhoUWxcWmDVeMGY
 hNCqS1wTQ8Pp3ZH7VREm/kGpLFmcIe7vaERzhZYyXGU9cE+o2REWIJzx4W5pSH3D
 qaw9V+vN7aiep9TzP7G8TibXszW3j07+I7K4Ua3wBAfnbJR4hUcsExROBr/oV1+m
 klzq/xoj5L1m6x4Jf5avvaW5ykbnzKIeX3urALrW4qqnd3nyrir0w9Ja1YeBymMz
 56uGu8vqb02TZySPky7sSRnAyctEBP4SUL4vuudDRxIm+mbNors=
 =ZyVC
 -----END PGP SIGNATURE-----

Merge 4.4.106 into android-4.4

Changes in 4.4.106
	can: ti_hecc: Fix napi poll return value for repoll
	can: kvaser_usb: free buf in error paths
	can: kvaser_usb: Fix comparison bug in kvaser_usb_read_bulk_callback()
	can: kvaser_usb: ratelimit errors if incomplete messages are received
	can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
	can: ems_usb: cancel urb on -EPIPE and -EPROTO
	can: esd_usb2: cancel urb on -EPIPE and -EPROTO
	can: usb_8dev: cancel urb on -EPIPE and -EPROTO
	virtio: release virtio index when fail to device_register
	hv: kvp: Avoid reading past allocated blocks from KVP file
	isa: Prevent NULL dereference in isa_bus driver callbacks
	scsi: libsas: align sata_device's rps_resp on a cacheline
	efi: Move some sysfs files to be read-only by root
	ASN.1: fix out-of-bounds read when parsing indefinite length item
	ASN.1: check for error from ASN1_OP_END__ACT actions
	X.509: reject invalid BIT STRING for subjectPublicKey
	x86/PCI: Make broadcom_postcore_init() check acpi_disabled
	ALSA: pcm: prevent UAF in snd_pcm_info
	ALSA: seq: Remove spurious WARN_ON() at timer check
	ALSA: usb-audio: Fix out-of-bound error
	ALSA: usb-audio: Add check return value for usb_string()
	iommu/vt-d: Fix scatterlist offset handling
	s390: fix compat system call table
	kdb: Fix handling of kallsyms_symbol_next() return value
	drm: extra printk() wrapper macros
	drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
	media: dvb: i2c transfers over usb cannot be done from stack
	arm64: KVM: fix VTTBR_BADDR_MASK BUG_ON off-by-one
	KVM: VMX: remove I/O port 0x80 bypass on Intel hosts
	arm64: fpsimd: Prevent registers leaking from dead tasks
	ARM: BUG if jumping to usermode address in kernel mode
	ARM: avoid faulting on qemu
	scsi: storvsc: Workaround for virtual DVD SCSI version
	thp: reduce indentation level in change_huge_pmd()
	thp: fix MADV_DONTNEED vs. numa balancing race
	mm: drop unused pmdp_huge_get_and_clear_notify()
	Revert "drm/armada: Fix compile fail"
	Revert "spi: SPI_FSL_DSPI should depend on HAS_DMA"
	Revert "s390/kbuild: enable modversions for symbols exported from asm"
	vti6: Don't report path MTU below IPV6_MIN_MTU.
	ARM: OMAP2+: gpmc-onenand: propagate error on initialization failure
	x86/hpet: Prevent might sleep splat on resume
	selftest/powerpc: Fix false failures for skipped tests
	module: set __jump_table alignment to 8
	ARM: OMAP2+: Fix device node reference counts
	ARM: OMAP2+: Release device node after it is no longer needed.
	gpio: altera: Use handle_level_irq when configured as a level_high
	HID: chicony: Add support for another ASUS Zen AiO keyboard
	usb: gadget: configs: plug memory leak
	USB: gadgetfs: Fix a potential memory leak in 'dev_config()'
	kvm: nVMX: VMCLEAR should not cause the vCPU to shut down
	libata: drop WARN from protocol error in ata_sff_qc_issue()
	workqueue: trigger WARN if queue_delayed_work() is called with NULL @wq
	scsi: lpfc: Fix crash during Hardware error recovery on SLI3 adapters
	irqchip/crossbar: Fix incorrect type of register size
	KVM: nVMX: reset nested_run_pending if the vCPU is going to be reset
	arm: KVM: Survive unknown traps from guests
	arm64: KVM: Survive unknown traps from guests
	spi_ks8995: fix "BUG: key accdaa28 not in .data!"
	bnx2x: prevent crash when accessing PTP with interface down
	bnx2x: fix possible overrun of VFPF multicast addresses array
	bnx2x: do not rollback VF MAC/VLAN filters we did not configure
	ipv6: reorder icmpv6_init() and ip6_mr_init()
	crypto: s5p-sss - Fix completing crypto request in IRQ handler
	i2c: riic: fix restart condition
	zram: set physical queue limits to avoid array out of bounds accesses
	netfilter: don't track fragmented packets
	axonram: Fix gendisk handling
	drm/amd/amdgpu: fix console deadlock if late init failed
	powerpc/powernv/ioda2: Gracefully fail if too many TCE levels requested
	EDAC, i5000, i5400: Fix use of MTR_DRAM_WIDTH macro
	EDAC, i5000, i5400: Fix definition of NRECMEMB register
	kbuild: pkg: use --transform option to prefix paths in tar
	mac80211_hwsim: Fix memory leak in hwsim_new_radio_nl()
	route: also update fnhe_genid when updating a route cache
	route: update fnhe_expires for redirect when the fnhe exists
	lib/genalloc.c: make the avail variable an atomic_long_t
	dynamic-debug-howto: fix optional/omitted ending line number to be LARGE instead of 0
	NFS: Fix a typo in nfs_rename()
	sunrpc: Fix rpc_task_begin trace point
	block: wake up all tasks blocked in get_request()
	sparc64/mm: set fields in deferred pages
	sctp: do not free asoc when it is already dead in sctp_sendmsg
	sctp: use the right sk after waking up from wait_buf sleep
	atm: horizon: Fix irq release error
	jump_label: Invoke jump_label_test() via early_initcall()
	xfrm: Copy policy family in clone_policy
	IB/mlx4: Increase maximal message size under UD QP
	IB/mlx5: Assign send CQ and recv CQ of UMR QP
	afs: Connect up the CB.ProbeUuid
	ipvlan: fix ipv6 outbound device
	audit: ensure that 'audit=1' actually enables audit for PID 1
	ipmi: Stop timers before cleaning up the module
	s390: always save and restore all registers on context switch
	more bio_map_user_iov() leak fixes
	tipc: fix memory leak in tipc_accept_from_sock()
	rds: Fix NULL pointer dereference in __rds_rdma_map
	sit: update frag_off info
	packet: fix crash in fanout_demux_rollover()
	net/packet: fix a race in packet_bind() and packet_notifier()
	Revert "x86/efi: Build our own page table structures"
	Revert "x86/efi: Hoist page table switching code into efi_call_virt()"
	Revert "x86/mm/pat: Ensure cpa->pfn only contains page frame numbers"
	arm: KVM: Fix VTTBR_BADDR_MASK BUG_ON off-by-one
	usb: gadget: ffs: Forbid usb_ep_alloc_request from sleeping
	Linux 4.4.106

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-18 10:49:53 +01:00
Greg Kroah-Hartman
9f5a8d610d Revert "x86/mm/pat: Ensure cpa->pfn only contains page frame numbers"
This reverts commit 87e2bd898d which is
commit edc3b9129cecd0f0857112136f5b8b1bc1d45918 upstream.

Turns there was too many other issues with this patch to make it viable
for the stable tree.

Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Borislav Petkov <bp@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-16 10:33:57 +01:00
Greg Kroah-Hartman
8bc4213be4 This is the 4.4.104 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlomc30ACgkQONu9yGCS
 aT5Hkw/9GJCZ3LyqVfd9mkB97/9OTFduQUjLsqFdgQIRZdByHe6fR8cfJ1ydeiJS
 fcnBtZzJNMBXurEWourqrSWptDkFANoG0EGResza+3aXXC2SuwmuXG0JgR4KdnpE
 1pkqnKJ42rnK1XWuHIp2S0A6S+SW8fA5HrVFpQDLiXUb4NB9xerG2PS6tq94rQN0
 TB2IdJzb5ITWQkCzYvuTphl4jz2XnMJskd53c9osG2izBxVXzA650ygjtW6HrqG6
 AWvc0BK7p/Rz+ty3PbGbSNUJyDicnknS2/UARMz/BlI5SiyL8aAHVC7h3k9dh2CU
 uPVhfebFCSBcZ+iYl4UUuywjiJwDTMk/QwNZFph5Hw5HntB2LAHSG3rrv/8OIoPe
 +bvMRonfWB+rtzDiy0UbCCutswJzf6h9Pj5y/PA+GIFnNmeDQ096EOI7BFGO1MVe
 s/d/krw4xqhDRb97ZlWWX8UqRplmPSe4pWP47Gr0K4qNAPhp4B2tW6VJcMG3z2oo
 0/iV9cZ2CVxpIA2oN7ymWCKSQ764VADTcNSqC9Wd+rDso2XcBPRYUo9pbNKLqryb
 hxQR1Hj0do35qkR1L/e6thdygMV2ZyQ4xBq9fVPjDrwi2d9dO1jfnkQkL+QEa73G
 AaufuvFdJefssFP91FwGZnhhMi8kWIuJSTFM3Pg9OwNEjgt3nxo=
 =eTmV
 -----END PGP SIGNATURE-----

Merge 4.4.104 into android-4.4

Changes in 4.4.104
	netlink: add a start callback for starting a netlink dump
	ipsec: Fix aborted xfrm policy dump crash
	x86/mm/pat: Ensure cpa->pfn only contains page frame numbers
	x86/efi: Hoist page table switching code into efi_call_virt()
	x86/efi: Build our own page table structures
	ARM: dts: omap3: logicpd-torpedo-37xx-devkit: Fix MMC1 cd-gpio
	x86/efi-bgrt: Fix kernel panic when mapping BGRT data
	x86/efi-bgrt: Replace early_memremap() with memremap()
	mm, thp: Do not make page table dirty unconditionally in touch_p[mu]d()
	mm/madvise.c: fix madvise() infinite loop under special circumstances
	btrfs: clear space cache inode generation always
	KVM: x86: pvclock: Handle first-time write to pvclock-page contains random junk
	KVM: x86: Exit to user-mode on #UD intercept when emulator requires
	KVM: x86: inject exceptions produced by x86_decode_insn
	mmc: core: Do not leave the block driver in a suspended state
	eeprom: at24: check at24_read/write arguments
	bcache: Fix building error on MIPS
	Revert "drm/radeon: dont switch vt on suspend"
	drm/radeon: fix atombios on big endian
	drm/panel: simple: Add missing panel_simple_unprepare() calls
	mtd: nand: Fix writing mtdoops to nand flash.
	NFS: revalidate "." etc correctly on "open".
	drm/i915: Don't try indexed reads to alternate slave addresses
	drm/i915: Prevent zero length "index" write
	nfsd: Make init_open_stateid() a bit more whole
	nfsd: Fix stateid races between OPEN and CLOSE
	nfsd: Fix another OPEN stateid race
	Linux 4.4.104

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-12-05 11:31:58 +01:00
Matt Fleming
87e2bd898d x86/mm/pat: Ensure cpa->pfn only contains page frame numbers
commit edc3b9129cecd0f0857112136f5b8b1bc1d45918 upstream.

The x86 pageattr code is confused about the data that is stored
in cpa->pfn, sometimes it's treated as a page frame number,
sometimes it's treated as an unshifted physical address, and in
one place it's treated as a pte.

The result of this is that the mapping functions do not map the
intended physical address.

This isn't a problem in practice because most of the addresses
we're mapping in the EFI code paths are already mapped in
'trampoline_pgd' and so the pageattr mapping functions don't
actually do anything in this case. But when we move to using a
separate page table for the EFI runtime this will be an issue.

Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Reviewed-by: Borislav Petkov <bp@suse.de>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1448658575-17029-3-git-send-email-matt@codeblueprint.co.uk
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: "Ghannam, Yazen" <Yazen.Ghannam@amd.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-12-05 11:22:49 +01:00
Greg Kroah-Hartman
cc3d2b7361 This is the 4.4.77 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllp5y8ACgkQONu9yGCS
 aT6g8hAApzYi9TwiaF6wyYXsrp7YvOm4NyMaVBl4t7v/nFql6VsUL+qWaJKB5EL9
 o72ybYPUzbxGTVWCm/wiBO31VWea0ak0pBbyywBiowGgwAcgG/jpqZobale4Y2TE
 15jEpmpA5+3BmXpMkrv/dz4LHZ4jm65/ADhMbkPGRZqUJ3mHmyVoi50l67dpTE5+
 xWQIErycwlVMppJGnXPeFFgeD7Etch7OJ9CishQRNMb3F8H58WiQrMWWe1NfL0DO
 H2g18IBHMsxEYJqnRqxviTOMe8S96Km+lKGX0LOTRYt+2OQpfIF7buU6N+6C96rO
 7V2n2G02m2mOFVUFlDYF1RQ9IBrxHJf9iGkaZBwsaxX7XAK63ZjRxgjnEL7gMPU/
 TMCOWZ53BdZezz2eAmdhySsV+4Xt6MmJJE8rR47AgsM2Le3tgK421zmraunmA0fR
 eoJS99YHcftAHXCD3puGLafEwGVe0G4eQbY4L7mj1Y9VjaAbmmsWq9rlNOQMZRgH
 JTNyYik1C7yGPJX1iKi9hLAKldzBwPuM3GfZMZQIOjA4t2VtSon7in5iKrihRg3N
 BSKXr6+orNw32tsqcC4kpLPbFUFb6zx3EKELwSJwD9ICN7swJEk7gFw7w/F/SOxI
 C1W4Ulm6EcYTWHDePERQ4zHlllHAmyJup61d9HnwA6HhPOLaff4=
 =oeNk
 -----END PGP SIGNATURE-----

Merge 4.4.77 into android-4.4

Changes in 4.4.77
	fs: add a VALID_OPEN_FLAGS
	fs: completely ignore unknown open flags
	driver core: platform: fix race condition with driver_override
	bgmac: reset & enable Ethernet core before using it
	mm: fix classzone_idx underflow in shrink_zones()
	tracing/kprobes: Allow to create probe with a module name starting with a digit
	drm/virtio: don't leak bo on drm_gem_object_init failure
	usb: dwc3: replace %p with %pK
	USB: serial: cp210x: add ID for CEL EM3588 USB ZigBee stick
	Add USB quirk for HVR-950q to avoid intermittent device resets
	usb: usbip: set buffer pointers to NULL after free
	usb: Fix typo in the definition of Endpoint[out]Request
	mac80211_hwsim: Replace bogus hrtimer clockid
	sysctl: don't print negative flag for proc_douintvec
	sysctl: report EINVAL if value is larger than UINT_MAX for proc_douintvec
	pinctrl: sh-pfc: r8a7791: Fix SCIF2 pinmux data
	pinctrl: meson: meson8b: fix the NAND DQS pins
	pinctrl: sunxi: Fix SPDIF function name for A83T
	pinctrl: mxs: atomically switch mux and drive strength config
	pinctrl: sh-pfc: Update info pointer after SoC-specific init
	USB: serial: option: add two Longcheer device ids
	USB: serial: qcserial: new Sierra Wireless EM7305 device ID
	gfs2: Fix glock rhashtable rcu bug
	x86/tools: Fix gcc-7 warning in relocs.c
	x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings
	ath10k: override CE5 config for QCA9377
	KEYS: Fix an error code in request_master_key()
	RDMA/uverbs: Check port number supplied by user verbs cmds
	mqueue: fix a use-after-free in sys_mq_notify()
	tools include: Add a __fallthrough statement
	tools string: Use __fallthrough in perf_atoll()
	tools strfilter: Use __fallthrough
	perf top: Use __fallthrough
	perf intel-pt: Use __fallthrough
	perf thread_map: Correctly size buffer used with dirent->dt_name
	perf scripting perl: Fix compile error with some perl5 versions
	perf tests: Avoid possible truncation with dirent->d_name + snprintf
	perf bench numa: Avoid possible truncation when using snprintf()
	perf tools: Use readdir() instead of deprecated readdir_r()
	perf thread_map: Use readdir() instead of deprecated readdir_r()
	perf script: Use readdir() instead of deprecated readdir_r()
	perf tools: Remove duplicate const qualifier
	perf annotate browser: Fix behaviour of Shift-Tab with nothing focussed
	perf pmu: Fix misleadingly indented assignment (whitespace)
	perf dwarf: Guard !x86_64 definitions under #ifdef else clause
	perf trace: Do not process PERF_RECORD_LOST twice
	perf tests: Remove wrong semicolon in while loop in CQM test
	perf tools: Use readdir() instead of deprecated readdir_r() again
	md: fix incorrect use of lexx_to_cpu in does_sb_need_changing
	md: fix super_offset endianness in super_1_rdev_size_change
	tcp: fix tcp_mark_head_lost to check skb len before fragmenting
	staging: vt6556: vnt_start Fix missing call to vnt_key_init_table.
	staging: comedi: fix clean-up of comedi_class in comedi_init()
	ext4: check return value of kstrtoull correctly in reserved_clusters_store
	x86/mm/pat: Don't report PAT on CPUs that don't support it
	saa7134: fix warm Medion 7134 EEPROM read
	Linux 4.4.77

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-15 13:29:08 +02:00
Mikulas Patocka
646b65808b x86/mm/pat: Don't report PAT on CPUs that don't support it
commit 99c13b8c8896d7bcb92753bf0c63a8de4326e78d upstream.

The pat_enabled() logic is broken on CPUs which do not support PAT and
where the initialization code fails to call pat_init(). Due to that the
enabled flag stays true and pat_enabled() returns true wrongfully.

As a consequence the mappings, e.g. for Xorg, are set up with the wrong
caching mode and the required MTRR setups are omitted.

To cure this the following changes are required:

  1) Make pat_enabled() return true only if PAT initialization was
     invoked and successful.

  2) Invoke init_cache_modes() unconditionally in setup_arch() and
     remove the extra callsites in pat_disable() and the pat disabled
     code path in pat_init().

Also rename __pat_enabled to pat_disabled to reflect the real purpose of
this variable.

Fixes: 9cd25aac1f ("x86/mm/pat: Emulate PAT when it is disabled")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Bernhard Held <berny156@gmx.de>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707041749300.3456@file01.intranet.prod.int.rdu2.redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-15 11:57:50 +02:00
Greg Kroah-Hartman
64a73ff728 This is the 4.4.76 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllc3f0ACgkQONu9yGCS
 aT4fmA/+OHeYbhpaMRKqrUpsxB3NpROr2Z47ow6vaVjYZzd0irrODLlfIfDQ6EEo
 N3v28povu16VeYXk+4h8bsAP2K2j6/BlRaSi2hB6dmnY8GDMaXEfRojPYAlzVz50
 qnK/6152siDDarUx1h5Zc8GcmX/tEl6h3bOOxDcwLR+RvyIcWxenuR+uqRM/AV6o
 BPEiOuMu7P6LjID7KYgBTFNajVBMLrDXt4SCWdzOZmlNt0QXgKB9yw68vTcc+edC
 ZcXqa0M6nEWSDvwobbwBZhFL8H2dJjzweyjeFBgxnxgmOrRh6kvZG2wsz2c8O3/P
 g8TuMxU7siu+I3lFwKy+dgZ/1REz+6Q3oFBqXsuddrcPYu23rV6mz/GxqWy4cerb
 M4eTWz6L9vA2GoYpvBaWi0tKC9tkNM49g48Y24a6CW1O4dJWlz3RrpTiZmequbNF
 mo8EKomSXn4kYAm1xT03DGljQkK/i2JtyI5sk2hLEqqxKvZ/3q9xxLLKOVx8dPvs
 PIbfpapfYMXXMWgR6e+UKueNLgevfWE12X/OU4SgvSY4n/07/mH40XEd3zd82IsZ
 1Mw0qj3JnqCAFDBBMsDYa+OvABaGD1dHARuiv+aeqW8tqoBglFHxWqF+SQVNXLIE
 qTLiKz78vjQpH0zGpkA3HEOh/h4L7a0y3qRMECsk5SUxXsgu1gg=
 =bwNU
 -----END PGP SIGNATURE-----

Merge 4.4.76 into android-4.4

Changes in 4.4.76
	ipv6: release dst on error in ip6_dst_lookup_tail
	net: don't call strlen on non-terminated string in dev_set_alias()
	decnet: dn_rtmsg: Improve input length sanitization in dnrmg_receive_user_skb
	net: Zero ifla_vf_info in rtnl_fill_vfinfo()
	af_unix: Add sockaddr length checks before accessing sa_family in bind and connect handlers
	Fix an intermittent pr_emerg warning about lo becoming free.
	net: caif: Fix a sleep-in-atomic bug in cfpkt_create_pfx
	igmp: acquire pmc lock for ip_mc_clear_src()
	igmp: add a missing spin_lock_init()
	ipv6: fix calling in6_ifa_hold incorrectly for dad work
	net/mlx5: Wait for FW readiness before initializing command interface
	decnet: always not take dst->__refcnt when inserting dst into hash table
	net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
	sfc: provide dummy definitions of vswitch functions
	ipv6: Do not leak throw route references
	rtnetlink: add IFLA_GROUP to ifla_policy
	netfilter: xt_TCPMSS: add more sanity tests on tcph->doff
	netfilter: synproxy: fix conntrackd interaction
	NFSv4: fix a reference leak caused WARNING messages
	drm/ast: Handle configuration without P2A bridge
	mm, swap_cgroup: reschedule when neeed in swap_cgroup_swapoff()
	MIPS: Avoid accidental raw backtrace
	MIPS: pm-cps: Drop manual cache-line alignment of ready_count
	MIPS: Fix IRQ tracing & lockdep when rescheduling
	ALSA: hda - Fix endless loop of codec configure
	ALSA: hda - set input_path bitmap to zero after moving it to new place
	drm/vmwgfx: Free hash table allocated by cmdbuf managed res mgr
	usb: gadget: f_fs: Fix possibe deadlock
	sysctl: enable strict writes
	block: fix module reference leak on put_disk() call for cgroups throttle
	mm: numa: avoid waiting on freed migrated pages
	KVM: x86: fix fixing of hypercalls
	scsi: sd: Fix wrong DPOFUA disable in sd_read_cache_type
	scsi: lpfc: Set elsiocb contexts to NULL after freeing it
	qla2xxx: Fix erroneous invalid handle message
	ARM: dts: BCM5301X: Correct GIC_PPI interrupt flags
	net: mvneta: Fix for_each_present_cpu usage
	MIPS: ath79: fix regression in PCI window initialization
	net: korina: Fix NAPI versus resources freeing
	MIPS: ralink: MT7688 pinmux fixes
	MIPS: ralink: fix USB frequency scaling
	MIPS: ralink: Fix invalid assignment of SoC type
	MIPS: ralink: fix MT7628 pinmux typos
	MIPS: ralink: fix MT7628 wled_an pinmux gpio
	mtd: bcm47xxpart: limit scanned flash area on BCM47XX (MIPS) only
	bgmac: fix a missing check for build_skb
	mtd: bcm47xxpart: don't fail because of bit-flips
	bgmac: Fix reversed test of build_skb() return value.
	net: bgmac: Fix SOF bit checking
	net: bgmac: Start transmit queue in bgmac_open
	net: bgmac: Remove superflous netif_carrier_on()
	powerpc/eeh: Enable IO path on permanent error
	gianfar: Do not reuse pages from emergency reserve
	Btrfs: fix truncate down when no_holes feature is enabled
	virtio_console: fix a crash in config_work_handler
	swiotlb-xen: update dev_addr after swapping pages
	xen-netfront: Fix Rx stall during network stress and OOM
	scsi: virtio_scsi: Reject commands when virtqueue is broken
	platform/x86: ideapad-laptop: handle ACPI event 1
	amd-xgbe: Check xgbe_init() return code
	net: dsa: Check return value of phy_connect_direct()
	drm/amdgpu: check ring being ready before using
	vfio/spapr: fail tce_iommu_attach_group() when iommu_data is null
	virtio_net: fix PAGE_SIZE > 64k
	vxlan: do not age static remote mac entries
	ibmveth: Add a proper check for the availability of the checksum features
	kernel/panic.c: add missing \n
	HID: i2c-hid: Add sleep between POWER ON and RESET
	scsi: lpfc: avoid double free of resource identifiers
	spi: davinci: use dma_mapping_error()
	mac80211: initialize SMPS field in HT capabilities
	x86/mpx: Use compatible types in comparison to fix sparse error
	coredump: Ensure proper size of sparse core files
	swiotlb: ensure that page-sized mappings are page-aligned
	s390/ctl_reg: make __ctl_load a full memory barrier
	be2net: fix status check in be_cmd_pmac_add()
	perf probe: Fix to show correct locations for events on modules
	net/mlx4_core: Eliminate warning messages for SRQ_LIMIT under SRIOV
	sctp: check af before verify address in sctp_addr_id2transport
	ravb: Fix use-after-free on `ifconfig eth0 down`
	jump label: fix passing kbuild_cflags when checking for asm goto support
	xfrm: fix stack access out of bounds with CONFIG_XFRM_SUB_POLICY
	xfrm: NULL dereference on allocation failure
	xfrm: Oops on error in pfkey_msg2xfrm_state()
	watchdog: bcm281xx: Fix use of uninitialized spinlock.
	sched/loadavg: Avoid loadavg spikes caused by delayed NO_HZ accounting
	ARM64/ACPI: Fix BAD_MADT_GICC_ENTRY() macro implementation
	ARM: 8685/1: ensure memblock-limit is pmd-aligned
	x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
	x86/mm: Fix flush_tlb_page() on Xen
	ocfs2: o2hb: revert hb threshold to keep compatible
	iommu/vt-d: Don't over-free page table directories
	iommu: Handle default domain attach failure
	iommu/amd: Fix incorrect error handling in amd_iommu_bind_pasid()
	cpufreq: s3c2416: double free on driver init error path
	KVM: x86: fix emulation of RSM and IRET instructions
	KVM: x86/vPMU: fix undefined shift in intel_pmu_refresh()
	KVM: x86: zero base3 of unusable segments
	KVM: nVMX: Fix exception injection
	Linux 4.4.76

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-07-05 16:16:58 +02:00
Andy Lutomirski
5d650fcef9 x86/mm: Fix flush_tlb_page() on Xen
commit dbd68d8e84c606673ebbcf15862f8c155fa92326 upstream.

flush_tlb_page() passes a bogus range to flush_tlb_others() and
expects the latter to fix it up.  native_flush_tlb_others() has the
fixup but Xen's version doesn't.  Move the fixup to
flush_tlb_others().

AFAICS the only real effect is that, without this fix, Xen would
flush everything instead of just the one page on remote vCPUs in
when flush_tlb_page() was called.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Nadav Amit <namit@vmware.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: e7b52ffd45 ("x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range")
Link: http://lkml.kernel.org/r/10ed0e4dfea64daef10b87fb85df1746999b4dba.1492844372.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 14:37:22 +02:00
Joerg Roedel
6fb3b32230 x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
commit 5ed386ec09a5d75bcf073967e55e895c2607a5c3 upstream.

When this function fails it just sends a SIGSEGV signal to
user-space using force_sig(). This signal is missing
essential information about the cause, e.g. the trap_nr or
an error code.

Fix this by propagating the error to the only caller of
mpx_handle_bd_fault(), do_bounds(), which sends the correct
SIGSEGV signal to the process.

Signed-off-by: Joerg Roedel <jroedel@suse.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: fe3d197f84 ('x86, mpx: On-demand kernel allocation of bounds tables')
Link: http://lkml.kernel.org/r/1491488362-27198-1-git-send-email-joro@8bytes.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 14:37:22 +02:00
Tobias Klauser
c20bdc08af x86/mpx: Use compatible types in comparison to fix sparse error
[ Upstream commit 453828625731d0ba7218242ef6ec88f59408f368 ]

info->si_addr is of type void __user *, so it should be compared against
something from the same address space.

This fixes the following sparse error:

  arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces)

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-07-05 14:37:20 +02:00
Greg Kroah-Hartman
77ddb50929 This is the 4.4.74 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAllQl/sACgkQONu9yGCS
 aT5zMRAAuDBpWjQ1IFtgmzQnKGyjS3fm5X/EgPmT81PFKXay5/TH6Hc85TvorChk
 mCC7qybadCFPjieBfUeCGhTposiGkbOZdYIzduzLeHPe7Eda88NKJw5ZS3x+RDro
 if6BZNtQPwPk9jQ95zpBu/p6eCuIGFzQObif8XHga9eEVP+TPGDKFn5EdLM8j99t
 ErKYyTLFEiZYa52hpCBbVz/4mX8bJOoAlZaitcbvaFbG0OodA5SL24sKlr7tAPrM
 ajnuqv+ghOUjbXrUlrTGxCjJ7vCJjdBqNzuxVFNj5P1xDucpBW8uuWGob0XWTMbB
 hj/ToAIQXQXrZKFpASWW74B4QZDcjo7dbhDWOurBaAsyLuBzAi26pI+q6TqgCQUO
 k17ilfk9LVEvvFhiQ7xpJPNnkh6tCEk7Jdblru6ZL5fHCAYe+qUDj56TbqjFJCQK
 +bDzPi0QXkEGQNKxo7zDu5iGQ0Gb0zD2Z3MrGD+3pCkM5yG0PXjzZ7lOlboyPzwY
 88dxuuTRmm8yGEEm81BKmDYqAA1l4FCrap8u9FLoNyoZyMnK7B+SHHuPRBRhL3F2
 I3L/v8BbJhXTsDNPXEsXtpZZpn2wxJp4x4gKWmCcOb5MM1nbFrFtwdj0cKobu6Xe
 ygNMEkjlW2uUrZoDXthj1ICda/cEw/R0gMWzBeNNVfErOZEmFxM=
 =zl9i
 -----END PGP SIGNATURE-----

Merge 4.4.74 into android-4.4

Changes in 4.4.74
	configfs: Fix race between create_link and configfs_rmdir
	can: gs_usb: fix memory leak in gs_cmd_reset()
	cpufreq: conservative: Allow down_threshold to take values from 1 to 10
	vb2: Fix an off by one error in 'vb2_plane_vaddr'
	mac80211: don't look at the PM bit of BAR frames
	mac80211/wpa: use constant time memory comparison for MACs
	mac80211: fix CSA in IBSS mode
	mac80211: fix IBSS presp allocation size
	serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
	x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
	mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode
	staging: rtl8188eu: prevent an underflow in rtw_check_beacon_data()
	iio: proximity: as3935: recalibrate RCO after resume
	USB: hub: fix SS max number of ports
	usb: core: fix potential memory leak in error path during hcd creation
	pvrusb2: reduce stack usage pvr2_eeprom_analyze()
	USB: gadget: dummy_hcd: fix hub-descriptor removable fields
	usb: r8a66597-hcd: select a different endpoint on timeout
	usb: r8a66597-hcd: decrease timeout
	drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR()
	usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk
	USB: gadgetfs, dummy-hcd, net2280: fix locking for callbacks
	mm/memory-failure.c: use compound_head() flags for huge pages
	swap: cond_resched in swap_cgroup_prepare()
	genirq: Release resources in __setup_irq() error path
	alarmtimer: Prevent overflow of relative timers
	usb: dwc3: exynos fix axius clock error path to do cleanup
	MIPS: Fix bnezc/jialc return address calculation
	alarmtimer: Rate limit periodic intervals
	mm: larger stack guard gap, between vmas
	Allow stack to grow up to address space limit
	mm: fix new crash in unmapped_area_topdown()
	Linux 4.4.74

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-06-27 09:47:59 +02:00
Hugh Dickins
4b35943067 mm: larger stack guard gap, between vmas
commit 1be7107fbe18eed3e319a6c3e83c78254b693acb upstream.

Stack guard page is a useful feature to reduce a risk of stack smashing
into a different mapping. We have been using a single page gap which
is sufficient to prevent having stack adjacent to a different mapping.
But this seems to be insufficient in the light of the stack usage in
userspace. E.g. glibc uses as large as 64kB alloca() in many commonly
used functions. Others use constructs liks gid_t buffer[NGROUPS_MAX]
which is 256kB or stack strings with MAX_ARG_STRLEN.

This will become especially dangerous for suid binaries and the default
no limit for the stack size limit because those applications can be
tricked to consume a large portion of the stack and a single glibc call
could jump over the guard page. These attacks are not theoretical,
unfortunatelly.

Make those attacks less probable by increasing the stack guard gap
to 1MB (on systems with 4k pages; but make it depend on the page size
because systems with larger base pages might cap stack allocations in
the PAGE_SIZE units) which should cover larger alloca() and VLA stack
allocations. It is obviously not a full fix because the problem is
somehow inherent, but it should reduce attack space a lot.

One could argue that the gap size should be configurable from userspace,
but that can be done later when somebody finds that the new 1MB is wrong
for some special case applications.  For now, add a kernel command line
option (stack_guard_gap) to specify the stack gap size (in page units).

Implementation wise, first delete all the old code for stack guard page:
because although we could get away with accounting one extra page in a
stack vma, accounting a larger gap can break userspace - case in point,
a program run with "ulimit -S -v 20000" failed when the 1MB gap was
counted for RLIMIT_AS; similar problems could come with RLIMIT_MLOCK
and strict non-overcommit mode.

Instead of keeping gap inside the stack vma, maintain the stack guard
gap as a gap between vmas: using vm_start_gap() in place of vm_start
(or vm_end_gap() in place of vm_end if VM_GROWSUP) in just those few
places which need to respect the gap - mainly arch_get_unmapped_area(),
and and the vma tree's subtree_gap support for that.

Original-patch-by: Oleg Nesterov <oleg@redhat.com>
Original-patch-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[wt: backport to 4.11: adjust context]
[wt: backport to 4.9: adjust context ; kernel doc was not in admin-guide]
[wt: backport to 4.4: adjust context ; drop ppc hugetlb_radix changes]
Signed-off-by: Willy Tarreau <w@1wt.eu>
[gkh: minor build fixes for 4.4]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26 07:13:11 +02:00
Laura Abbott
93d022e256 x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init()
commit 861ce4a3244c21b0af64f880d5bfe5e6e2fb9e4a upstream.

'__vmalloc_start_set' currently only gets set in initmem_init() when
!CONFIG_NEED_MULTIPLE_NODES. This breaks detection of vmalloc address
with virt_addr_valid() with CONFIG_NEED_MULTIPLE_NODES=y, causing
a kernel crash:

  [mm/usercopy] 517e1fbeb6: kernel BUG at arch/x86/mm/physaddr.c:78!

Set '__vmalloc_start_set' appropriately for that case as well.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Laura Abbott <labbott@redhat.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: dc16ecf7fd ("x86-32: use specific __vmalloc_start_set flag in __virt_addr_valid")
Link: http://lkml.kernel.org/r/1494278596-30373-1-git-send-email-labbott@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-06-26 07:13:09 +02:00
Greg Kroah-Hartman
29fa724a09 This is the 4.4.63 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlj5tRkACgkQONu9yGCS
 aT5zFxAAouq2kxBFxxJIQ3255yy/7B6oBYrhilQZPrETC800PUaIqZtuQZPpaoqb
 3gG0+12ve0CMHK+PidEwsQlMlAHNI1xbzmUHm2UIrLYYCV817DTkEsc7JXGUvYVA
 /YA71GASKmLVi9DnsawRb0ELhTeQHec76LrPlgvyWH/OMEtNcMOv/8oWfTq9bKV2
 HsHC6MOwT2R86ukhYYmcfFHomTnJSpW7KtGXwNC/LhohzIfsKQKGQWb1f1j1aHGC
 u5yQ5Qc9T+DhPMHAEY+xuURz/3ohpUL8aSQXk7pua/bTD0X0klNQcf/BXVJXsaeI
 s4g78q+YdTcPL81rkEW+7yUvAlb3u+FdVr+wjsl/s6ih4iL0EgBsoClqUjGUUoz+
 jvCXHiMP7lHi50eIkppQf/yZSVKSobKn5YYf9AA+y6tQ9R9GguDS/IQSRe2HnHeR
 OymCBXa6BSmQGGyPiMUBiNTix6roJ8Vr4dK9lbsQXZ+YZICXWs1rpMOy5HK9EJWf
 M6YF6l9lHwQ38AN+MhsjUXIyKLp9zCk7syeFaeK6k/IA2kcm7dL/momiZ1QIBnhq
 OHB3iwEPZ5Rr4CVjk5j7Ue22ubdrtpc8IfTYV95N7nv+g3nBwe22k+RDi70NiDwk
 2pnBqhO/vtPRE9Ry3QBS73VEeXgNb9IIVwQ7hi9Rk7KUgmdEOOo=
 =iS0x
 -----END PGP SIGNATURE-----

Merge 4.4.63 into android-4.4

Changes in 4.4.63:
	cgroup, kthread: close race window where new kthreads can be migrated to non-root cgroups
	thp: fix MADV_DONTNEED vs clear soft dirty race
	drm/nouveau/mpeg: mthd returns true on success now
	drm/nouveau/mmu/nv4a: use nv04 mmu rather than the nv44 one
	CIFS: store results of cifs_reopen_file to avoid infinite wait
	Input: xpad - add support for Razer Wildcat gamepad
	perf/x86: Avoid exposing wrong/stale data in intel_pmu_lbr_read_32()
	x86/vdso: Ensure vdso32_enabled gets set to valid values only
	x86/vdso: Plug race between mapping and ELF header setup
	acpi, nfit, libnvdimm: fix interleave set cookie calculation (64-bit comparison)
	iscsi-target: Fix TMR reference leak during session shutdown
	iscsi-target: Drop work-around for legacy GlobalSAN initiator
	scsi: sr: Sanity check returned mode data
	scsi: sd: Consider max_xfer_blocks if opt_xfer_blocks is unusable
	scsi: sd: Fix capacity calculation with 32-bit sector_t
	xen, fbfront: fix connecting to backend
	libnvdimm: fix reconfig_mutex, mmap_sem, and jbd2_handle lockdep splat
	irqchip/irq-imx-gpcv2: Fix spinlock initialization
	ftrace: Fix removing of second function probe
	char: Drop bogus dependency of DEVPORT on !M68K
	char: lack of bool string made CONFIG_DEVPORT always on
	Revert "MIPS: Lantiq: Fix cascaded IRQ setup"
	kvm: fix page struct leak in handle_vmon
	zram: do not use copy_page with non-page aligned address
	powerpc: Disable HFSCR[TM] if TM is not supported
	crypto: ahash - Fix EINPROGRESS notification callback
	ath9k: fix NULL pointer dereference
	dvb-usb-v2: avoid use-after-free
	ext4: fix inode checksum calculation problem if i_extra_size is small
	platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event
	rtc: tegra: Implement clock handling
	mm: Tighten x86 /dev/mem with zeroing reads
	dvb-usb: don't use stack for firmware load
	dvb-usb-firmware: don't do DMA on stack
	virtio-console: avoid DMA from stack
	pegasus: Use heap buffers for all register access
	rtl8150: Use heap buffers for all register access
	catc: Combine failure cleanup code in catc_probe()
	catc: Use heap buffer for memory size test
	ibmveth: calculate gso_segs for large packets
	SUNRPC: fix refcounting problems with auth_gss messages.
	tty/serial: atmel: RS485 half duplex w/DMA: enable RX after TX is done
	net: ipv6: check route protocol when deleting routes
	sctp: deny peeloff operation on asocs with threads sleeping on it
	MIPS: fix Select HAVE_IRQ_EXIT_ON_IRQ_STACK patch.
	Linux 4.4.63

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2017-04-21 09:47:01 +02:00