android_kernel_oneplus_msm8998/fs
Andrea Arcangeli 8f6345a11c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping
commit 04f5866e41fb70690e28397487d8bd8eea7d712a upstream.

The core dumping code has always run without holding the mmap_sem for
writing, despite that is the only way to ensure that the entire vma
layout will not change from under it.  Only using some signal
serialization on the processes belonging to the mm is not nearly enough.
This was pointed out earlier.  For example in Hugh's post from Jul 2017:

  https://lkml.kernel.org/r/alpine.LSU.2.11.1707191716030.2055@eggly.anvils

  "Not strictly relevant here, but a related note: I was very surprised
   to discover, only quite recently, how handle_mm_fault() may be called
   without down_read(mmap_sem) - when core dumping. That seems a
   misguided optimization to me, which would also be nice to correct"

In particular because the growsdown and growsup can move the
vm_start/vm_end the various loops the core dump does around the vma will
not be consistent if page faults can happen concurrently.

Pretty much all users calling mmget_not_zero()/get_task_mm() and then
taking the mmap_sem had the potential to introduce unexpected side
effects in the core dumping code.

Adding mmap_sem for writing around the ->core_dump invocation is a
viable long term fix, but it requires removing all copy user and page
faults and to replace them with get_dump_page() for all binary formats
which is not suitable as a short term fix.

For the time being this solution manually covers the places that can
confuse the core dump either by altering the vma layout or the vma flags
while it runs.  Once ->core_dump runs under mmap_sem for writing the
function mmget_still_valid() can be dropped.

Allowing mmap_sem protected sections to run in parallel with the
coredump provides some minor parallelism advantage to the swapoff code
(which seems to be safe enough by never mangling any vma field and can
keep doing swapins in parallel to the core dumping) and to some other
corner case.

In order to facilitate the backporting I added "Fixes: 86039bd3b4e6"
however the side effect of this same race condition in /proc/pid/mem
should be reproducible since before 2.6.12-rc2 so I couldn't add any
other "Fixes:" because there's no hash beyond the git genesis commit.

Because find_extend_vma() is the only location outside of the process
context that could modify the "mm" structures under mmap_sem for
reading, by adding the mmget_still_valid() check to it, all other cases
that take the mmap_sem for reading don't need the new check after
mmget_not_zero()/get_task_mm().  The expand_stack() in page fault
context also doesn't need the new check, because all tasks under core
dumping are frozen.

Link: http://lkml.kernel.org/r/20190325224949.11068-1-aarcange@redhat.com
Fixes: 86039bd3b4 ("userfaultfd: add new syscall to provide memory externalization")
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Jann Horn <jannh@google.com>
Suggested-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Jann Horn <jannh@google.com>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Joel Fernandes (Google) <joel@joelfernandes.org>
[mhocko@suse.com: stable 4.4 backport
 - drop infiniband part because of missing 5f9794dc94f59
 - drop userfaultfd_event_wait_completion hunk because of
   missing 9cd75c3cd4c3d]
 - handle binder_update_page_range because of missing 720c241924046
 - handle mlx5_ib_disassociate_ucontext - akaher@vmware.com
]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-06-22 08:18:27 +02:00
..
9p 9p locks: add mount option for lock retry interval 2019-04-27 09:33:58 +02:00
adfs
affs affs_lookup(): close a race with affs_remove_link() 2018-05-30 07:48:51 +02:00
afs afs: Fix afs_kill_pages() 2017-12-20 10:04:56 +01:00
autofs4 autofs: fix error return in autofs_fill_super() 2019-03-23 08:44:27 +01:00
befs
bfs bfs: add sanity check at bfs_fill_super() 2018-12-01 09:46:33 +01:00
btrfs Btrfs: fix race updating log root item during fsync 2019-06-11 12:24:09 +02:00
cachefiles fscache, cachefiles: remove redundant variable 'cache' 2018-12-17 21:55:12 +01:00
ceph ceph: flush dirty inodes before proceeding with remount 2019-06-11 12:23:46 +02:00
cifs CIFS: cifs_read_allocate_pages: don't iterate through whole page array on ENOMEM 2019-06-11 12:24:10 +02:00
coda coda: fix 'kernel memory exposure attempt' in fsync 2017-11-24 08:32:25 +01:00
configfs configfs: Fix use-after-free when accessing sd->s_dentry 2019-06-22 08:18:26 +02:00
cramfs Cramfs: fix abad comparison when wrap-arounds occur 2018-11-21 09:27:37 +01:00
debugfs debugfs: fix use-after-free on symlink traversal 2019-05-16 19:45:01 +02:00
devpts devpts: clean up interface to pty drivers 2016-08-16 09:30:49 +02:00
dlm dlm: Don't swamp the CPU with callbacks queued during recovery 2019-02-20 10:13:04 +01:00
ecryptfs do d_instantiate/unlock_new_inode combinations safely 2018-05-30 07:48:52 +02:00
efivarfs efi: Make efivarfs entries immutable by default 2016-03-03 15:07:09 -08:00
efs
exofs fs/exofs: fix potential memory leak in mount option parsing 2018-11-27 16:08:00 +01:00
exportfs exportfs: do not read dentry after free 2018-12-17 21:55:10 +01:00
ext2 ext2: Fix underflow in ext2_max_size() 2019-03-23 08:44:36 +01:00
ext4 ext4: do not delete unlinked inode from orphan list on failed truncate 2019-06-11 12:23:50 +02:00
f2fs f2fs: fix to do sanity check on valid block count of segment 2019-06-22 08:18:19 +02:00
fat fs/fat/file.c: issue flush after the writeback of FAT 2019-06-22 08:18:17 +02:00
freevxfs
fscache fscache: fix race between enablement and dropping of object 2018-12-17 21:55:11 +01:00
fuse fuse: retrieve: cap requested size to negotiated max_write 2019-06-22 08:18:20 +02:00
gfs2 gfs2: Fix lru_count going negative 2019-06-11 12:23:53 +02:00
hfs hfs: do not free node before using 2018-12-17 21:55:12 +01:00
hfsplus hfsplus: do not free node before using 2018-12-17 21:55:12 +01:00
hostfs hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() 2016-09-30 10:18:39 +02:00
hpfs hpfs: implement the show_options method 2016-06-01 12:15:54 -07:00
hugetlbfs hugetlb: use same fault hash key for shared and private mappings 2019-06-11 12:23:52 +02:00
isofs isofs: fix timestamps beyond 2027 2017-11-30 08:37:20 +00:00
jbd2 jbd2: fix compile warning when using JBUFFER_TRACE 2019-03-23 08:44:37 +01:00
jffs2 jffs2: fix use-after-free on symlink traversal 2019-05-16 19:45:01 +02:00
jfs jfs: Fix inconsistency between memory allocation and ea_buf->max_size 2018-08-09 12:19:28 +02:00
kernfs kernfs: Replace strncpy with memcpy 2018-12-13 09:21:29 +01:00
lockd lockd: fix access beyond unterminated strings in prints 2018-11-21 09:27:36 +01:00
logfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
minix
ncpfs ncpfs: fix build warning of strncpy 2019-03-23 08:44:21 +01:00
nfs NFS4: Fix v4.0 client state corruption when mount 2019-06-11 12:23:45 +02:00
nfs_common lockd: fix "list_add double add" caused by legacy signal interface 2018-02-03 17:04:28 +01:00
nfsd nfsd: allow fh_want_write to be called twice 2019-06-22 08:18:20 +02:00
nilfs2 do d_instantiate/unlock_new_inode combinations safely 2018-05-30 07:48:52 +02:00
nls
notify fanotify: fix logic of events on child 2018-04-24 09:32:11 +02:00
ntfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
ocfs2 fs/ocfs2: fix race in ocfs2_dentry_attach_lock() 2019-06-22 08:18:22 +02:00
omfs
openpromfs
overlayfs ovl: fix uid/gid when creating over whiteout 2019-04-27 09:33:59 +02:00
proc coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-06-22 08:18:27 +02:00
pstore pstore/ram: Do not treat empty buffers as valid 2019-01-26 09:42:53 +01:00
qnx4
qnx6
quota fs/quota: Fix spectre gadget in do_quotactl 2018-09-09 20:04:36 +02:00
ramfs mm, fs: obey gfp_mapping for add_to_page_cache() 2015-10-16 11:42:28 -07:00
reiserfs reiserfs: propagate errors from fill_with_dentries() properly 2018-11-27 16:08:00 +01:00
romfs romfs: use different way to generate fsid for BLOCK or MTD 2017-06-17 06:39:38 +02:00
squashfs squashfs: more metadata hardenings 2018-08-06 16:24:42 +02:00
sysfs scsi: sysfs: Introduce sysfs_{un,}break_active_protection() 2018-09-05 09:18:40 +02:00
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-12-17 21:55:09 +01:00
tracefs tracefs: Fix refcount imbalance in start_creating() 2015-11-04 22:13:45 -05:00
ubifs ubifs: Check for name being NULL while mounting 2018-10-13 09:11:34 +02:00
udf udf: Fix crash on IO error during truncate 2019-04-03 06:23:14 +02:00
ufs ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour 2019-06-11 12:23:49 +02:00
xfs xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE 2019-01-26 09:42:52 +01:00
aio.c aio: fix spectre gadget in lookup_ioctx 2018-12-21 14:09:50 +01:00
anon_inodes.c
attr.c vfs: move permission checking into notify_change() for utimes(NULL) 2016-10-22 12:26:56 +02:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c binfmt_elf: switch to new creds when switching to new mm 2019-04-27 09:33:53 +02:00
binfmt_elf_fdpic.c libnvdimm for 4.4: 2015-11-10 12:07:22 -08:00
binfmt_em86.c
binfmt_flat.c
binfmt_misc.c fs/binfmt_misc.c: do not allow offset overflow 2018-07-03 11:21:26 +02:00
binfmt_script.c Revert "exec: load_script: don't blindly truncate shebang string" 2019-02-20 10:13:20 +01:00
block_dev.c fs/block_dev: always invalidate cleancache in invalidate_bdev() 2017-05-20 14:27:01 +02:00
buffer.c fs: fix guard_bio_eod to check for real EOD errors 2019-04-27 09:33:49 +02:00
char_dev.c chardev: add additional check for minor range overlap 2019-06-11 12:24:03 +02:00
compat.c
compat_binfmt_elf.c binfmt_elf: compat: avoid unused function warning 2018-02-25 11:03:51 +01:00
compat_ioctl.c fs: compat: Remove warning from COMPATIBLE_IOCTL 2018-04-08 11:51:57 +02:00
coredump.c coredump: Ensure proper size of sparse core files 2017-07-05 14:37:20 +02:00
dax.c dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
dcache.c Hang/soft lockup in d_invalidate with simultaneous calls 2019-04-03 06:23:19 +02:00
dcookies.c
direct-io.c direct-io: Prevent NULL pointer access in submit_page_section 2017-10-18 09:20:42 +02:00
drop_caches.c fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() 2019-03-23 08:44:26 +01:00
eventfd.c
eventpoll.c fs/epoll: drop ovflist branch prediction 2019-02-20 10:13:14 +01:00
exec.c mm: replace get_user_pages() write/force parameters with gup_flags 2018-12-17 21:55:16 +01:00
fcntl.c fs/fcntl: f_setown, avoid undefined behaviour 2018-01-31 12:06:11 +01:00
fhandle.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
file.c fs/file.c: initialize init_files.resize_wait 2019-04-27 09:33:49 +02:00
file_table.c
filesystems.c
fs-writeback.c fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount 2019-06-11 12:23:41 +02:00
fs_pin.c
fs_struct.c
inode.c writeback: initialize inode members that track writeback history 2019-04-03 06:23:21 +02:00
internal.h
ioctl.c
Kconfig dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
Kconfig.binfmt
libfs.c
locks.c locks: don't check for race with close when setting OFD lock 2018-01-17 09:35:27 +01:00
Makefile ext4: promote ext4 over ext2 in the default probe order 2015-10-15 10:33:21 -04:00
mbcache.c
mount.h mnt: In propgate_umount handle visiting mounts in any order 2017-07-21 07:44:57 +02:00
mpage.c fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
namei.c namei: allow restricted O_CREAT of FIFOs and regular files 2018-12-01 09:46:41 +01:00
namespace.c mount: Prevent MNT_DETACH from disconnecting locked mounts 2018-11-21 09:27:44 +01:00
no-block.c
nsfs.c nsfs: mark dentry with DCACHE_RCUACCESS 2018-02-16 20:09:43 +01:00
open.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-06-11 12:24:13 +02:00
pipe.c pipe: cap initial pipe capacity according to pipe-max-size limit 2018-05-26 08:48:51 +02:00
pnode.c mnt: Make propagate_umount less slow for overlapping mount propagation trees 2017-07-21 07:44:58 +02:00
pnode.h mnt: Add a per mount namespace limit on the number of mounts 2017-04-30 05:49:28 +02:00
posix_acl.c tmpfs: clear S_ISGID when setting posix ACLs 2017-01-26 08:23:47 +01:00
proc_namespace.c vfs: show_vfsstat: do not ignore errors from show_devname method 2016-04-12 09:08:55 -07:00
read_write.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-06-11 12:24:13 +02:00
readdir.c
select.c fs/select: add vmalloc fallback for select(2) 2018-01-31 12:06:09 +01:00
seq_file.c Make file credentials available to the seqfile interfaces 2017-08-06 19:19:42 -07:00
signalfd.c
splice.c vfs: fix uninitialized flags in splice_to_pipe() 2017-02-23 17:43:09 +01:00
stack.c
stat.c ufs: restore maintaining ->i_blocks 2017-06-14 13:16:24 +02:00
statfs.c
super.c fs: don't scan the inode cache before SB_BORN is set 2019-02-06 19:43:08 +01:00
sync.c fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback 2015-11-06 17:50:42 -08:00
timerfd.c timerfd: Protect the might cancel mechanism proper 2017-05-08 07:46:01 +02:00
userfaultfd.c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-06-22 08:18:27 +02:00
utimes.c vfs: move permission checking into notify_change() for utimes(NULL) 2016-10-22 12:26:56 +02:00
xattr.c getxattr: use correct xattr length 2018-09-09 20:04:36 +02:00