android_kernel_oneplus_msm8998/fs
Linus Torvalds 204b14581f access: avoid the RCU grace period for the temporary subjective credentials
commit d7852fbd0f0423937fa287a598bfde188bb68c22 upstream.

It turns out that 'access()' (and 'faccessat()') can cause a lot of RCU
work because it installs a temporary credential that gets allocated and
freed for each system call.

The allocation and freeing overhead is mostly benign, but because
credentials can be accessed under the RCU read lock, the freeing
involves a RCU grace period.

Which is not a huge deal normally, but if you have a lot of access()
calls, this causes a fair amount of seconday damage: instead of having a
nice alloc/free patterns that hits in hot per-CPU slab caches, you have
all those delayed free's, and on big machines with hundreds of cores,
the RCU overhead can end up being enormous.

But it turns out that all of this is entirely unnecessary.  Exactly
because access() only installs the credential as the thread-local
subjective credential, the temporary cred pointer doesn't actually need
to be RCU free'd at all.  Once we're done using it, we can just free it
synchronously and avoid all the RCU overhead.

So add a 'non_rcu' flag to 'struct cred', which can be set by users that
know they only use it in non-RCU context (there are other potential
users for this).  We can make it a union with the rcu freeing list head
that we need for the RCU case, so this doesn't need any extra storage.

Note that this also makes 'get_current_cred()' clear the new non_rcu
flag, in case we have filesystems that take a long-term reference to the
cred and then expect the RCU delayed freeing afterwards.  It's not
entirely clear that this is required, but it makes for clear semantics:
the subjective cred remains non-RCU as long as you only access it
synchronously using the thread-local accessors, but you _can_ use it as
a generic cred if you want to.

It is possible that we should just remove the whole RCU markings for
->cred entirely.  Only ->real_cred is really supposed to be accessed
through RCU, and the long-term cred copies that nfs uses might want to
explicitly re-enable RCU freeing if required, rather than have
get_current_cred() do it implicitly.

But this is a "minimal semantic changes" change for the immediate
problem.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Paul E. McKenney <paulmck@linux.ibm.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Jan Glauber <jglauber@marvell.com>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Jayachandran Chandrasekharan Nair <jnair@marvell.com>
Cc: Greg KH <greg@kroah.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2019-08-04 09:35:01 +02:00
..
9p 9p: pass the correct prototype to read_cache_page 2019-08-04 09:34:59 +02:00
adfs
affs affs_lookup(): close a race with affs_remove_link() 2018-05-30 07:48:51 +02:00
afs afs: Fix afs_kill_pages() 2017-12-20 10:04:56 +01:00
autofs4 autofs: fix error return in autofs_fill_super() 2019-03-23 08:44:27 +01:00
befs
bfs bfs: add sanity check at bfs_fill_super() 2018-12-01 09:46:33 +01:00
btrfs btrfs: Ensure replaced device doesn't have pending chunk allocation 2019-07-10 09:56:44 +02:00
cachefiles fscache, cachefiles: remove redundant variable 'cache' 2018-12-17 21:55:12 +01:00
ceph ceph: flush dirty inodes before proceeding with remount 2019-06-11 12:23:46 +02:00
cifs SMB3: retry on STATUS_INSUFFICIENT_RESOURCES instead of failing write 2019-07-10 09:56:34 +02:00
coda coda: pass the host file in vma->vm_file on mmap 2019-08-04 09:34:52 +02:00
configfs configfs: Fix use-after-free when accessing sd->s_dentry 2019-06-22 08:18:26 +02:00
cramfs Cramfs: fix abad comparison when wrap-arounds occur 2018-11-21 09:27:37 +01:00
debugfs debugfs: fix use-after-free on symlink traversal 2019-05-16 19:45:01 +02:00
devpts devpts: clean up interface to pty drivers 2016-08-16 09:30:49 +02:00
dlm dlm: Don't swamp the CPU with callbacks queued during recovery 2019-02-20 10:13:04 +01:00
ecryptfs eCryptfs: fix a couple type promotion bugs 2019-08-04 09:34:53 +02:00
efivarfs efi: Make efivarfs entries immutable by default 2016-03-03 15:07:09 -08:00
efs
exofs fs/exofs: fix potential memory leak in mount option parsing 2018-11-27 16:08:00 +01:00
exportfs exportfs: do not read dentry after free 2018-12-17 21:55:10 +01:00
ext2 ext2: Fix underflow in ext2_max_size() 2019-03-23 08:44:36 +01:00
ext4 fscrypt: don't set policy for a dead directory 2019-07-21 09:07:10 +02:00
f2fs f2fs: avoid out-of-range memory access 2019-08-04 09:34:58 +02:00
fat fs/fat/file.c: issue flush after the writeback of FAT 2019-06-22 08:18:17 +02:00
freevxfs
fscache fscache: fix race between enablement and dropping of object 2018-12-17 21:55:11 +01:00
fuse fuse: retrieve: cap requested size to negotiated max_write 2019-06-22 08:18:20 +02:00
gfs2 gfs2: Fix lru_count going negative 2019-06-11 12:23:53 +02:00
hfs hfs: do not free node before using 2018-12-17 21:55:12 +01:00
hfsplus hfsplus: do not free node before using 2018-12-17 21:55:12 +01:00
hostfs hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() 2016-09-30 10:18:39 +02:00
hpfs hpfs: implement the show_options method 2016-06-01 12:15:54 -07:00
hugetlbfs hugetlb: use same fault hash key for shared and private mappings 2019-06-11 12:23:52 +02:00
isofs isofs: fix timestamps beyond 2027 2017-11-30 08:37:20 +00:00
jbd2 jbd2: fix compile warning when using JBUFFER_TRACE 2019-03-23 08:44:37 +01:00
jffs2 jffs2: fix use-after-free on symlink traversal 2019-05-16 19:45:01 +02:00
jfs jfs: Fix inconsistency between memory allocation and ea_buf->max_size 2018-08-09 12:19:28 +02:00
kernfs kernfs: Replace strncpy with memcpy 2018-12-13 09:21:29 +01:00
lockd lockd: fix access beyond unterminated strings in prints 2018-11-21 09:27:36 +01:00
logfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
minix
ncpfs ncpfs: fix build warning of strncpy 2019-03-23 08:44:21 +01:00
nfs NFSv4: Fix open create exclusive when the server reboots 2019-08-04 09:34:55 +02:00
nfs_common lockd: fix "list_add double add" caused by legacy signal interface 2018-02-03 17:04:28 +01:00
nfsd nfsd: Fix overflow causing non-working mounts on 1 TB machines 2019-08-04 09:34:55 +02:00
nilfs2 do d_instantiate/unlock_new_inode combinations safely 2018-05-30 07:48:52 +02:00
nls
notify fanotify: fix logic of events on child 2018-04-24 09:32:11 +02:00
ntfs mm, fs: introduce mapping_gfp_constraint() 2015-11-06 17:50:42 -08:00
ocfs2 fs/ocfs2: fix race in ocfs2_dentry_attach_lock() 2019-06-22 08:18:22 +02:00
omfs
openpromfs
overlayfs ovl: modify ovl_permission() to do checks on two inodes 2019-07-10 09:56:36 +02:00
proc coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-06-22 08:18:27 +02:00
pstore pstore/ram: Do not treat empty buffers as valid 2019-01-26 09:42:53 +01:00
qnx4
qnx6
quota fs/quota: Fix spectre gadget in do_quotactl 2018-09-09 20:04:36 +02:00
ramfs
reiserfs reiserfs: propagate errors from fill_with_dentries() properly 2018-11-27 16:08:00 +01:00
romfs romfs: use different way to generate fsid for BLOCK or MTD 2017-06-17 06:39:38 +02:00
squashfs squashfs: more metadata hardenings 2018-08-06 16:24:42 +02:00
sysfs scsi: sysfs: Introduce sysfs_{un,}break_active_protection() 2018-09-05 09:18:40 +02:00
sysv sysv: return 'err' instead of 0 in __sysv_write_inode 2018-12-17 21:55:09 +01:00
tracefs tracefs: Fix refcount imbalance in start_creating() 2015-11-04 22:13:45 -05:00
ubifs ubifs: Check for name being NULL while mounting 2018-10-13 09:11:34 +02:00
udf udf: Fix incorrect final NOT_ALLOCATED (hole) extent length 2019-07-21 09:07:08 +02:00
ufs ufs: fix braino in ufs_get_inode_gid() for solaris UFS flavour 2019-06-11 12:23:49 +02:00
xfs xfs: don't fail when converting shortform attr to long form during ATTR_REPLACE 2019-01-26 09:42:52 +01:00
aio.c aio: fix spectre gadget in lookup_ioctx 2018-12-21 14:09:50 +01:00
anon_inodes.c
attr.c vfs: move permission checking into notify_change() for utimes(NULL) 2016-10-22 12:26:56 +02:00
bad_inode.c
binfmt_aout.c
binfmt_elf.c binfmt_elf: switch to new creds when switching to new mm 2019-04-27 09:33:53 +02:00
binfmt_elf_fdpic.c libnvdimm for 4.4: 2015-11-10 12:07:22 -08:00
binfmt_em86.c
binfmt_flat.c fs/binfmt_flat.c: make load_flat_shared_library() work 2019-07-10 09:56:30 +02:00
binfmt_misc.c fs/binfmt_misc.c: do not allow offset overflow 2018-07-03 11:21:26 +02:00
binfmt_script.c Revert "exec: load_script: don't blindly truncate shebang string" 2019-02-20 10:13:20 +01:00
block_dev.c fs/block_dev: always invalidate cleancache in invalidate_bdev() 2017-05-20 14:27:01 +02:00
buffer.c fs: fix guard_bio_eod to check for real EOD errors 2019-04-27 09:33:49 +02:00
char_dev.c chardev: add additional check for minor range overlap 2019-06-11 12:24:03 +02:00
compat.c
compat_binfmt_elf.c binfmt_elf: compat: avoid unused function warning 2018-02-25 11:03:51 +01:00
compat_ioctl.c fs: compat: Remove warning from COMPATIBLE_IOCTL 2018-04-08 11:51:57 +02:00
coredump.c coredump: Ensure proper size of sparse core files 2017-07-05 14:37:20 +02:00
dax.c dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
dcache.c Hang/soft lockup in d_invalidate with simultaneous calls 2019-04-03 06:23:19 +02:00
dcookies.c
direct-io.c direct-io: Prevent NULL pointer access in submit_page_section 2017-10-18 09:20:42 +02:00
drop_caches.c fs/drop_caches.c: avoid softlockups in drop_pagecache_sb() 2019-03-23 08:44:26 +01:00
eventfd.c
eventpoll.c fs/epoll: drop ovflist branch prediction 2019-02-20 10:13:14 +01:00
exec.c mm: replace get_user_pages() write/force parameters with gup_flags 2018-12-17 21:55:16 +01:00
fcntl.c fs/fcntl: f_setown, avoid undefined behaviour 2018-01-31 12:06:11 +01:00
fhandle.c fs/coredump: prevent fsuid=0 dumps into user-controlled directories 2016-04-12 09:08:58 -07:00
file.c fs/file.c: initialize init_files.resize_wait 2019-04-27 09:33:49 +02:00
file_table.c
filesystems.c
fs-writeback.c fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount 2019-06-11 12:23:41 +02:00
fs_pin.c
fs_struct.c
inode.c Abort file_remove_privs() for non-reg. files 2019-06-22 08:18:27 +02:00
internal.h
ioctl.c
Kconfig dax: disable pmd mappings 2015-11-16 23:54:45 -08:00
Kconfig.binfmt
libfs.c
locks.c locks: don't check for race with close when setting OFD lock 2018-01-17 09:35:27 +01:00
Makefile
mbcache.c
mount.h mnt: In propgate_umount handle visiting mounts in any order 2017-07-21 07:44:57 +02:00
mpage.c fs: add i_blocksize() 2017-06-14 13:16:24 +02:00
namei.c namei: allow restricted O_CREAT of FIFOs and regular files 2018-12-01 09:46:41 +01:00
namespace.c mount: Prevent MNT_DETACH from disconnecting locked mounts 2018-11-21 09:27:44 +01:00
no-block.c
nsfs.c nsfs: mark dentry with DCACHE_RCUACCESS 2018-02-16 20:09:43 +01:00
open.c access: avoid the RCU grace period for the temporary subjective credentials 2019-08-04 09:35:01 +02:00
pipe.c pipe: cap initial pipe capacity according to pipe-max-size limit 2018-05-26 08:48:51 +02:00
pnode.c mnt: Make propagate_umount less slow for overlapping mount propagation trees 2017-07-21 07:44:58 +02:00
pnode.h mnt: Add a per mount namespace limit on the number of mounts 2017-04-30 05:49:28 +02:00
posix_acl.c tmpfs: clear S_ISGID when setting posix ACLs 2017-01-26 08:23:47 +01:00
proc_namespace.c vfs: show_vfsstat: do not ignore errors from show_devname method 2016-04-12 09:08:55 -07:00
read_write.c fs: stream_open - opener for stream-like files so that read and write can run simultaneously without deadlock 2019-06-11 12:24:13 +02:00
readdir.c
select.c fs/select: add vmalloc fallback for select(2) 2018-01-31 12:06:09 +01:00
seq_file.c Make file credentials available to the seqfile interfaces 2017-08-06 19:19:42 -07:00
signalfd.c
splice.c vfs: fix uninitialized flags in splice_to_pipe() 2017-02-23 17:43:09 +01:00
stack.c
stat.c ufs: restore maintaining ->i_blocks 2017-06-14 13:16:24 +02:00
statfs.c
super.c fs: don't scan the inode cache before SB_BORN is set 2019-02-06 19:43:08 +01:00
sync.c fs/sync.c: make sync_file_range(2) use WB_SYNC_NONE writeback 2015-11-06 17:50:42 -08:00
timerfd.c timerfd: Protect the might cancel mechanism proper 2017-05-08 07:46:01 +02:00
userfaultfd.c coredump: fix race condition between mmget_not_zero()/get_task_mm() and core dumping 2019-06-22 08:18:27 +02:00
utimes.c vfs: move permission checking into notify_change() for utimes(NULL) 2016-10-22 12:26:56 +02:00
xattr.c getxattr: use correct xattr length 2018-09-09 20:04:36 +02:00