This reorganizes how we do the stac/clac instructions in the user access
code. Instead of adding the instructions directly to the same inline
asm that does the actual user level access and exception handling, add
them at a higher level.
This is mainly preparation for the next step, where we will expose an
interface to allow users to mark several accesses together as being user
space accesses, but it does already clean up some code:
- the inlined trivial cases of copy_in_user() now do stac/clac just
once over the accesses: they used to do one pair around the user
space read, and another pair around the write-back.
- the {get,put}_user_ex() macros that are used with the catch/try
handling don't do any stac/clac at all, because that happens in the
try/catch surrounding them.
Other than those two cleanups that happened naturally from the
re-organization, this should not make any difference. Yet.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Iaad8756bc8e95876e2e2a7d7bbd333fc478ab441
(cherry picked from commit 11f1a4b9755f5dbc3e822a96502ebe9b044b14d8)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
The kernel test robot reported a usercopy failure in the new hardened
sanity checks, due to a page-crossing copy of the FPU state into the
task structure.
This happened because the kernel test robot was testing with SLOB, which
doesn't actually do the required book-keeping for slab allocations, and
as a result the hardening code didn't realize that the task struct
allocation was one single allocation - and the sanity checks fail.
Since SLOB doesn't even claim to support hardening (and you really
shouldn't use it), the straightforward solution is to just make the
usercopy hardening code depend on the allocator supporting it.
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: I37d51f866f873341bf7d5297249899b852e1c6ce
(cherry picked from commit 6040e57658eee6eb1315a26119101ca832d1f854)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
This patch adds a new sysfs node (latency_hist) and reports
IO (svc time) latency histograms. Disabled by default, can be
enabled by echoing 0 into latency_hist, stats can be cleared
by writing 2 into latency_hist. This commit fixes the 32 bit
build breakage in the previous commit. Tested on both 32 bit
and 64 bit arm devices.
Bug: 30677035
Change-Id: I9a615a16616d80f87e75676ac4d078a5c429dcf9
Signed-off-by: Mohan Srinivasan <srmohan@google.com>
check_bogus_address() checked for pointer overflow using this expression,
where 'ptr' has type 'const void *':
ptr + n < ptr
Since pointer wraparound is undefined behavior, gcc at -O2 by default
treats it like the following, which would not behave as intended:
(long)n < 0
Fortunately, this doesn't currently happen for kernel code because kernel
code is compiled with -fno-strict-overflow. But the expression should be
fixed anyway to use well-defined integer arithmetic, since it could be
treated differently by different compilers in the future or could be
reported by tools checking for undefined behavior.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I73b13be651cf35c03482f2014bf2c3dd291518ab
(cherry picked from commit 7329a655875a2f4bd6984fe8a7e00a6981e802f3)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
SLUB allocator to catch any copies that may span objects. Includes a
redzone handling fix discovered by Michael Ellerman.
Based on code from PaX and grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Reviwed-by: Laura Abbott <labbott@redhat.com>
Change-Id: I52dc6fb3a3492b937d52b5cf9c046bf03dc40a3a
(cherry picked from commit ed18adc1cdd00a5c55a20fbdaed4804660772281)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Under CONFIG_HARDENED_USERCOPY, this adds object size checking to the
SLAB allocator to catch any copies that may span objects.
Based on code from PaX and grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Change-Id: Ib910a71fdc2ab808e1a45b6d33e9bae1681a1f4a
(cherry picked from commit 04385fc5e8fffed84425d909a783c0f0c587d847)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Enables CONFIG_HARDENED_USERCOPY checks on arm64. As done by KASAN in -next,
renames the low-level functions to __arch_copy_*_user() so a static inline
can do additional work before the copy.
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I1286cae8e6ffcf12ea54ddd62f1a6d2ce742c8d0
(cherry picked from commit faf5b63e294151d6ac24ca6906d6f221bd3496cd)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Enables CONFIG_HARDENED_USERCOPY checks on arm.
Based on code from PaX and grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I03a44ca7a8c56832f15a6a74ac32e9330df3ac3b
(cherry picked from commit dfd45b6103c973bfcea2341d89e36faf947dbc33)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Enables CONFIG_HARDENED_USERCOPY checks on x86. This is done both in
copy_*_user() and __copy_*_user() because copy_*_user() actually calls
down to _copy_*_user() and not __copy_*_user().
Based on code from PaX and grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Change-Id: I260db1d4572bdd2f779200aca99d03a170658440
(cherry picked from commit 5b710f34e194c6b7710f69fdb5d798fdf35b98c1)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
This is the start of porting PAX_USERCOPY into the mainline kernel. This
is the first set of features, controlled by CONFIG_HARDENED_USERCOPY. The
work is based on code by PaX Team and Brad Spengler, and an earlier port
from Casey Schaufler. Additional non-slab page tests are from Rik van Riel.
This patch contains the logic for validating several conditions when
performing copy_to_user() and copy_from_user() on the kernel object
being copied to/from:
- address range doesn't wrap around
- address range isn't NULL or zero-allocated (with a non-zero copy size)
- if on the slab allocator:
- object size must be less than or equal to copy size (when check is
implemented in the allocator, which appear in subsequent patches)
- otherwise, object must not span page allocations (excepting Reserved
and CMA ranges)
- if on the stack
- object must not extend before/after the current process stack
- object must be contained by a valid stack frame (when there is
arch/build support for identifying stack frames)
- object must not overlap with kernel text
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Change-Id: Iff3b5f1ddb04acd99ccf9a9046c7797363962b2a
(cherry picked from commit f5509cc18daa7f82bcc553be70df2117c8eedc16)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
This creates per-architecture function arch_within_stack_frames() that
should validate if a given object is contained by a kernel stack frame.
Initial implementation is on x86.
This is based on code from PaX.
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I1f3b299bb8991d65dcdac6af85d633d4b7776df1
(cherry picked from commit 0f60a8efe4005ab5e65ce000724b04d4ca04a199)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Code such as hardened user copy[1] needs a way to tell if a
page is CMA or not. Add is_migrate_cma_page in a similar way
to is_migrate_isolate_page.
[1]http://article.gmane.org/gmane.linux.kernel.mm/155238
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Change-Id: I1f9aa13d8d063038fa70b93282a836648fbb4f6d
(cherry picked from commit 7c15d9bb8231f998ae7dc0b72415f5215459f7fb)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
When I initially added the unsafe_[get|put]_user() helpers in commit
5b24a7a2aa20 ("Add 'unsafe' user access functions for batched
accesses"), I made the mistake of modeling the interface on our
traditional __[get|put]_user() functions, which return zero on success,
or -EFAULT on failure.
That interface is fairly easy to use, but it's actually fairly nasty for
good code generation, since it essentially forces the caller to check
the error value for each access.
In particular, since the error handling is already internally
implemented with an exception handler, and we already use "asm goto" for
various other things, we could fairly easily make the error cases just
jump directly to an error label instead, and avoid the need for explicit
checking after each operation.
So switch the interface to pass in an error label, rather than checking
the error value in the caller. Best do it now before we start growing
more users (the signal handling code in particular would be a good place
to use the new interface).
So rather than
if (unsafe_get_user(x, ptr))
... handle error ..
the interface is now
unsafe_get_user(x, ptr, label);
where an error during the user mode fetch will now just cause a jump to
'label' in the caller.
Right now the actual _implementation_ of this all still ends up being a
"if (err) goto label", and does not take advantage of any exception
label tricks, but for "unsafe_put_user()" in particular it should be
fairly straightforward to convert to using the exception table model.
Note that "unsafe_get_user()" is much harder to convert to a clever
exception table model, because current versions of gcc do not allow the
use of "asm goto" (for the exception) with output values (for the actual
value to be fetched). But that is hopefully not a limitation in the
long term.
[ Also note that it might be a good idea to switch unsafe_get_user() to
actually _return_ the value it fetches from user space, but this
commit only changes the error handling semantics ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ib905a84a04d46984320f6fd1056da4d72f3d6b53
(cherry picked from commit 1bd4403d86a1c06cb6cc9ac87664a0c9d3413d51)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
As Kees Cook notes in the ARM counterpart of this patch [0]:
The _etext position is defined to be the end of the kernel text code,
and should not include any part of the data segments. This interferes
with things that might check memory ranges and expect executable code
up to _etext.
In particular, Kees is referring to the HARDENED_USERCOPY patch set [1],
which rejects attempts to call copy_to_user() on kernel ranges containing
executable code, but does allow access to the .rodata segment. Regardless
of whether one may or may not agree with the distinction, it makes sense
for _etext to have the same meaning across architectures.
So let's put _etext where it belongs, between .text and .rodata, and fix
up existing references to use __init_begin instead, which unlike _end_rodata
includes the exception and notes sections as well.
The _etext references in kaslr.c are left untouched, since its references
to [_stext, _etext) are meant to capture potential jump instruction targets,
and so disregarding .rodata is actually an improvement here.
[0] http://article.gmane.org/gmane.linux.kernel/2245084
[1] http://thread.gmane.org/gmane.linux.kernel.hardened.devel/2502
Reported-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Change-Id: I8f6582525217b9ca324f6a382ea52d30ce1d0dbd
(cherry picked from commit 9fdc14c55cd6579d619ccd9d40982e0805e62b6d)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
The _etext position is defined to be the end of the kernel text code,
and should not include any part of the data segments. This interferes
with things that might check memory ranges and expect executable code
up to _etext. Just to be conservative, leave the kernel resource as
it was, using __init_begin instead of _etext as the end mark.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Change-Id: Ida514d1359dbe6f782f562ce29b4ba09ae72bfc0
(cherry picked from commit 14c4a533e0996f95a0a64dfd0b6252d788cebc74)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
This converts the generic user string functions to use the batched user
access functions.
It makes a big difference on Skylake, which is the first x86
microarchitecture to implement SMAP. The STAC/CLAC instructions are not
very fast, and doing them for each access inside the loop that copies
strings from user space (which is what the pathname handling does for
every pathname the kernel uses, for example) is very inefficient.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic39a686b4bb1ad9cd16ad0887bb669beed6fe8aa
(cherry picked from commit 9fd4470ff4974c41b1db43c3b355b9085af9c12a)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
The naming is meant to discourage random use: the helper functions are
not really any more "unsafe" than the traditional double-underscore
functions (which need the address range checking), but they do need even
more infrastructure around them, and should not be used willy-nilly.
In addition to checking the access range, these user access functions
require that you wrap the user access with a "user_acess_{begin,end}()"
around it.
That allows architectures that implement kernel user access control
(x86: SMAP, arm64: PAN) to do the user access control in the wrapping
user_access_begin/end part, and then batch up the actual user space
accesses using the new interfaces.
The main (and hopefully only) use for these are for core generic access
helpers, initially just the generic user string functions
(strnlen_user() and strncpy_from_user()).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ic64efea41f97171bdbdabe3e531489aebd9b6fac
(cherry picked from commit 5b24a7a2aa2040c8c50c3b71122901d01661ff78)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
(cherry picked from commit bb1fceca22492109be12640d49f5ea5a544c6bb4)
When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()
Then it attempts to copy user data into this fresh skb.
If the copy fails, we undo the work and remove the fresh skb.
Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)
Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.
This bug was found by Marco Grassi thanks to syzkaller.
Fixes: 6859d49475 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I58bb02d6e4e399612e8580b9e02d11e661df82f5
Bug: 31183296
Don't need to set SECCOMP_FILTER explicitly since CONFIG_SECCOMP=y will
select that config anyway.
Fixes: a49dcf2e74 ("ANDROID: base-cfg: enable SECCOMP config")
Change-Id: Iff18ed4d2db5a55b9f9480d5ecbeef7b818b3837
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
(cherry picked from commit 8148a73c9901a8794a50f950083c00ccf97d43b3)
If /proc/<PID>/environ gets read before the envp[] array is fully set up
in create_{aout,elf,elf_fdpic,flat}_tables(), we might end up trying to
read more bytes than are actually written, as env_start will already be
set but env_end will still be zero, making the range calculation
underflow, allowing to read beyond the end of what has been written.
Fix this as it is done for /proc/<PID>/cmdline by testing env_end for
zero. It is, apparently, intentionally set last in create_*_tables().
This bug was found by the PaX size_overflow plugin that detected the
arithmetic underflow of 'this_len = env_end - (env_start + src)' when
env_end is still zero.
The expected consequence is that userland trying to access
/proc/<PID>/environ of a not yet fully set up process may get
inconsistent data as we're in the middle of copying in the environment
variables.
Fixes: https://forums.grsecurity.net/viewtopic.php?f=3&t=4363
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=116461
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Emese Revfy <re.emese@gmail.com>
Cc: Pax Team <pageexec@freemail.hu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Mateusz Guzik <mguzik@redhat.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Jarod Wilson <jarod@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Change-Id: Ia2f58d48c15478ed4b6e237b63e704c70ff21e96
Bug: 30951939
(cherry picked from commit 210bd104c6acd31c3c6b8b075b3f12d4a9f6b60d)
We have to unlock before returning -ENOMEM.
Fixes: 8dfbcc4351a0 ('[media] xc2028: avoid use after free')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
Change-Id: I7b6ba9fde5c6e29467e6de23d398af2fe56e2547
Bug: 30946097
(cherry picked from commit 77da160530dd1dc94f6ae15a981f24e5f0021e84)
I got a KASAN report of use-after-free:
==================================================================
BUG: KASAN: use-after-free in klist_iter_exit+0x61/0x70 at addr ffff8800b6581508
Read of size 8 by task trinity-c1/315
=============================================================================
BUG kmalloc-32 (Not tainted): kasan: bad access detected
-----------------------------------------------------------------------------
Disabling lock debugging due to kernel taint
INFO: Allocated in disk_seqf_start+0x66/0x110 age=144 cpu=1 pid=315
___slab_alloc+0x4f1/0x520
__slab_alloc.isra.58+0x56/0x80
kmem_cache_alloc_trace+0x260/0x2a0
disk_seqf_start+0x66/0x110
traverse+0x176/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
INFO: Freed in disk_seqf_stop+0x42/0x50 age=160 cpu=1 pid=315
__slab_free+0x17a/0x2c0
kfree+0x20a/0x220
disk_seqf_stop+0x42/0x50
traverse+0x3b5/0x860
seq_read+0x7e3/0x11a0
proc_reg_read+0xbc/0x180
do_loop_readv_writev+0x134/0x210
do_readv_writev+0x565/0x660
vfs_readv+0x67/0xa0
do_preadv+0x126/0x170
SyS_preadv+0xc/0x10
do_syscall_64+0x1a1/0x460
return_from_SYSCALL_64+0x0/0x6a
CPU: 1 PID: 315 Comm: trinity-c1 Tainted: G B 4.7.0+ #62
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
ffffea0002d96000 ffff880119b9f918 ffffffff81d6ce81 ffff88011a804480
ffff8800b6581500 ffff880119b9f948 ffffffff8146c7bd ffff88011a804480
ffffea0002d96000 ffff8800b6581500 fffffffffffffff4 ffff880119b9f970
Call Trace:
[<ffffffff81d6ce81>] dump_stack+0x65/0x84
[<ffffffff8146c7bd>] print_trailer+0x10d/0x1a0
[<ffffffff814704ff>] object_err+0x2f/0x40
[<ffffffff814754d1>] kasan_report_error+0x221/0x520
[<ffffffff8147590e>] __asan_report_load8_noabort+0x3e/0x40
[<ffffffff83888161>] klist_iter_exit+0x61/0x70
[<ffffffff82404389>] class_dev_iter_exit+0x9/0x10
[<ffffffff81d2e8ea>] disk_seqf_stop+0x3a/0x50
[<ffffffff8151f812>] seq_read+0x4b2/0x11a0
[<ffffffff815f8fdc>] proc_reg_read+0xbc/0x180
[<ffffffff814b24e4>] do_loop_readv_writev+0x134/0x210
[<ffffffff814b4c45>] do_readv_writev+0x565/0x660
[<ffffffff814b8a17>] vfs_readv+0x67/0xa0
[<ffffffff814b8de6>] do_preadv+0x126/0x170
[<ffffffff814b92ec>] SyS_preadv+0xc/0x10
This problem can occur in the following situation:
open()
- pread()
- .seq_start()
- iter = kmalloc() // succeeds
- seqf->private = iter
- .seq_stop()
- kfree(seqf->private)
- pread()
- .seq_start()
- iter = kmalloc() // fails
- .seq_stop()
- class_dev_iter_exit(seqf->private) // boom! old pointer
As the comment in disk_seqf_stop() says, stop is called even if start
failed, so we need to reinitialise the private pointer to NULL when seq
iteration stops.
An alternative would be to set the private pointer to NULL when the
kmalloc() in disk_seqf_start() fails.
Cc: stable@vger.kernel.org
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>
Change-Id: I07b33f4b38341f60a37806cdd45b0a0c3ab4d84d
Bug: 30942273
Enable following seccomp configs
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y
Otherwise we will get mediacode error like this on Android N:
E /system/bin/mediaextractor: libminijail: prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER): Invalid argument
Change-Id: I2477b6a2cfdded5c0ebf6ffbb6150b0e5fe2ba12
Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
x86_64:allmodconfig fails to build with the following error.
ERROR: "rcu_sync_lockdep_assert" [kernel/locking/locktorture.ko] undefined!
Introduced by commit 3228c5eb7a ("RFC: FROMLIST: locking/percpu-rwsem:
Optimize readers and reduce global impact"). The applied upstream version
exports the missing symbol, so let's do the same.
Change-Id: If4e516715c3415fe8c82090f287174857561550d
Fixes: 3228c5eb7a ("RFC: FROMLIST: locking/percpu-rwsem: Optimize ...")
Signed-off-by: Guenter Roeck <groeck@chromium.org>
commit 8835ba4a39cf53f705417b3b3a94eb067673f2c9 upstream.
An attack has become available which pretends to be a quirky
device circumventing normal sanity checks and crashes the kernel
by an insufficient number of interfaces. This patch adds a check
to the code path for quirky devices.
BUG: 28242610
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: I9a5f7f3c704b65e866335054f470451fcfae9d1c
commit 4ec0ef3a82125efc36173062a50624550a900ae0 upstream.
The iowarrior driver expects at least one valid endpoint. If given
malicious descriptors that specify 0 for the number of endpoints,
it will crash in the probe function. Ensure there is at least
one endpoint on the interface before using it.
The full report of this issue can be found here:
http://seclists.org/bugtraq/2016/Mar/87
BUG: 28242610
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: If5161c23928e9ef77cb3359cba9b36622b1908df
commit 0b818e3956fc1ad976bee791eadcbb3b5fec5bfd upstream.
Attacks that trick drivers into passing a NULL pointer
to usb_driver_claim_interface() using forged descriptors are
known. This thwarts them by sanity checking.
BUG: 28242610
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: Ib43ec5edb156985a9db941785a313f6801df092a
commit 4e9a0b05257f29cf4b75f3209243ed71614d062e upstream.
An attack using the lack of sanity checking in probe is known. This
patch checks for the existence of a second port.
CVE-2016-3136
BUG: 28242610
Signed-off-by: Oliver Neukum <ONeukum@suse.com>
[johan: add error message ]
Signed-off-by: Johan Hovold <johan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: I284ad648c2087c34a098d67e0cc6d948a568413c
The powermate driver expects at least one valid USB endpoint in its
probe function. If given malicious descriptors that specify 0 for
the number of endpoints, it will crash. Validate the number of
endpoints on the interface before using them.
The full report for this issue can be found here:
http://seclists.org/bugtraq/2016/Mar/85
BUG: 28242610
Reported-by: Ralf Spenneberg <ralf@spenneberg.net>
Signed-off-by: Josh Boyer <jwboyer@fedoraproject.org>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: I1cb956a35f3bba73324240d5bd0a029f49d3c456
(cherry picked from commit 083ae308280d13d187512b9babe3454342a7987e)
The per-socket rate limit for 'challenge acks' was introduced in the
context of limiting ack loops:
commit f2b2c582e8 ("tcp: mitigate ACK loops for connections as tcp_sock")
And I think it can be extended to rate limit all 'challenge acks' on a
per-socket basis.
Since we have the global tcp_challenge_ack_limit, this patch allows for
tcp_challenge_ack_limit to be set to a large value and effectively rely on
the per-socket limit, or set tcp_challenge_ack_limit to a lower value and
still prevents a single connections from consuming the entire challenge ack
quota.
It further moves in the direction of eliminating the global limit at some
point, as Eric Dumazet has suggested. This a follow-up to:
Subject: tcp: make challenge acks less predictable
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I622d5ae96e9387e775a0196c892d8d0e1a5564a7
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
cgroup_threadgroup_rwsem is acquired in read mode during process exit
and fork. It is also grabbed in write mode during
__cgroups_proc_write(). I've recently run into a scenario with lots
of memory pressure and OOM and I am beginning to see
systemd
__switch_to+0x1f8/0x350
__schedule+0x30c/0x990
schedule+0x48/0xc0
percpu_down_write+0x114/0x170
__cgroup_procs_write.isra.12+0xb8/0x3c0
cgroup_file_write+0x74/0x1a0
kernfs_fop_write+0x188/0x200
__vfs_write+0x6c/0xe0
vfs_write+0xc0/0x230
SyS_write+0x6c/0x110
system_call+0x38/0xb4
This thread is waiting on the reader of cgroup_threadgroup_rwsem to
exit. The reader itself is under memory pressure and has gone into
reclaim after fork. There are times the reader also ends up waiting on
oom_lock as well.
__switch_to+0x1f8/0x350
__schedule+0x30c/0x990
schedule+0x48/0xc0
jbd2_log_wait_commit+0xd4/0x180
ext4_evict_inode+0x88/0x5c0
evict+0xf8/0x2a0
dispose_list+0x50/0x80
prune_icache_sb+0x6c/0x90
super_cache_scan+0x190/0x210
shrink_slab.part.15+0x22c/0x4c0
shrink_zone+0x288/0x3c0
do_try_to_free_pages+0x1dc/0x590
try_to_free_pages+0xdc/0x260
__alloc_pages_nodemask+0x72c/0xc90
alloc_pages_current+0xb4/0x1a0
page_table_alloc+0xc0/0x170
__pte_alloc+0x58/0x1f0
copy_page_range+0x4ec/0x950
copy_process.isra.5+0x15a0/0x1870
_do_fork+0xa8/0x4b0
ppc_clone+0x8/0xc
In the meanwhile, all processes exiting/forking are blocked almost
stalling the system.
This patch moves the threadgroup_change_begin from before
cgroup_fork() to just before cgroup_canfork(). There is no nee to
worry about threadgroup changes till the task is actually added to the
threadgroup. This avoids having to call reclaim with
cgroup_threadgroup_rwsem held.
tj: Subject and description edits.
Signed-off-by: Balbir Singh <bsingharora@gmail.com>
Acked-by: Zefan Li <lizefan@huawei.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Tejun Heo <tj@kernel.org>
[jstultz: Cherry-picked from:
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git 568ac888215c7f]
Change-Id: Ie8ece84fb613cf6a7b08cea1468473a8df2b9661
Signed-off-by: John Stultz <john.stultz@linaro.org>
The current percpu-rwsem read side is entirely free of serializing insns
at the cost of having a synchronize_sched() in the write path.
The latency of the synchronize_sched() is too high for cgroups. The
commit 1ed1328792 talks about the write path being a fairly cold path
but this is not the case for Android which moves task to the foreground
cgroup and back around binder IPC calls from foreground processes to
background processes, so it is significantly hotter than human initiated
operations.
Switch cgroup_threadgroup_rwsem into the slow mode for now to avoid the
problem, hopefully it should not be that slow after another commit
80127a39681b ("locking/percpu-rwsem: Optimize readers and reduce global
impact").
We could just add rcu_sync_enter() into cgroup_init() but we do not want
another synchronize_sched() at boot time, so this patch adds the new helper
which doesn't block but currently can only be called before the first use.
Cc: Tejun Heo <tj@kernel.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Reported-by: Dmitry Shmidt <dimitrysh@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
[jstultz: backported to 4.4]
Change-Id: I34aa9c394d3052779b56976693e96d861bd255f2
Mailing-list-URL: https://lkml.org/lkml/2016/8/11/557
Signed-off-by: John Stultz <john.stultz@linaro.org>
Currently the percpu-rwsem switches to (global) atomic ops while a
writer is waiting; which could be quite a while and slows down
releasing the readers.
This patch cures this problem by ordering the reader-state vs
reader-count (see the comments in __percpu_down_read() and
percpu_down_write()). This changes a global atomic op into a full
memory barrier, which doesn't have the global cacheline contention.
This also enables using the percpu-rwsem with rcu_sync disabled in order
to bias the implementation differently, reducing the writer latency by
adding some cost to readers.
Mailing-list-URL: https://lkml.org/lkml/2016/8/9/181
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[jstultz: Backported to 4.4]
Change-Id: I8ea04b4dca2ec36f1c2469eccafde1423490572f
Signed-off-by: John Stultz <john.stultz@linaro.org>
ping_v6_sendmsg does not set flowi6_oif in response to
sin6_scope_id or sk_bound_dev_if, so it is not possible to use
these APIs to ping an IPv6 address on a different interface.
Instead, it sets flowi6_iif, which is incorrect but harmless.
Stop setting flowi6_iif, and support various ways of setting oif
in the same priority order used by udpv6_sendmsg.
[Backport of net 5e457896986e16c440c97bb94b9ccd95dd157292]
Bug: 29370996
Change-Id: Ibe1b9434c00ed96f1e30acb110734c6570b087b8
Tested: https://android-review.googlesource.com/#/c/254470/
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
IPv6 ping socket error handler doesn't correctly convert the new 32 bit
mtu to host endianness before using.
[Cherry-pick of net dcb94b88c09ce82a80e188d49bcffdc83ba215a6]
Bug: 29370996
Change-Id: Iea0ca79f16c2a1366d82b3b0a3097093d18da8b7
Cc: Lorenzo Colitti <lorenzo@google.com>
Fixes: 6d0bfe2261 ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Exports the device mapper callbacks of linear and dm-verity-target
methods.
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: I0358be0615c431dce3cc78575aaac4ccfe3aacd7
This patch adds a new sysfs node (latency_hist) and reports IO
(svc time) latency histograms. Disabled by default, can be enabled
by echoing 0 into latency_hist, stats can be cleared by writing 2
into latency_hist.
Bug: 30677035
Change-Id: I625938135ea33e6e87cf6af1fc7edc136d8b4b32
Signed-off-by: Mohan Srinivasan <srmohan@google.com>
(cherry picked from commit a5527dda344fff0514b7989ef7a755729769daa1)
The unix_dgram_sendmsg routine use the following test
if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {
to determine if sk and other are in an n:1 association (either
established via connect or by using sendto to send messages to an
unrelated socket identified by address). This isn't correct as the
specified address could have been bound to the sending socket itself or
because this socket could have been connected to itself by the time of
the unix_peer_get but disconnected before the unix_state_lock(other). In
both cases, the if-block would be entered despite other == sk which
might either block the sender unintentionally or lead to trying to unlock
the same spin lock twice for a non-blocking send. Add a other != sk
check to guard against this.
Fixes: 7d267278a9 ("unix: avoid use-after-free in ep_remove_wait_queue")
Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Tested-by: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: I4ebef6a390df3487903b166b837e34c653e01cb2
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
(cherry picked from commit af368027a49a751d6ff4ee9e3f9961f35bb4fede)
ALSA timer ioctls have an open race and this may lead to a
use-after-free of timer instance object. A simplistic fix is to make
each ioctl exclusive. We have already tread_sem for controlling the
tread, and extend this as a global mutex to be applied to each ioctl.
The downside is, of course, the worse concurrency. But these ioctls
aren't to be parallel accessible, in anyway, so it should be fine to
serialize there.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Change-Id: I1ac52f1cba5e7408fd88c8fc1c30ca2e83967ebb
Bug: 28694392
(cherry picked from commit 75ff39ccc1bd5d3c455b6822ab09e533c551f758)
Yue Cao claims that current host rate limiting of challenge ACKS
(RFC 5961) could leak enough information to allow a patient attacker
to hijack TCP sessions. He will soon provide details in an academic
paper.
This patch increases the default limit from 100 to 1000, and adds
some randomization so that the attacker can no longer hijack
sessions without spending a considerable amount of probes.
Based on initial analysis and patch from Linus.
Note that we also have per socket rate limiting, so it is tempting
to remove the host limit in the future.
v2: randomize the count of challenge acks per second, not the period.
Fixes: 282f23c6ee ("tcp: implement RFC 5961 3.2")
Reported-by: Yue Cao <ycao009@ucr.edu>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change-Id: Ib46ba66f5e4a5a7c81bfccd7b0aa83c3d9e1b3bb
Bug: 30809774
There may be a race condition if f_fs calls unregister_gadget_item in
ffs_closed() when unregister_gadget is called by UDC store at the same time.
this leads to a kernel NULL pointer dereference:
[ 310.644928] Unable to handle kernel NULL pointer dereference at virtual address 00000004
[ 310.645053] init: Service 'adbd' is being killed...
[ 310.658938] pgd = c9528000
[ 310.662515] [00000004] *pgd=19451831, *pte=00000000, *ppte=00000000
[ 310.669702] Internal error: Oops: 817 [#1] PREEMPT SMP ARM
[ 310.675211] Modules linked in:
[ 310.678294] CPU: 0 PID: 1537 Comm: ->transport Not tainted 4.1.15-03725-g793404c #2
[ 310.685958] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 310.692493] task: c8e24200 ti: c945e000 task.ti: c945e000
[ 310.697911] PC is at usb_gadget_unregister_driver+0xb4/0xd0
[ 310.703502] LR is at __mutex_lock_slowpath+0x10c/0x16c
[ 310.708648] pc : [<c075efc0>] lr : [<c0bfb0bc>] psr: 600f0113
<snip..>
[ 311.565585] [<c075efc0>] (usb_gadget_unregister_driver) from [<c075e2b8>] (unregister_gadget_item+0x1c/0x34)
[ 311.575426] [<c075e2b8>] (unregister_gadget_item) from [<c076fcc8>] (ffs_closed+0x8c/0x9c)
[ 311.583702] [<c076fcc8>] (ffs_closed) from [<c07736b8>] (ffs_data_reset+0xc/0xa0)
[ 311.591194] [<c07736b8>] (ffs_data_reset) from [<c07738ac>] (ffs_data_closed+0x90/0xd0)
[ 311.599208] [<c07738ac>] (ffs_data_closed) from [<c07738f8>] (ffs_ep0_release+0xc/0x14)
[ 311.607224] [<c07738f8>] (ffs_ep0_release) from [<c023e030>] (__fput+0x80/0x1d0)
[ 311.614635] [<c023e030>] (__fput) from [<c014e688>] (task_work_run+0xb0/0xe8)
[ 311.621788] [<c014e688>] (task_work_run) from [<c010afdc>] (do_work_pending+0x7c/0xa4)
[ 311.629718] [<c010afdc>] (do_work_pending) from [<c010770c>] (work_pending+0xc/0x20)
for functions using functionFS, i.e. android adbd will close /dev/usb-ffs/adb/ep0
when usb IO thread fails, but switch adb from on to off also triggers write
"none" > UDC. These 2 operations both call unregister_gadget, which will lead
to the panic above.
add a mutex before calling unregister_gadget for api used in f_fs.
Signed-off-by: Winter Wang <wente.wang@nxp.com>
Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com>
v4.4 introduced changes to the callbacks used for
dm-linear and dm-verity-target targets. Move to those headers
in dm-android-verity.
Verified on hikey while having
BOARD_USES_RECOVERY_AS_BOOT := true
BOARD_BUILD_SYSTEM_ROOT_IMAGE := true
BUG: 27339727
Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
Change-Id: Ic64950c3b55f0a6eaa570bcedc2ace83bbf3005e
(cherry picked from commit 6a480a7842545ec520a91730209ec0bae41694c1)
First of all, trying to open them r/w is idiocy; it's guaranteed to fail.
Moreover, assigning ->f_pos and assuming that everything will work is
blatantly broken - try that with e.g. tmpfs as underlying layer and watch
the fireworks. There may be a non-trivial amount of state associated with
current IO position, well beyond the numeric offset. Using the single
struct file associated with underlying inode is really not a good idea;
we ought to open one for each ecryptfs directory struct file.
Additionally, file_operations both for directories and non-directories are
full of pointless methods; non-directories should *not* have ->iterate(),
directories should not have ->flush(), ->fasync() and ->splice_read().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Change-Id: I4813ce803f270fdd364758ce1dc108b76eab226e
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
(cherry picked from commit f0fe970df3838c202ef6c07a4c2b36838ef0a88b)
There are legitimate reasons to disallow mmap on certain files, notably
in sysfs or procfs. We shouldn't emulate mmap support on file systems
that don't offer support natively.
CVE-2016-1583
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Cc: stable@vger.kernel.org
[tyhicks: clean up f_op check by using ecryptfs_file_to_lower()]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Change-Id: I66e3670771630a25b0608f10019d1584e9ce73a6
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>