[ upstream commit 290af86629b25ffd1ed6232c4e9107da031705cb ]
The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
A quote from goolge project zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems annoying.
So instead, we decided to use the eBPF interpreter, which is built into
the host kernel - while there is no legitimate way to invoke it from inside
a VM, the presence of the code in the host kernel's text section is sufficient
to make it usable for the attack, just like with ordinary ROP gadgets."
To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
option that removes interpreter from the kernel in favor of JIT-only mode.
So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64
The start of JITed program is randomized and code page is marked as read-only.
In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
v2->v3:
- move __bpf_prog_ret0 under ifdef (Daniel)
v1->v2:
- fix init order, test_bpf and cBPF (Daniel's feedback)
- fix offloaded bpf (Jakub's feedback)
- add 'return 0' dummy in case something can invoke prog->bpf_func
- retarget bpf tree. For bpf-next the patch would need one extra hunk.
It will be sent when the trees are merged back to net-next
Considered doing:
int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
but it seems better to land the patch as-is and in bpf-next remove
bpf_jit_enable global variable from all JITs, consolidate in one place
and remove this jit_init() function.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ upstream commit 90caccdd8cc0215705f18b92771b449b01e2474a ]
- bpf prog_array just like all other types of bpf array accepts 32-bit index.
Clarify that in the comment.
- fix x64 JIT of bpf_tail_call which was incorrectly loading 8 instead of 4 bytes
- tighten corresponding check in the interpreter to stay consistent
The JIT bug can be triggered after introduction of BPF_F_NUMA_NODE flag
in commit 96eabe7a40aa in 4.14. Before that the map_flags would stay zero and
though JIT code is wrong it will check bounds correctly.
Hence two fixes tags. All other JITs don't have this problem.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Fixes: 96eabe7a40aa ("bpf: Allow selecting numa node during map creation")
Fixes: b52f00e6a7 ("x86: bpf_jit: implement bpf_tail_call() helper")
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit c131187db2d3fa2f8bf32fdf4e9a4ef805168467 ]
when the verifier detects that register contains a runtime constant
and it's compared with another constant it will prune exploration
of the branch that is guaranteed not to be taken at runtime.
This is all correct, but malicious program may be constructed
in such a way that it always has a constant comparison and
the other branch is never taken under any conditions.
In this case such path through the program will not be explored
by the verifier. It won't be taken at run-time either, but since
all instructions are JITed the malicious program may cause JITs
to complain about using reserved fields, etc.
To fix the issue we have to track the instructions explored by
the verifier and sanitize instructions that are dead at run time
with NOPs. We cannot reject such dead code, since llvm generates
it for valid C code, since it doesn't do as much data flow
analysis as the verifier does.
Fixes: 17a5267067 ("bpf: verifier (add verifier core)")
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Boosted RT tasks can be deboosted quickly, this makes boost usless
for RT tasks and causes lots of glitching. Use timers to prevent
de-boost too soon and wait for long enough such that next enqueue
happens after a threshold.
While this can be solved in the governor, there are following
advantages:
- The approach used is governor-independent
- Reduces boost group lock contention for frequently sleepers/wakers
Note:
Fixed build breakage due to schedfreq dependency which isn't used
for RT anymore.
Bug: 30210506
Change-Id: I428a2695cac06cc3458cdde0dea72315e4e66c00
Signed-off-by: Joel Fernandes <joelaf@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpxo0gACgkQONu9yGCS
aT78EhAAs+LNVHZzqBuwgiuB/7Fsx5RvnzetpCstjWQnJHUCPjU9iCc4oTgpTGeC
jLZeQeUlwAguL87+GLEhKEflSqKd5O/3VozLd4Xw7tGTUSqkV/0yUbmKuXzMwqTP
ZlUDtM8eK4nfQ9ci/9yF6D3jMcpboVzFSlfu+HYLFxNUhr3NOf8jpPrMqDqTWEbP
ncT4habS87sQSDtZLFVsGLq2rtOg91NkiXSJEwyDeioTwR9kUju5eJGhF1yhmJZh
GEBOddmpD+RndL/Q0SN9poThWEFtWHwaBKeittHYzwnn5J7+ov9pjMmXkvGf8Slc
pWVx7WADcPkmyx18x53szI05uR0VycPB8YhwQW28yB9+4LabPzLSz9KNVDNcs7Tf
1GpP7Au0YVJBMjbUuJfZVe0MgSM6pRsw+I/etz47O27zsm/HEoRqHNwTk7T6B6jd
W0vjw2HohpQUxVa6AAgqVqCgzw4ALCmlIcepaOxtU6l3XEWLrMe8OwwUl6pQY+Fr
8dLk87SnFMgWVMyQf6M4Bse5EGHwfVEvA8z82HOlGNbynycexYDFWWxI/0P4CjPx
VCRg3XZF4OyoRWmy/9NKgpeQBXS+fiIGIGp0opjeMfpw6t6IJqeeFik6DzXZ3Hhe
FHYlCCtc45TAr3kJwSgfIoS7PcorBK93MDoEV58yJa6kcu0OOFQ=
=vx5p
-----END PGP SIGNATURE-----
Merge 4.4.114 into android-4.4
Changes in 4.4.114
x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
usbip: prevent vhci_hcd driver from leaking a socket pointer address
usbip: Fix implicit fallthrough warning
usbip: Fix potential format overflow in userspace tools
x86/microcode/intel: Fix BDW late-loading revision check
x86/cpu/intel: Introduce macros for Intel family numbers
x86/retpoline: Fill RSB on context switch for affected CPUs
sched/deadline: Use the revised wakeup rule for suspending constrained dl tasks
can: af_can: can_rcv(): replace WARN_ONCE by pr_warn_once
can: af_can: canfd_rcv(): replace WARN_ONCE by pr_warn_once
PM / sleep: declare __tracedata symbols as char[] rather than char
time: Avoid undefined behaviour in ktime_add_safe()
timers: Plug locking race vs. timer migration
Prevent timer value 0 for MWAITX
drivers: base: cacheinfo: fix x86 with CONFIG_OF enabled
drivers: base: cacheinfo: fix boot error message when acpi is enabled
PCI: layerscape: Add "fsl,ls2085a-pcie" compatible ID
PCI: layerscape: Fix MSG TLP drop setting
mmc: sdhci-of-esdhc: add/remove some quirks according to vendor version
fs/select: add vmalloc fallback for select(2)
mm/mmap.c: do not blow on PROT_NONE MAP_FIXED holes in the stack
hwpoison, memcg: forcibly uncharge LRU pages
cma: fix calculation of aligned offset
mm, page_alloc: fix potential false positive in __zone_watermark_ok
ipc: msg, make msgrcv work with LONG_MIN
x86/ioapic: Fix incorrect pointers in ioapic_setup_resources()
ACPI / processor: Avoid reserving IO regions too early
ACPI / scan: Prefer devices without _HID/_CID for _ADR matching
ACPICA: Namespace: fix operand cache leak
netfilter: x_tables: speed up jump target validation
netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed in 64bit kernel
netfilter: nf_dup_ipv6: set again FLOWI_FLAG_KNOWN_NH at flowi6_flags
netfilter: nf_ct_expect: remove the redundant slash when policy name is empty
netfilter: nfnetlink_queue: reject verdict request from different portid
netfilter: restart search if moved to other chain
netfilter: nf_conntrack_sip: extend request line validation
netfilter: use fwmark_reflect in nf_send_reset
netfilter: fix IS_ERR_VALUE usage
netfilter: nfnetlink_cthelper: Add missing permission checks
netfilter: xt_osf: Add missing permission checks
ext2: Don't clear SGID when inheriting ACLs
reiserfs: fix race in prealloc discard
reiserfs: don't preallocate blocks for extended attributes
reiserfs: Don't clear SGID when inheriting ACLs
fs/fcntl: f_setown, avoid undefined behaviour
scsi: libiscsi: fix shifting of DID_REQUEUE host byte
Revert "module: Add retpoline tag to VERMAGIC"
Input: trackpoint - force 3 buttons if 0 button is reported
usb: usbip: Fix possible deadlocks reported by lockdep
usbip: fix stub_rx: get_pipe() to validate endpoint number
usbip: fix stub_rx: harden CMD_SUBMIT path to handle malicious input
usbip: prevent leaking socket pointer address in messages
um: link vmlinux with -no-pie
vsyscall: Fix permissions for emulate mode with KAISER/PTI
eventpoll.h: add missing epoll event masks
x86/microcode/intel: Extend BDW late-loading further with LLC size check
hrtimer: Reset hrtimer cpu base proper on CPU hotplug
dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
ipv6: Fix getsockopt() for sockets with default IPV6_AUTOFLOWLABEL
ipv6: fix udpv6 sendmsg crash caused by too small MTU
ipv6: ip6_make_skb() needs to clear cork.base.dst
lan78xx: Fix failure in USB Full Speed
net: igmp: fix source address check for IGMPv3 reports
tcp: __tcp_hdrlen() helper
net: qdisc_pkt_len_init() should be more robust
pppoe: take ->needed_headroom of lower device into account on xmit
r8169: fix memory corruption on retrieval of hardware statistics.
sctp: do not allow the v4 socket to bind a v4mapped v6 address
sctp: return error if the asoc has been peeled off in sctp_wait_for_sndbuf
vmxnet3: repair memory leak
net: Allow neigh contructor functions ability to modify the primary_key
ipv4: Make neigh lookup keys for loopback/point-to-point devices be INADDR_ANY
flow_dissector: properly cap thoff field
net: tcp: close sock if net namespace is exiting
nfsd: auth: Fix gid sorting when rootsquash enabled
Linux 4.4.114
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit d5421ea43d30701e03cadc56a38854c36a8b4433 upstream.
The hrtimer interrupt code contains a hang detection and mitigation
mechanism, which prevents that a long delayed hrtimer interrupt causes a
continous retriggering of interrupts which prevent the system from making
progress. If a hang is detected then the timer hardware is programmed with
a certain delay into the future and a flag is set in the hrtimer cpu base
which prevents newly enqueued timers from reprogramming the timer hardware
prior to the chosen delay. The subsequent hrtimer interrupt after the delay
clears the flag and resumes normal operation.
If such a hang happens in the last hrtimer interrupt before a CPU is
unplugged then the hang_detected flag is set and stays that way when the
CPU is plugged in again. At that point the timer hardware is not armed and
it cannot be armed because the hang_detected flag is still active, so
nothing clears that flag. As a consequence the CPU does not receive hrtimer
interrupts and no timers expire on that CPU which results in RCU stalls and
other malfunctions.
Clear the flag along with some other less critical members of the hrtimer
cpu base to ensure starting from a clean state when a CPU is plugged in.
Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
root cause of that hard to reproduce heisenbug. Once understood it's
trivial and certainly justifies a brown paperbag.
Fixes: 41d2e49493 ("hrtimer: Tune hrtimer_interrupt hang logic")
Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b831275a3553c32091222ac619cfddd73a5553fb upstream.
Linus noticed that lock_timer_base() lacks a READ_ONCE() for accessing the
timer flags. As a consequence the compiler is allowed to reload the flags
between the initial check for TIMER_MIGRATION and the following timer base
computation and the spin lock of the base.
While this has not been observed (yet), we need to make sure that it never
happens.
Fixes: 0eeda71bc3 ("timer: Replace timer base by a cpu index")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610241711220.4983@nanos
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3effcb4247e74a51f5d8b775a1ee4abf87cc089a upstream.
We have been facing some problems with self-suspending constrained
deadline tasks. The main reason is that the original CBS was not
designed for such sort of tasks.
One problem reported by Xunlei Pang takes place when a task
suspends, and then is awakened before the deadline, but so close
to the deadline that its remaining runtime can cause the task
to have an absolute density higher than allowed. In such situation,
the original CBS assumes that the task is facing an early activation,
and so it replenishes the task and set another deadline, one deadline
in the future. This rule works fine for implicit deadline tasks.
Moreover, it allows the system to adapt the period of a task in which
the external event source suffered from a clock drift.
However, this opens the window for bandwidth leakage for constrained
deadline tasks. For instance, a task with the following parameters:
runtime = 5 ms
deadline = 7 ms
[density] = 5 / 7 = 0.71
period = 1000 ms
If the task runs for 1 ms, and then suspends for another 1ms,
it will be awakened with the following parameters:
remaining runtime = 4
laxity = 5
presenting a absolute density of 4 / 5 = 0.80.
In this case, the original CBS would assume the task had an early
wakeup. Then, CBS will reset the runtime, and the absolute deadline will
be postponed by one relative deadline, allowing the task to run.
The problem is that, if the task runs this pattern forever, it will keep
receiving bandwidth, being able to run 1ms every 2ms. Following this
behavior, the task would be able to run 500 ms in 1 sec. Thus running
more than the 5 ms / 1 sec the admission control allowed it to run.
Trying to address the self-suspending case, Luca Abeni, Giuseppe
Lipari, and Juri Lelli [1] revisited the CBS in order to deal with
self-suspending tasks. In the new approach, rather than
replenishing/postponing the absolute deadline, the revised wakeup rule
adjusts the remaining runtime, reducing it to fit into the allowed
density.
A revised version of the idea is:
At a given time t, the maximum absolute density of a task cannot be
higher than its relative density, that is:
runtime / (deadline - t) <= dl_runtime / dl_deadline
Knowing the laxity of a task (deadline - t), it is possible to move
it to the other side of the equality, thus enabling to define max
remaining runtime a task can use within the absolute deadline, without
over-running the allowed density:
runtime = (dl_runtime / dl_deadline) * (deadline - t)
For instance, in our previous example, the task could still run:
runtime = ( 5 / 7 ) * 5
runtime = 3.57 ms
Without causing damage for other deadline tasks. It is note worthy
that the laxity cannot be negative because that would cause a negative
runtime. Thus, this patch depends on the patch:
df8eac8cafce ("sched/deadline: Throttle a constrained deadline task activated after the deadline")
Which throttles a constrained deadline task activated after the
deadline.
Finally, it is also possible to use the revised wakeup rule for
all other tasks, but that would require some more discussions
about pros and cons.
[The main difference from the original commit is that
the BW_SHIFT define was not present yet. As BW_SHIFT was
introduced in a new feature, I just used the value (20),
likewise we used to use before the #define.
Other changes were required because of comments. - bistrot]
Reported-by: Xunlei Pang <xpang@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
[peterz: replaced dl_is_constrained with dl_is_implicit]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Link: http://lkml.kernel.org/r/5c800ab3a74a168a84ee5f3f84d12a02e11383be.1495803804.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
In up-migrate path, select_energy_cpu_brute() was called directly
without checking energy_aware(). This will make select_energy_cpu_brute()
always worked even disabling energy_aware() on the asymmetric cpu
capacity system.
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpnhEgACgkQONu9yGCS
aT6wiBAAszhEwuUQy79/r5C8BTgpQNkt7rGWwZGRMz/nd/FTZSdJjZCI93NdT144
2i9x0ejQXkdpld2Al3Rl5GOlqEw43XTWqgiU3h/fW4nS+l/gpVZu2b9/2jsmsz36
cJGikTqwofs8wMzIlrAvfHIdXKrEAzeIbsp1NuDFq7WTdeUGorzu4ZSw7MfjQN70
tXSctd1IAhr776p6OqihVkasKV4S3D83vowivpvSCRsHR8HmmtS2kIl9QlHwNJo6
KzH3z5DHupJev+qYMsy7AucZjiDuQbXCw+9kPb9jAqFC00fBOng6DwNA63DaAL7N
QIx+tGJNUT/OPJTl0oift33Zg2fWALmsoSqHH6eJal7XjcP0sSLEnF91ayWms+BQ
m8qURMCYFShguk3om9jO4yZr6C+YbaqXxqGnhjPhnX2TvueUf7zTinXUk6d3JEfX
wnaugvqHyzWdPdxCOdBkUJ7YWRoODRKKrCHIB17A9063bZN0PombhimAPOR69NC5
kqd0bzK/lnY7OUGHipK/nfPRVJfSJlR43AFehaloowI/6hUe057v2bc3IQgTBUf1
kqX5wQD/VfhEtVibk5GomsgE/ERBkhIqpKNhm5U+/Qe2szO/XiKYuh3rEKGsTXus
0vx+TqIFpKt+oSY5rhtv9coRJov5kMnw2PYVsO+qr2TQ6TMILyQ=
=nlXw
-----END PGP SIGNATURE-----
Merge 4.4.113 into android-4.4
Changes in 4.4.113
gcov: disable for COMPILE_TEST
x86/cpu/AMD: Make LFENCE a serializing instruction
x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC
x86/mm/32: Move setup_clear_cpu_cap(X86_FEATURE_PCID) earlier
x86/asm: Use register variable to get stack pointer value
x86/kbuild: enable modversions for symbols exported from asm
x86/asm: Make asm/alternative.h safe from assembly
EXPORT_SYMBOL() for asm
kconfig.h: use __is_defined() to check if MODULE is defined
x86/retpoline: Add initial retpoline support
x86/spectre: Add boot time option to select Spectre v2 mitigation
x86/retpoline/crypto: Convert crypto assembler indirect jumps
x86/retpoline/entry: Convert entry assembler indirect jumps
x86/retpoline/ftrace: Convert ftrace assembler indirect jumps
x86/retpoline/hyperv: Convert assembler indirect jumps
x86/retpoline/xen: Convert Xen hypercall indirect jumps
x86/retpoline/checksum32: Convert assembler indirect jumps
x86/retpoline/irq32: Convert assembler indirect jumps
x86/retpoline: Fill return stack buffer on vmexit
x86/retpoline: Remove compile time warning
scsi: sg: disable SET_FORCE_LOW_DMA
futex: Prevent overflow by strengthen input validation
ALSA: pcm: Remove yet superfluous WARN_ON()
ALSA: hda - Apply headphone noise quirk for another Dell XPS 13 variant
ALSA: hda - Apply the existing quirk to iMac 14,1
af_key: fix buffer overread in verify_address_len()
af_key: fix buffer overread in parse_exthdrs()
scsi: hpsa: fix volume offline state
sched/deadline: Zero out positive runtime after throttling constrained tasks
x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macros
module: Add retpoline tag to VERMAGIC
pipe: avoid round_pipe_size() nr_pages overflow on 32-bit
x86/apic/vector: Fix off by one in error path
Input: 88pm860x-ts - fix child-node lookup
Input: twl6040-vibra - fix DT node memory management
Input: twl6040-vibra - fix child-node lookup
Input: twl4030-vibra - fix sibling-node lookup
tracing: Fix converting enum's from the map in trace_event_eval_update()
phy: work around 'phys' references to usb-nop-xceiv devices
ARM: dts: kirkwood: fix pin-muxing of MPP7 on OpenBlocks A7
can: peak: fix potential bug in packet fragmentation
libata: apply MAX_SEC_1024 to all LITEON EP1 series devices
dm btree: fix serious bug in btree_split_beneath()
dm thin metadata: THIN_MAX_CONCURRENT_LOCKS should be 6
arm64: KVM: Fix SMCCC handling of unimplemented SMC/HVC calls
x86/cpu, x86/pti: Do not enable PTI on AMD processors
kbuild: modversions for EXPORT_SYMBOL() for asm
x86/mce: Make machine check speculation protected
retpoline: Introduce start/end markers of indirect thunk
kprobes/x86: Blacklist indirect thunk functions for kprobes
kprobes/x86: Disable optimizing on the function jumps to indirect thunk
x86/pti: Document fix wrong index
x86/retpoline: Optimize inline assembler for vmexit_fill_RSB
MIPS: AR7: ensure the port type's FCR value is used
Linux 4.4.113
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 1ebe1eaf2f02784921759992ae1fde1a9bec8fd0 upstream.
Since enums do not get converted by the TRACE_EVENT macro into their values,
the event format displaces the enum name and not the value. This breaks
tools like perf and trace-cmd that need to interpret the raw binary data. To
solve this, an enum map was created to convert these enums into their actual
numbers on boot up. This is done by TRACE_EVENTS() adding a
TRACE_DEFINE_ENUM() macro.
Some enums were not being converted. This was caused by an optization that
had a bug in it.
All calls get checked against this enum map to see if it should be converted
or not, and it compares the call's system to the system that the enum map
was created under. If they match, then they call is processed.
To cut down on the number of iterations needed to find the maps with a
matching system, since calls and maps are grouped by system, when a match is
made, the index into the map array is saved, so that the next call, if it
belongs to the same system as the previous call, could start right at that
array index and not have to scan all the previous arrays.
The problem was, the saved index was used as the variable to know if this is
a call in a new system or not. If the index was zero, it was assumed that
the call is in a new system and would keep incrementing the saved index
until it found a matching system. The issue arises when the first matching
system was at index zero. The next map, if it belonged to the same system,
would then think it was the first match and increment the index to one. If
the next call belong to the same system, it would begin its search of the
maps off by one, and miss the first enum that should be converted. This left
a single enum not converted properly.
Also add a comment to describe exactly what that index was for. It took me a
bit too long to figure out what I was thinking when debugging this issue.
Link: http://lkml.kernel.org/r/717BE572-2070-4C1E-9902-9F2E0FEDA4F8@oracle.com
Fixes: 0c564a538a ("tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values")
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Teste-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ae83b56a56f8d9643dedbee86b457fa1c5d42f59 upstream.
When a contrained task is throttled by dl_check_constrained_dl(),
it may carry the remaining positive runtime, as a result when
dl_task_timer() fires and calls replenish_dl_entity(), it will
not be replenished correctly due to the positive dl_se->runtime.
This patch assigns its runtime to 0 if positive after throttling.
Signed-off-by: Xunlei Pang <xlpang@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: df8eac8cafce ("sched/deadline: Throttle a constrained deadline task activated after the deadline)
Link: http://lkml.kernel.org/r/1494421417-27550-1-git-send-email-xlpang@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fbe0e839d1e22d88810f3ee3e2f1479be4c0aa4a upstream.
UBSAN reports signed integer overflow in kernel/futex.c:
UBSAN: Undefined behaviour in kernel/futex.c:2041:18
signed integer overflow:
0 - -2147483648 cannot be represented in type 'int'
Add a sanity check to catch negative values of nr_wake and nr_requeue.
Signed-off-by: Li Jinyue <lijinyue@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: peterz@infradead.org
Cc: dvhart@infradead.org
Link: https://lkml.kernel.org/r/1513242294-31786-1-git-send-email-lijinyue@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit cc622420798c4bcf093785d872525087a7798db9 upstream.
Enabling gcov is counterproductive to compile testing: it significantly
increases the kernel image size, compile time, and it produces lots
of false positive "may be used uninitialized" warnings as the result
of missed optimizations.
This is in line with how UBSAN_SANITIZE_ALL and PROFILE_ALL_BRANCHES
work, both of which have similar problems.
With an ARM allmodconfig kernel, I see the build time drop from
283 minutes CPU time to 225 minutes, and the vmlinux size drops
from 43MB to 26MB.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Michal Marek <mmarek@suse.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
After applying up-migrate patches(dc626b2 sched: avoid pushing
tasks to an offline CPU, 2da014c sched: Extend active balance
to accept 'push_task' argument), leaving EAS disabled and doing
a stability test which includes some random cpu plugin/plugout.
There are two types crashes happened as below:
TYPE 1:
[ 2072.653091] c1 ------------[ cut here ]------------
[ 2072.653133] c1 WARNING: CPU: 1 PID: 13 at kernel/fork.c:252 __put_task_struct+0x30/0x124()
[ 2072.653173] c1 CPU: 1 PID: 13 Comm: migration/1 Tainted: G W O 4.4.83-01066-g04c5403-dirty #17
[ 2072.653215] c1 [<c011141c>] (unwind_backtrace) from [<c010ced8>] (show_stack+0x20/0x24)
[ 2072.653235] c1 [<c010ced8>] (show_stack) from [<c043d7f8>] (dump_stack+0xa8/0xe0)
[ 2072.653255] c1 [<c043d7f8>] (dump_stack) from [<c012be04>] (warn_slowpath_common+0x98/0xc4)
[ 2072.653273] c1 [<c012be04>] (warn_slowpath_common) from [<c012beec>] (warn_slowpath_null+0x2c/0x34)
[ 2072.653291] c1 [<c012beec>] (warn_slowpath_null) from [<c01293b4>] (__put_task_struct+0x30/0x124)
[ 2072.653310] c1 [<c01293b4>] (__put_task_struct) from [<c0166964>] (active_load_balance_cpu_stop+0x22c/0x314)
[ 2072.653331] c1 [<c0166964>] (active_load_balance_cpu_stop) from [<c01c2604>] (cpu_stopper_thread+0x90/0x144)
[ 2072.653352] c1 [<c01c2604>] (cpu_stopper_thread) from [<c014d80c>] (smpboot_thread_fn+0x258/0x270)
[ 2072.653370] c1 [<c014d80c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c)
[ 2072.653388] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24)
[ 2072.653400] c1 ---[ end trace 49c3d154890763fc ]---
[ 2072.653418] c1 Unable to handle kernel NULL pointer dereference at virtual address 00000000
...
[ 2072.832804] c1 [<c01ba00c>] (put_css_set) from [<c01be870>] (cgroup_free+0x6c/0x78)
[ 2072.832823] c1 [<c01be870>] (cgroup_free) from [<c01293f8>] (__put_task_struct+0x74/0x124)
[ 2072.832844] c1 [<c01293f8>] (__put_task_struct) from [<c0166964>] (active_load_balance_cpu_stop+0x22c/0x314)
[ 2072.832860] c1 [<c0166964>] (active_load_balance_cpu_stop) from [<c01c2604>] (cpu_stopper_thread+0x90/0x144)
[ 2072.832879] c1 [<c01c2604>] (cpu_stopper_thread) from [<c014d80c>] (smpboot_thread_fn+0x258/0x270)
[ 2072.832896] c1 [<c014d80c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c)
[ 2072.832914] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24)
[ 2072.832930] c1 Code: f57ff05b f590f000 e3e02000 e3a03001 (e1941f9f)
[ 2072.839208] c1 ---[ end trace 49c3d154890763fd ]---
TYPE 2:
[ 214.742695] c1 ------------[ cut here ]------------
[ 214.742709] c1 kernel BUG at kernel/smpboot.c:136!
[ 214.742718] c1 Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
[ 214.748785] c1 CPU: 1 PID: 18 Comm: migration/2 Tainted: G W O 4.4.83-00912-g370f62c #1
[ 214.748805] c1 task: ef2d9680 task.stack: ee862000
[ 214.748821] c1 PC is at smpboot_thread_fn+0x168/0x270
[ 214.748832] c1 LR is at smpboot_thread_fn+0xe4/0x270
...
[ 214.821339] c1 [<c014d71c>] (smpboot_thread_fn) from [<c0149ee4>] (kthread+0x118/0x12c)
[ 214.821363] c1 [<c0149ee4>] (kthread) from [<c0108310>] (ret_from_fork+0x14/0x24)
[ 214.821378] c1 Code: e5950000 e5943010 e1500003 0a000000 (e7f001f2)
[ 214.827676] c1 ---[ end trace da87539f59bab8de ]---
For the first type crash, the root cause is the push_task pointer will be
used without initialization on the out_lock path. And maybe cpu hotplug in/out
make this happen more easily.
For the second type crash, it hits 'BUG_ON(td->cpu != smp_processor_id());' in
smpboot_thread_fn(). It seems that OOPS was caused by migration/2 which actually
running on cpu1. And I haven't found what actually happened.
However, after this fix, the second type crash seems gone too.
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpfCtUACgkQONu9yGCS
aT6ZvBAAqxRZ9H5LCEVboN5KE4cvTDS7pYhJPk518ZxnSJslwUl7SZ+AOzxivV9w
YouBOEHbufSmbVJgPgsxuhlFsw+TMOYATUBVWIBrWjuD+nD+ooba0j5nb4FW2SOc
XTWv5X8t+Ho19uWcq7w9W+3Ang5f8ySNZUZIG4F/HTeRGU3//J29wfEP2nM9cVOJ
ZsOze9aK88KbLwgJRr2uCa/eyARvUeqOFomIlUhLNHgtU8xfEEKVX72r68RJ/bbU
xhoceKJHXLDnA29ZFG6hEi/EIgG6Zr9Iwp/QBe2JtcGtpXCNTR1f+VuW//rcqzka
OBXctQlObRuZ361jl+WcWg3aycK8DgSJPgC1+QTEcOULa64smu3n//ICqdPNHWSS
MIG1iVH5zKhtRyDkVZKnk66jqi04GWZ370FpmUvrmaOLFftSM7FHk/U4GDR5eOFJ
8vxARTrUF4ls2weLBwNiR7zFLiI7iaN8LYmGnjLeBvgVy4u8zZgqfrhwDrMX7dh6
mEAjNNufLTrsGo7O8tNhwI3KIn7s4gJp5u3c28I0LmB+G3OH+jIopy0o/NXXjAkm
5gYGsf5mkf0I2SbDT/wkRSAFwuhCfgWKfQiTZmdukLuRo5VaL+SP148hZBcTol0z
Jsqpy8SeAkWkPcegoMUwGQLRVU3QM1NL0NpT1TAT1Ng4lw5igxU=
=7usw
-----END PGP SIGNATURE-----
Merge 4.4.112 into android-4.4
Changes in 4.4.112
dm bufio: fix shrinker scans when (nr_to_scan < retain_target)
KVM: Fix stack-out-of-bounds read in write_mmio
can: gs_usb: fix return value of the "set_bittiming" callback
IB/srpt: Disable RDMA access by the initiator
MIPS: Validate PR_SET_FP_MODE prctl(2) requests against the ABI of the task
MIPS: Factor out NT_PRFPREG regset access helpers
MIPS: Guard against any partial write attempt with PTRACE_SETREGSET
MIPS: Consistently handle buffer counter with PTRACE_SETREGSET
MIPS: Fix an FCSR access API regression with NT_PRFPREG and MSA
MIPS: Also verify sizeof `elf_fpreg_t' with PTRACE_SETREGSET
MIPS: Disallow outsized PTRACE_SETREGSET NT_PRFPREG regset accesses
net/mac80211/debugfs.c: prevent build failure with CONFIG_UBSAN=y
kvm: vmx: Scrub hardware GPRs at VM-exit
x86/vsdo: Fix build on PARAVIRT_CLOCK=y, KVM_GUEST=n
x86/acpi: Handle SCI interrupts above legacy space gracefully
iommu/arm-smmu-v3: Don't free page table ops twice
ALSA: pcm: Remove incorrect snd_BUG_ON() usages
ALSA: pcm: Add missing error checks in OSS emulation plugin builder
ALSA: pcm: Abort properly at pending signal in OSS read/write loops
ALSA: pcm: Allow aborting mutex lock at OSS read/write loops
ALSA: aloop: Release cable upon open error path
ALSA: aloop: Fix inconsistent format due to incomplete rule
ALSA: aloop: Fix racy hw constraints adjustment
x86/acpi: Reduce code duplication in mp_override_legacy_irq()
mm/compaction: fix invalid free_pfn and compact_cached_free_pfn
mm/compaction: pass only pageblock aligned range to pageblock_pfn_to_page
mm/page-writeback: fix dirty_ratelimit calculation
mm/zswap: use workqueue to destroy pool
zswap: don't param_set_charp while holding spinlock
locks: don't check for race with close when setting OFD lock
futex: Replace barrier() in unqueue_me() with READ_ONCE()
locking/mutex: Allow next waiter lockless wakeup
usbvision fix overflow of interfaces array
usb: musb: ux500: Fix NULL pointer dereference at system PM
r8152: fix the wake event
r8152: use test_and_clear_bit
r8152: adjust ALDPS function
lan78xx: use skb_cow_head() to deal with cloned skbs
sr9700: use skb_cow_head() to deal with cloned skbs
smsc75xx: use skb_cow_head() to deal with cloned skbs
cx82310_eth: use skb_cow_head() to deal with cloned skbs
x86/mm/pat, /dev/mem: Remove superfluous error message
hwrng: core - sleep interruptible in read
sysrq: Fix warning in sysrq generated crash.
xhci: Fix ring leak in failure path of xhci_alloc_virt_device()
Revert "userfaultfd: selftest: vm: allow to build in vm/ directory"
x86/pti/efi: broken conversion from efi to kernel page table
8021q: fix a memory leak for VLAN 0 device
ip6_tunnel: disable dst caching if tunnel is dual-stack
net: core: fix module type in sock_diag_bind
RDS: Heap OOB write in rds_message_alloc_sgs()
RDS: null pointer dereference in rds_atomic_free_op
sh_eth: fix TSU resource handling
sh_eth: fix SH7757 GEther initialization
net: stmmac: enable EEE in MII, GMII or RGMII only
ipv6: fix possible mem leaks in ipv6_make_skb()
crypto: algapi - fix NULL dereference in crypto_remove_spawns()
rbd: set max_segments to USHRT_MAX
x86/microcode/intel: Extend BDW late-loading with a revision check
KVM: x86: Add memory barrier on vmcs field lookup
drm/vmwgfx: Potential off by one in vmw_view_add()
kaiser: Set _PAGE_NX only if supported
bpf: add bpf_patch_insn_single helper
bpf: don't (ab)use instructions to store state
bpf: move fixup_bpf_calls() function
bpf: refactor fixup_bpf_calls()
bpf: adjust insn_aux_data when patching insns
bpf: prevent out-of-bounds speculation
bpf, array: fix overflow in max_entries and undefined behavior in index_mask
iscsi-target: Make TASK_REASSIGN use proper se_cmd->cmd_kref
target: Avoid early CMD_T_PRE_EXECUTE failures during ABORT_TASK
USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
USB: serial: cp210x: add new device ID ELV ALC 8xxx
usb: misc: usb3503: make sure reset is low for at least 100us
USB: fix usbmon BUG trigger
usbip: remove kernel addresses from usb device and urb debug msgs
staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
Bluetooth: Prevent stack info leak from the EFS element.
uas: ignore UAS for Norelsys NS1068(X) chips
e1000e: Fix e1000_check_for_copper_link_ich8lan return value.
x86/Documentation: Add PTI description
x86/cpu: Factor out application of forced CPU caps
x86/cpufeatures: Make CPU bugs sticky
x86/cpufeatures: Add X86_BUG_CPU_INSECURE
x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN
x86/cpufeatures: Add X86_BUG_SPECTRE_V[12]
x86/cpu: Merge bugs.c and bugs_64.c
sysfs/cpu: Add vulnerability folder
x86/cpu: Implement CPU vulnerabilites sysfs functions
sysfs/cpu: Fix typos in vulnerability documentation
x86/alternatives: Fix optimize_nops() checking
x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm
selftests/x86: Add test_vsyscall
Linux 4.4.112
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit bbeb6e4323dad9b5e0ee9f60c223dd532e2403b1 upstream.
syzkaller tried to alloc a map with 0xfffffffd entries out of a userns,
and thus unprivileged. With the recently added logic in b2157399cc98
("bpf: prevent out-of-bounds speculation") we round this up to the next
power of two value for max_entries for unprivileged such that we can
apply proper masking into potentially zeroed out map slots.
However, this will generate an index_mask of 0xffffffff, and therefore
a + 1 will let this overflow into new max_entries of 0. This will pass
allocation, etc, and later on map access we still enforce on the original
attr->max_entries value which was 0xfffffffd, therefore triggering GPF
all over the place. Thus bail out on overflow in such case.
Moreover, on 32 bit archs roundup_pow_of_two() can also not be used,
since fls_long(max_entries - 1) can result in 32 and 1UL << 32 in 32 bit
space is undefined. Therefore, do this by hand in a 64 bit variable.
This fixes all the issues triggered by syzkaller's reproducers.
Fixes: b2157399cc98 ("bpf: prevent out-of-bounds speculation")
Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.
Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.
To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.
Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index & index_mask' operation.
If prog_array map is created by unpriv user replace
bpf_tail_call(ctx, map, index);
with
if (index >= max_entries) {
index &= map->index_mask;
bpf_tail_call(ctx, map, index);
}
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index >= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.
Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.
That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.
v2->v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8041902dae5299c1f194ba42d14383f734631009 upstream.
convert_ctx_accesses() replaces single bpf instruction with a set of
instructions. Adjust corresponding insn_aux_data while patching.
It's needed to make sure subsequent 'for(all insn)' loops
have matching insn and insn_aux_data.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 79741b3bdec01a8628368fbcfccc7d189ed606cb upstream.
reduce indent and make it iterate over instructions similar to
convert_ctx_accesses(). Also convert hard BUG_ON into soft verifier error.
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e245c5c6a5656e4d61aa7bb08e9694fd6e5b2b9d upstream.
no functional change.
move fixup_bpf_calls() to verifier.c
it's being refactored in the next patch
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3df126f35f88dc76eea33769f85a3c3bb8ce6c6b upstream.
Storing state in reserved fields of instructions makes
it impossible to run verifier on programs already
marked as read-only. Allocate and use an array of
per-instruction state instead.
While touching the error path rename and move existing
jump target.
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c237ee5eb33bf19fe0591c04ff8db19da7323a83 upstream.
Move the functionality to patch instructions out of the verifier
code and into the core as the new bpf_patch_insn_single() helper
will be needed later on for blinding as well. No changes in
functionality.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1329ce6fbbe4536592dfcfc8d64d61bfeb598fe6 upstream.
Make use of wake-queues and enable the wakeup to occur after releasing the
wait_lock. This is similar to what we do with rtmutex top waiter,
slightly shortening the critical region and allow other waiters to
acquire the wait_lock sooner. In low contention cases it can also help
the recently woken waiter to find the wait_lock available (fastpath)
when it continues execution.
Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Will Deacon <Will.Deacon@arm.com>
Link: http://lkml.kernel.org/r/20160125022343.GA3322@linux-uzut.site
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 29b75eb2d56a714190a93d7be4525e617591077a upstream.
Commit e91467ecd1 ("bug in futex unqueue_me") introduced a barrier() in
unqueue_me() to prevent the compiler from rereading the lock pointer which
might change after a check for NULL.
Replace the barrier() with a READ_ONCE() for the following reasons:
1) READ_ONCE() is a weaker form of barrier() that affects only the specific
load operation, while barrier() is a general compiler level memory barrier.
READ_ONCE() was not available at the time when the barrier was added.
2) Aside of that READ_ONCE() is descriptive and self explainatory while a
barrier without comment is not clear to the casual reader.
No functional change.
[ tglx: Massaged changelog ]
Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Darren Hart <dvhart@linux.intel.com>
Cc: dave@stgolabs.net
Cc: peterz@infradead.org
Cc: linux@rasmusvillemoes.dk
Cc: akpm@linux-foundation.org
Cc: fengguang.wu@intel.com
Cc: bigeasy@linutronix.de
Link: http://lkml.kernel.org/r/1457314344-5685-1-git-send-email-nasa4836@gmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpVzqgACgkQONu9yGCS
aT5dRg//ar6AJzOM7VRU4Zpb6XAR6524mM2VLLFP8xwhWwqjqyJuqWw7OxhWeEY2
5BvljZNt3vn2v+2fjxLthDUFSfvrcdgriGG5xTMQG9AlRwFUhDKNe5SL8F/q0aiG
G49Txm9GjWQNc50AvSRIWg9N5IOvvWC3QU0IGD2SEOng/IB7vtXIBokr+rFBPARa
6+Vr4fEpTXoOrhZ8niQmWarpH9fqWPVHC8MagKR1kwHyL6pQhSK4rdSJETpJw+4v
YzZ7ZWR7wGdMkiUzn0sYWwWVlwrUAo7zAsvouZYTPY6q8LJQGXkt5vzZd+zjZ1hA
kEFyuHSgjXQLEUAE+wfdsJC/sfdTOwZ94Jxc+reL9lAIBykiQ8U232k1dMKUhDOx
EdPNuB/+TdRSTxskoyS54t+2wTN9JYvrDr2Nzg8CJ1Q5juka8fXlslRNvvHAS3wZ
OCus40TUFmvVKA9jtlMAHKpEyKu+le9LZbjQU00Bdsp3NIGe6G8y+8ZlW81cePfH
OKDUOqjme9vqT26v7cneM05ItXeQcchi5NElzwOtMZUmaZvyngVVClq0uDay0Pa9
2kprHnw4rJY3wRvLzdXf/+fAOmSe3nYHuws+dQOTPGJwRWSNFqg3Jjjp3ybdBhfU
SgfcUTvuDKY0UzhFqFRFU9+1NwafkcECVztTsZBBOdRl+wag/1w=
=/oVX
-----END PGP SIGNATURE-----
Merge 4.4.111 into android-4.4
Changes in 4.4.111
x86/kasan: Write protect kasan zero shadow
kernel/acct.c: fix the acct->needcheck check in check_free_space()
crypto: n2 - cure use after free
crypto: chacha20poly1305 - validate the digest size
crypto: pcrypt - fix freeing pcrypt instances
sunxi-rsb: Include OF based modalias in device uevent
fscache: Fix the default for fscache_maybe_release_page()
kernel: make groups_sort calling a responsibility group_info allocators
kernel/signal.c: protect the traced SIGNAL_UNKILLABLE tasks from SIGKILL
kernel/signal.c: protect the SIGNAL_UNKILLABLE tasks from !sig_kernel_only() signals
kernel/signal.c: remove the no longer needed SIGNAL_UNKILLABLE check in complete_signal()
ARC: uaccess: dont use "l" gcc inline asm constraint modifier
Input: elantech - add new icbody type 15
x86/microcode/AMD: Add support for fam17h microcode loading
parisc: Fix alignment of pa_tlb_lock in assembly on 32-bit SMP kernel
x86/tlb: Drop the _GPL from the cpu_tlbstate export
genksyms: Handle string literals with spaces in reference files
module: keep percpu symbols in module's symtab
module: Issue warnings when tainting kernel
proc: much faster /proc/vmstat
Map the vsyscall page with _PAGE_USER
Fix build error in vma.c
Linux 4.4.111
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 3205c36cf7d96024626f92d65f560035df1abcb2 upstream.
While most of the locations where a kernel taint bit is set are accompanied
with a warning message, there are two which set their bits silently. If
the tainting module gets unloaded later on, it is almost impossible to tell
what was the reason for setting the flag.
Signed-off-by: Libor Pechacek <lpechacek@suse.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e0224418516b4d8a6c2160574bac18447c354ef0 upstream.
Currently, percpu symbols from .data..percpu ELF section of a module are
not copied over and stored in final symtab array of struct module.
Consequently such symbol cannot be returned via kallsyms API (for
example kallsyms_lookup_name). This can be especially confusing when the
percpu symbol is exported. Only its __ksymtab et al. are present in its
symtab.
The culprit is in layout_and_allocate() function where SHF_ALLOC flag is
dropped for .data..percpu section. There is in fact no need to copy the
section to final struct module, because kernel module loader allocates
extra percpu section by itself. Unfortunately only symbols from
SHF_ALLOC sections are copied due to a check in is_core_symbol().
The patch changes is_core_symbol() function to copy over also percpu
symbols (their st_shndx points to .data..percpu ELF section). We do it
only if CONFIG_KALLSYMS_ALL is set to be consistent with the rest of the
function (ELF section is SHF_ALLOC but !SHF_EXECINSTR). Finally
elf_type() returns type 'a' for a percpu symbol because its address is
absolute.
Signed-off-by: Miroslav Benes <mbenes@suse.cz>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 426915796ccaf9c2bd9bb06dc5702225957bc2e5 upstream.
complete_signal() checks SIGNAL_UNKILLABLE before it starts to destroy
the thread group, today this is wrong in many ways.
If nothing else, fatal_signal_pending() should always imply that the
whole thread group (except ->group_exit_task if it is not NULL) is
killed, this check breaks the rule.
After the previous changes we can rely on sig_task_ignored();
sig_fatal(sig) && SIGNAL_UNKILLABLE can only be true if we actually want
to kill this task and sig == SIGKILL OR it is traced and debugger can
intercept the signal.
This should hopefully fix the problem reported by Dmitry. This
test-case
static int init(void *arg)
{
for (;;)
pause();
}
int main(void)
{
char stack[16 * 1024];
for (;;) {
int pid = clone(init, stack + sizeof(stack)/2,
CLONE_NEWPID | SIGCHLD, NULL);
assert(pid > 0);
assert(ptrace(PTRACE_ATTACH, pid, 0, 0) == 0);
assert(waitpid(-1, NULL, WSTOPPED) == pid);
assert(ptrace(PTRACE_DETACH, pid, 0, SIGSTOP) == 0);
assert(syscall(__NR_tkill, pid, SIGKILL) == 0);
assert(pid == wait(NULL));
}
}
triggers the WARN_ON_ONCE(!(task->jobctl & JOBCTL_STOP_PENDING)) in
task_participate_group_stop(). do_signal_stop()->signal_group_exit()
checks SIGNAL_GROUP_EXIT and return false, but task_set_jobctl_pending()
checks fatal_signal_pending() and does not set JOBCTL_STOP_PENDING.
And his should fix the minor security problem reported by Kyle,
SECCOMP_RET_TRACE can miss fatal_signal_pending() the same way if the
task is the root of a pid namespace.
Link: http://lkml.kernel.org/r/20171103184246.GD21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Kyle Huey <me@kylehuey.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ac25385089f673560867eb5179228a44ade0cfc1 upstream.
Change sig_task_ignored() to drop the SIG_DFL && !sig_kernel_only()
signals even if force == T. This simplifies the next change and this
matches the same check in get_signal() which will drop these signals
anyway.
Link: http://lkml.kernel.org/r/20171103184227.GC21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 628c1bcba204052d19b686b5bac149a644cdb72e upstream.
The comment in sig_ignored() says "Tracers may want to know about even
ignored signals" but SIGKILL can not be reported to debugger and it is
just wrong to return 0 in this case: SIGKILL should only kill the
SIGNAL_UNKILLABLE task if it comes from the parent ns.
Change sig_ignored() to ignore ->ptrace if sig == SIGKILL and rely on
sig_task_ignored().
SISGTOP coming from within the namespace is not really right too but at
least debugger can intercept it, and we can't drop it here because this
will break "gdb -p 1": ptrace_attach() won't work. Perhaps we will add
another ->ptrace check later, we will see.
Link: http://lkml.kernel.org/r/20171103184206.GB21036@redhat.com
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Tested-by: Kyle Huey <me@kylehuey.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 upstream.
In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.
This patch:
- Make groups_sort globally visible.
- Move the call to groups_sort to the modifiers of group_info
- Remove the call to groups_sort from set_groups
Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker <thiago.becker@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Acked-by: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4d9570158b6260f449e317a5f9ed030c2504a615 upstream.
As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check
is very wrong, we need time_is_after_jiffies() to make sys_acct() work.
Ignoring the overflows, the code should "goto out" if needcheck >
jiffies, while currently it checks "needcheck < jiffies" and thus in the
likely case check_free_space() does nothing until jiffies overflow.
In particular this means that sys_acct() is simply broken, acct_on()
sets acct->needcheck = jiffies and expects that check_free_space()
should set acct->active = 1 after the free-space check, but this won't
happen if jiffies increments in between.
This was broken by commit 32dc730860 ("get rid of timer in
kern/acct.c") in 2011, then another (correct) commit 795a2f22a8
("acct() should honour the limits from the very beginning") made the
problem more visible.
Link: http://lkml.kernel.org/r/20171213133940.GA6554@redhat.com
Fixes: 32dc730860 ("get rid of timer in kern/acct.c")
Reported-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
Suggested-by: TSUKADA Koutaro <tsukada@ascade.co.jp>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpPj0wACgkQONu9yGCS
aT5QOhAAu3PoT3472I7zuWDUG0KQo5r0wdUO+YPW31VIHrxQ2H3sxR44rSHc5jW/
tTg2TIYNBkNoj4jJDJ9J7f6PSnN1vGFglFW4GzxE3cr2+W7u5M5ex8yCYMcBIY9U
56hbyqX5lf5KjGWJiQThwYsMBokrBJW2igAFN3cW39nNABhl0W39kiysGA9vbNrV
+QMA4+ZADA2EeIRcdJmj8uc/cez/7sGAfrSktvATkI+HFamnTs0mrx9cl0eQKvjm
y5PCxYUCbi4kqD4WM+UCYO3zpUD+r4iMDXwXBwLWkFvbumY4mVTItP+gq5M4Fb1g
MSauGUGH7BDsT9gspricCMcAmjcTn6hth7/7/ZhlNq3NZv89pOquhpE0JOSAmYbA
P4WaIRRWwpVrRt+THU7vZpAQWpFSwGmtE7tBfPMt2J7zqY3lMYmO3DoA+gejw3CV
igbvmV0UY2uYSFnjawUUJ+k+ggYfGyRkUl2DfcllPhZFqE1XEi3NyjI0wi8vtXTd
UlrU55TqsldCw1bjXH3lWrpoNybWvqUD2a249ZVs/h06Q5NKwNL8mTye+2BBQtCP
QzAqHYbkBKv/f8M6Kg+HtTzgqUbWxVCeQTWFXHMAPVo4bCwGvVGrXbGJIj15lBuQ
GWqc3dt69zxpn1tlcRHKH0P3KnkC67dARtY+8F8+D+HAHVY71Bg=
=Kpwd
-----END PGP SIGNATURE-----
Merge 4.4.110 into android-4.4
Changes in 4.4.110
x86/boot: Add early cmdline parsing for options with arguments
KAISER: Kernel Address Isolation
kaiser: merged update
kaiser: do not set _PAGE_NX on pgd_none
kaiser: stack map PAGE_SIZE at THREAD_SIZE-PAGE_SIZE
kaiser: fix build and FIXME in alloc_ldt_struct()
kaiser: KAISER depends on SMP
kaiser: fix regs to do_nmi() ifndef CONFIG_KAISER
kaiser: fix perf crashes
kaiser: ENOMEM if kaiser_pagetable_walk() NULL
kaiser: tidied up asm/kaiser.h somewhat
kaiser: tidied up kaiser_add/remove_mapping slightly
kaiser: kaiser_remove_mapping() move along the pgd
kaiser: cleanups while trying for gold link
kaiser: name that 0x1000 KAISER_SHADOW_PGD_OFFSET
kaiser: delete KAISER_REAL_SWITCH option
kaiser: vmstat show NR_KAISERTABLE as nr_overhead
kaiser: enhanced by kernel and user PCIDs
kaiser: load_new_mm_cr3() let SWITCH_USER_CR3 flush user
kaiser: PCID 0 for kernel and 128 for user
kaiser: x86_cr3_pcid_noflush and x86_cr3_pcid_user
kaiser: paranoid_entry pass cr3 need to paranoid_exit
kaiser: _pgd_alloc() without __GFP_REPEAT to avoid stalls
kaiser: fix unlikely error in alloc_ldt_struct()
kaiser: add "nokaiser" boot option, using ALTERNATIVE
x86/kaiser: Rename and simplify X86_FEATURE_KAISER handling
x86/kaiser: Check boottime cmdline params
kaiser: use ALTERNATIVE instead of x86_cr3_pcid_noflush
kaiser: drop is_atomic arg to kaiser_pagetable_walk()
kaiser: asm/tlbflush.h handle noPGE at lower level
kaiser: kaiser_flush_tlb_on_return_to_user() check PCID
x86/paravirt: Dont patch flush_tlb_single
x86/kaiser: Reenable PARAVIRT
kaiser: disabled on Xen PV
x86/kaiser: Move feature detection up
KPTI: Rename to PAGE_TABLE_ISOLATION
KPTI: Report when enabled
x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
x86/vdso: Get pvclock data from the vvar VMA instead of the fixmap
x86/kasan: Clear kasan_zero_page after TLB flush
kaiser: Set _PAGE_NX only if supported
Linux 4.4.110
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Kaiser only needs to map one page of the stack; and
kernel/fork.c did not build on powerpc (no __PAGE_KERNEL).
It's all cleaner if linux/kaiser.h provides kaiser_map_thread_stack()
and kaiser_unmap_thread_stack() wrappers around asm/kaiser.h's
kaiser_add_mapping() and kaiser_remove_mapping(). And use
linux/kaiser.h in init/main.c to avoid the #ifdefs there.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This patch introduces our implementation of KAISER (Kernel Address Isolation to
have Side-channels Efficiently Removed), a kernel isolation technique to close
hardware side channels on kernel address information.
More information about the patch can be found on:
https://github.com/IAIK/KAISER
From: Richard Fellner <richard.fellner@student.tugraz.at>
From: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
X-Subject: [RFC, PATCH] x86_64: KAISER - do not map kernel in user mode
Date: Thu, 4 May 2017 14:26:50 +0200
Link: http://marc.info/?l=linux-kernel&m=149390087310405&w=2
Kaiser-4.10-SHA1: c4b1831d44c6144d3762ccc72f0c4e71a0c713e5
To: <linux-kernel@vger.kernel.org>
To: <kernel-hardening@lists.openwall.com>
Cc: <clementine.maurice@iaik.tugraz.at>
Cc: <moritz.lipp@iaik.tugraz.at>
Cc: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Cc: Richard Fellner <richard.fellner@student.tugraz.at>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: <kirill.shutemov@linux.intel.com>
Cc: <anders.fogh@gdata-adan.de>
After several recent works [1,2,3] KASLR on x86_64 was basically
considered dead by many researchers. We have been working on an
efficient but effective fix for this problem and found that not mapping
the kernel space when running in user mode is the solution to this
problem [4] (the corresponding paper [5] will be presented at ESSoS17).
With this RFC patch we allow anybody to configure their kernel with the
flag CONFIG_KAISER to add our defense mechanism.
If there are any questions we would love to answer them.
We also appreciate any comments!
Cheers,
Daniel (+ the KAISER team from Graz University of Technology)
[1] http://www.ieee-security.org/TC/SP2013/papers/4977a191.pdf
[2] https://www.blackhat.com/docs/us-16/materials/us-16-Fogh-Using-Undocumented-CPU-Behaviour-To-See-Into-Kernel-Mode-And-Break-KASLR-In-The-Process.pdf
[3] https://www.blackhat.com/docs/us-16/materials/us-16-Jang-Breaking-Kernel-Address-Space-Layout-Randomization-KASLR-With-Intel-TSX.pdf
[4] https://github.com/IAIK/KAISER
[5] https://gruss.cc/files/kaiser.pdf
[patch based also on
https://raw.githubusercontent.com/IAIK/KAISER/master/KAISER/0001-KAISER-Kernel-Address-Isolation.patch]
Signed-off-by: Richard Fellner <richard.fellner@student.tugraz.at>
Signed-off-by: Moritz Lipp <moritz.lipp@iaik.tugraz.at>
Signed-off-by: Daniel Gruss <daniel.gruss@iaik.tugraz.at>
Signed-off-by: Michael Schwarz <michael.schwarz@iaik.tugraz.at>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpL3okACgkQONu9yGCS
aT6p5g/8CAG9NU/fLu7IMcIlyqfVvdOhzxn44oHCxq08eycqoggdnb3TZXxBUBgY
+w8uZk8yxNdjXR39GjkMSUy06WRvl2XDSrd36sDGRCBP62Fi8l5scmlRaNEnI/E8
ltBSB93P16SmnpKa/3Zscz+7LcaoXHpU5Xhs8Zmf4I69qmzOFX2qSKsUyzVT+gNI
ZoSN/mYuXf7+dzrcKhVdYzm4ZdMRvxdT0WefeoeZMekfAtU9D8zaFOA9jTIAMHSZ
adNn18s7UKmaipZf/01mW9srvZce4nPKiUC8WVGstiyl27ws+IDleKVmDnqFALjy
2LIxDvjDth/x8jfqTb7F6bFh6dVtMJjwUmd3KL7hgPuTddoQQe/GfKnjSHkbNxyR
qNxNtbOgQ2EVOf59fejxWshCP/fButNo8uvCI1ERdm4axGXcf9hiucdlwzCYezHs
UN0xrxAXprhqTq4hQFB9E4C49e8nMPNsyXTMZwSZRPe2z53spD53JR/0sl5Z2RWe
ueO21tBZ6ev9jPNi+lJrCVw1oBO+PKOmdNPAaSynUVm96grRnW6grUI3mX9FqMXb
r62UWG3YCWWBgxA3iQQrMxf/3S2YZXz59TBbp9GU8xOYJZLhKL29/iB7Rv4ANtkR
aMDrABjWqrCZpIazqkZ5uwbsNl6Q51e3Mji3EfwkBaMqjc41++I=
=B52+
-----END PGP SIGNATURE-----
Merge 4.4.109 into android-4.4
Changes in 4.4.109
ACPI: APEI / ERST: Fix missing error handling in erst_reader()
crypto: mcryptd - protect the per-CPU queue with a lock
mfd: cros ec: spi: Don't send first message too soon
mfd: twl4030-audio: Fix sibling-node lookup
mfd: twl6040: Fix child-node lookup
ALSA: rawmidi: Avoid racy info ioctl via ctl device
ALSA: usb-audio: Fix the missing ctl name suffix at parsing SU
PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
parisc: Hide Diva-built-in serial aux and graphics card
spi: xilinx: Detect stall with Unknown commands
KVM: X86: Fix load RFLAGS w/o the fixed bit
kvm: x86: fix RSM when PCID is non-zero
powerpc/perf: Dereference BHRB entries safely
net: mvneta: clear interface link status on port disable
tracing: Remove extra zeroing out of the ring buffer page
tracing: Fix possible double free on failure of allocating trace buffer
tracing: Fix crash when it fails to alloc ring buffer
ring-buffer: Mask out the info bits when returning buffer page length
iw_cxgb4: Only validate the MSN for successful completions
ASoC: fsl_ssi: AC'97 ops need regmap, clock and cleaning up on failure
ASoC: twl4030: fix child-node lookup
ALSA: hda: Drop useless WARN_ON()
ALSA: hda - fix headset mic detection issue on a Dell machine
x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
x86/mm: Remove flush_tlb() and flush_tlb_current_task()
x86/mm: Make flush_tlb_mm_range() more predictable
x86/mm: Reimplement flush_tlb_page() using flush_tlb_mm_range()
x86/mm: Remove the UP asm/tlbflush.h code, always use the (formerly) SMP code
x86/mm: Disable PCID on 32-bit kernels
x86/mm: Add the 'nopcid' boot option to turn off PCID
x86/mm: Enable CR4.PCIDE on supported systems
x86/mm/64: Fix reboot interaction with CR4.PCIDE
kbuild: add '-fno-stack-check' to kernel build options
ipv4: igmp: guard against silly MTU values
ipv6: mcast: better catch silly mtu values
net: igmp: Use correct source address on IGMPv3 reports
netlink: Add netns check on taps
net: qmi_wwan: add Sierra EM7565 1199:9091
net: reevalulate autoflowlabel setting after sysctl setting
tcp md5sig: Use skb's saddr when replying to an incoming segment
tg3: Fix rx hang on MTU change with 5717/5719
net: ipv4: fix for a race condition in raw_sendmsg
net: mvmdio: disable/unprepare clocks in EPROBE_DEFER case
sctp: Replace use of sockets_allocated with specified macro.
ipv4: Fix use-after-free when flushing FIB tables
net: bridge: fix early call to br_stp_change_bridge_id and plug newlink leaks
net: Fix double free and memory corruption in get_net_ns_by_id()
net: phy: micrel: ksz9031: reconfigure autoneg after phy autoneg workaround
sock: free skb in skb_complete_tx_timestamp on error
usbip: fix usbip bind writing random string after command in match_busid
usbip: stub: stop printing kernel pointer addresses in messages
usbip: vhci: stop printing kernel pointer addresses in messages
USB: serial: ftdi_sio: add id for Airbus DS P8GR
USB: serial: qcserial: add Sierra Wireless EM7565
USB: serial: option: add support for Telit ME910 PID 0x1101
USB: serial: option: adding support for YUGA CLM920-NC5
usb: Add device quirk for Logitech HD Pro Webcam C925e
usb: add RESET_RESUME for ELSA MicroLink 56K
USB: Fix off by one in type-specific length check of BOS SSP capability
usb: xhci: Add XHCI_TRUST_TX_LENGTH for Renesas uPD720201
nohz: Prevent a timer interrupt storm in tick_nohz_stop_sched_tick()
x86/smpboot: Remove stale TLB flush invocations
n_tty: fix EXTPROC vs ICANON interaction with TIOCINQ (aka FIONREAD)
mm/vmstat: Make NR_TLB_REMOTE_FLUSH_RECEIVED available even on UP
Linux 4.4.109
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 5d62c183f9e9df1deeea0906d099a94e8a43047a upstream.
The conditions in irq_exit() to invoke tick_nohz_irq_exit() which
subsequently invokes tick_nohz_stop_sched_tick() are:
if ((idle_cpu(cpu) && !need_resched()) || tick_nohz_full_cpu(cpu))
If need_resched() is not set, but a timer softirq is pending then this is
an indication that the softirq code punted and delegated the execution to
softirqd. need_resched() is not true because the current interrupted task
takes precedence over softirqd.
Invoking tick_nohz_irq_exit() in this case can cause an endless loop of
timer interrupts because the timer wheel contains an expired timer, but
softirqs are not yet executed. So it returns an immediate expiry request,
which causes the timer to fire immediately again. Lather, rinse and
repeat....
Prevent that by adding a check for a pending timer soft interrupt to the
conditions in tick_nohz_stop_sched_tick() which avoid calling
get_next_timer_interrupt(). That keeps the tick sched timer on the tick and
prevents a repetitive programming of an already expired timer.
Reported-by: Sebastian Siewior <bigeasy@linutronix.d>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Cc: Sebastian Siewior <bigeasy@linutronix.de>
Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1712272156050.2431@nanos
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 upstream.
Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.
What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.
Fixes: 66a8cb95ed ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 upstream.
Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:
instance_mkdir()
|-allocate_trace_buffers()
|-allocate_trace_buffer(tr, &tr->trace_buffer...)
|-allocate_trace_buffer(tr, &tr->max_buffer...)
// allocate fail(-ENOMEM),first free
// and the buffer pointer is not set to null
|-ring_buffer_free(tr->trace_buffer.buffer)
// out_free_tr
|-free_trace_buffers()
|-free_trace_buffer(&tr->trace_buffer);
//if trace_buffer is not null, free again
|-ring_buffer_free(buf->buffer)
|-rb_free_cpu_buffer(buffer->buffers[cpu])
// ring_buffer_per_cpu is null, and
// crash in ring_buffer_per_cpu->pages
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <jing.xia@spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4397f04575c44e1440ec2e49b6302785c95fd2f8 upstream.
Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <jing.xia@spreadtrum.com>
Reported-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6b7e633fe9c24682df550e5311f47fb524701586 upstream.
The ring_buffer_read_page() takes care of zeroing out any extra data in the
page that it returns. There's no need to zero it out again from the
consumer. It was removed from one consumer of this function, but
read_buffers_splice_read() did not remove it, and worse, it contained a
nasty bug because of it.
Fixes: 2711ca237a ("ring-buffer: Move zeroing out excess in page to ring buffer code")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlpA+4sACgkQONu9yGCS
aT4RphAAkoRI16GF7c2U1aLVGcYA5zezxnYEjtFXjoM7sLwmfEBeSTTR8OqCZGaa
hLvhrqCbF6qjgu0dKAKaLasnoUk+/ZEpxSE0JAlQ0ZdmCP9YH+Sd2PqjD48QGJ80
O2JktsYT3MjaNKFHNeSElf2ffk3mDzRpeeHEtSduJE3/Pqvoch2qV36qJWstOpLR
QKEQCKVP9Xa71V5Zu5tDAMFI0IQ09HRjBjyjlAxnJL+/wgYM/NTraZ1/3Ju8nGDU
Ohamr80SdIZ0+36Gl9mFTDQRl5yuh2SllqaD4Hq4P41HPIWaXuTuKQwA0LD7U8U6
N7I5byN7QVmOszERc0jzkQPd4aN1FWfDqAR5i6S+fDbianDiHvBNsybihayWqzay
MgX3VnzhFrnkh5UcijKOjsRHp/CkDIAfUseb5dFekAjuctWtNK7qkE+bGySZJ9ET
XO2b8xlRz/nYuWodsAXT6FhVaVd45ba+VR/nDwSzijf+7SY2Ub9/lwChePbI8VJj
vSiBURRtZhEgwXXJYH3JT4L+MSqfKA+1Jd+G7BqvZaSQ6S8RLvts/JZAX1nGs0JK
B2a84CD0lXd5RcMBRFXkfzCEZDA1hB/oLpmVVXxNROseSlSc/ExQ3xZ9UPXlWCnN
dB7XCV5GoV9Fqx5FX0lg6eEkNzUrkFV/My/FKaJtZR8U0TJB1nk=
=uslW
-----END PGP SIGNATURE-----
Merge 4.4.108 into android-4.4
Changes in 4.4.108
arm64: Initialise high_memory global variable earlier
cxl: Check if vphb exists before iterating over AFU devices
x86/mm: Add INVPCID helpers
x86/mm: Fix INVPCID asm constraint
x86/mm: Add a 'noinvpcid' boot option to turn off INVPCID
x86/mm: If INVPCID is available, use it to flush global mappings
mm/rmap: batched invalidations should use existing api
mm/mmu_context, sched/core: Fix mmu_context.h assumption
sched/core: Add switch_mm_irqs_off() and use it in the scheduler
x86/mm: Build arch/x86/mm/tlb.c even on !SMP
x86/mm, sched/core: Uninline switch_mm()
x86/mm, sched/core: Turn off IRQs in switch_mm()
ARM: Hide finish_arch_post_lock_switch() from modules
sched/core: Idle_task_exit() shouldn't use switch_mm_irqs_off()
x86/irq: Do not substract irq_tlb_count from irq_call_count
ALSA: hda - add support for docking station for HP 820 G2
ALSA: hda - add support for docking station for HP 840 G3
arm: kprobes: Fix the return address of multiple kretprobes
arm: kprobes: Align stack to 8-bytes in test code
cpuidle: Validate cpu_dev in cpuidle_add_sysfs()
r8152: fix the list rx_done may be used without initialization
crypto: deadlock between crypto_alg_sem/rtnl_mutex/genl_mutex
sch_dsmark: fix invalid skb_cow() usage
bna: integer overflow bug in debugfs
net: qmi_wwan: Add USB IDs for MDM6600 modem on Motorola Droid 4
usb: gadget: f_uvc: Sanity check wMaxPacketSize for SuperSpeed
usb: gadget: udc: remove pointer dereference after free
netfilter: nfnl_cthelper: fix runtime expectation policy updates
netfilter: nfnl_cthelper: Fix memory leak
inet: frag: release spinlock before calling icmp_send()
pinctrl: st: add irq_request/release_resources callbacks
scsi: lpfc: Fix PT2PT PRLI reject
KVM: x86: correct async page present tracepoint
KVM: VMX: Fix enable VPID conditions
ARM: dts: ti: fix PCI bus dtc warnings
hwmon: (asus_atk0110) fix uninitialized data access
HID: xinmo: fix for out of range for THT 2P arcade controller.
r8152: prevent the driver from transmitting packets with carrier off
s390/qeth: no ETH header for outbound AF_IUCV
bna: avoid writing uninitialized data into hw registers
net: Do not allow negative values for busy_read and busy_poll sysctl interfaces
i40e: Do not enable NAPI on q_vectors that have no rings
RDMA/iser: Fix possible mr leak on device removal event
irda: vlsi_ir: fix check for DMA mapping errors
netfilter: nfnl_cthelper: fix a race when walk the nf_ct_helper_hash table
netfilter: nf_nat_snmp: Fix panic when snmp_trap_helper fails to register
ARM: dts: am335x-evmsk: adjust mmc2 param to allow suspend
KVM: pci-assign: do not map smm memory slot pages in vt-d page tables
isdn: kcapi: avoid uninitialized data
xhci: plat: Register shutdown for xhci_plat
netfilter: nfnetlink_queue: fix secctx memory leak
ARM: dma-mapping: disallow dma_get_sgtable() for non-kernel managed memory
cpuidle: powernv: Pass correct drv->cpumask for registration
bnxt_en: Fix NULL pointer dereference in reopen failure path
backlight: pwm_bl: Fix overflow condition
crypto: crypto4xx - increase context and scatter ring buffer elements
rtc: pl031: make interrupt optional
net: phy: at803x: Change error to EINVAL for invalid MAC
PCI: Avoid bus reset if bridge itself is broken
scsi: cxgb4i: fix Tx skb leak
scsi: mpt3sas: Fix IO error occurs on pulling out a drive from RAID1 volume created on two SATA drive
PCI: Create SR-IOV virtfn/physfn links before attaching driver
igb: check memory allocation failure
ixgbe: fix use of uninitialized padding
PCI/AER: Report non-fatal errors only to the affected endpoint
scsi: lpfc: Fix secure firmware updates
scsi: lpfc: PLOGI failures during NPIV testing
fm10k: ensure we process SM mbx when processing VF mbx
tcp: fix under-evaluated ssthresh in TCP Vegas
rtc: set the alarm to the next expiring timer
cpuidle: fix broadcast control when broadcast can not be entered
thermal: hisilicon: Handle return value of clk_prepare_enable
MIPS: math-emu: Fix final emulation phase for certain instructions
Revert "Bluetooth: btusb: driver to enable the usb-wakeup feature"
ALSA: hda - Clear the leftover component assignment at snd_hdac_i915_exit()
ALSA: hda - Degrade i915 binding failure message
ALSA: hda - Fix yet another i915 pointer leftover in error path
alpha: fix build failures
Linux 4.4.108
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 252d2a4117bc181b287eeddf848863788da733ae upstream.
idle_task_exit() can be called with IRQs on x86 on and therefore
should use switch_mm(), not switch_mm_irqs_off().
This doesn't seem to cause any problems right now, but it will
confuse my upcoming TLB flush changes. Nonetheless, I think it
should be backported because it's trivial. There won't be any
meaningful performance impact because idle_task_exit() is only
used when offlining a CPU.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: f98db6013c55 ("sched/core: Add switch_mm_irqs_off() and use it in the scheduler")
Link: http://lkml.kernel.org/r/ca3d1a9fa93a0b49f5a8ff729eda3640fb6abdf9.1497034141.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f98db6013c557c216da5038d9c52045be55cd039 upstream.
By default, this is the same thing as switch_mm().
x86 will override it as an optimization.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reviewed-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/df401df47bdd6be3e389c6f1e3f5310d70e81b2c.1461688545.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlo6J9YACgkQONu9yGCS
aT7Cdg/8D7+btjAJPjk/suKUBSOpfSkYIoakaVSGf7r7gFv7SkMF023SikLUK+vN
xa0FL1bYASzXuKgcxY9vB7ZkCDShrglqTCIpbWwHJhwS0fRGTGrMN2MM+opVgeoG
4ngnEPLue5TqZs3LVrpTySQFODlxnY3C4lpKopN7QNrcr1M5iiMELXCJu/qy6JhC
ZBsRGUY8GHbouqC0YSqNlrv+C7zbfAlaawIBDSmYm0R4F+TuqoKZlBGAJ9lbALcZ
pM8OaOXx9v471RhE7Tcsl3Eiz3vKHFKWxG/ZujkSqB21wPq6gd4VuP/wMuelX0GC
rDTb/nn9Zhmv7UCOn62htlRLrAnSaJ9FlEK+u3TJ+XBGE9gmanH9IjIljCehEZeI
55Vm7q6IwQT2WvgTzqUoco4AYI37T9pqJ++I1E3jY/zk+bfCIQ1ZMpMXmwAUx738
m7boO38eRnyXMxqf4hfVQ4BFPkwaxdW/I3LDanE6U85Hw2nI2uZIPHRbVrC6gAPS
aY9EUFEancxu4mW92mWWKrEnEWs5Jsb313ISAKSU75WwWmZ/tgsUtSzvY7XNPwYr
G//HbPA5zNFdM1zSO4HQiLYjOqAiwNNAbDEWKoYr8MQpgYbTB4/SgT15mJzw6uTo
WtKZsIWMYiXVW8cNhQiWAVUJVf66GMlc6kHcyHU7YHH3obZAYXU=
=DNsf
-----END PGP SIGNATURE-----
Merge 4.4.107 into android-4.4
Changes in 4.4.107
crypto: hmac - require that the underlying hash algorithm is unkeyed
crypto: salsa20 - fix blkcipher_walk API usage
autofs: fix careless error in recent commit
tracing: Allocate mask_str buffer dynamically
USB: uas and storage: Add US_FL_BROKEN_FUA for another JMicron JMS567 ID
USB: core: prevent malicious bNumInterfaces overflow
usbip: fix stub_send_ret_submit() vulnerability to null transfer_buffer
ceph: drop negative child dentries before try pruning inode's alias
Bluetooth: btusb: driver to enable the usb-wakeup feature
xhci: Don't add a virt_dev to the devs array before it's fully allocated
sched/rt: Do not pull from current CPU if only one CPU to pull
dmaengine: dmatest: move callback wait queue to thread context
ext4: fix fdatasync(2) after fallocate(2) operation
ext4: fix crash when a directory's i_size is too small
KEYS: add missing permission check for request_key() destination
mac80211: Fix addition of mesh configuration element
usb: phy: isp1301: Add OF device ID table
md-cluster: free md_cluster_info if node leave cluster
userfaultfd: shmem: __do_fault requires VM_FAULT_NOPAGE
userfaultfd: selftest: vm: allow to build in vm/ directory
net: initialize msg.msg_flags in recvfrom
net: bcmgenet: correct the RBUF_OVFL_CNT and RBUF_ERR_CNT MIB values
net: bcmgenet: correct MIB access of UniMAC RUNT counters
net: bcmgenet: reserved phy revisions must be checked first
net: bcmgenet: power down internal phy if open or resume fails
net: bcmgenet: Power up the internal PHY before probing the MII
NFSD: fix nfsd_minorversion(.., NFSD_AVAIL)
NFSD: fix nfsd_reset_versions for NFSv4.
Input: i8042 - add TUXEDO BU1406 (N24_25BU) to the nomux list
drm/omap: fix dmabuf mmap for dma_alloc'ed buffers
netfilter: bridge: honor frag_max_size when refragmenting
writeback: fix memory leak in wb_queue_work()
net: wimax/i2400m: fix NULL-deref at probe
dmaengine: Fix array index out of bounds warning in __get_unmap_pool()
net: Resend IGMP memberships upon peer notification.
mlxsw: reg: Fix SPVM max record count
mlxsw: reg: Fix SPVMLR max record count
intel_th: pci: Add Gemini Lake support
openrisc: fix issue handling 8 byte get_user calls
scsi: hpsa: update check for logical volume status
scsi: hpsa: limit outstanding rescans
fjes: Fix wrong netdevice feature flags
drm/radeon/si: add dpm quirk for Oland
sched/deadline: Make sure the replenishment timer fires in the next period
sched/deadline: Throttle a constrained deadline task activated after the deadline
sched/deadline: Use deadline instead of period when calculating overflow
mmc: mediatek: Fixed bug where clock frequency could be set wrong
drm/radeon: reinstate oland workaround for sclk
afs: Fix missing put_page()
afs: Populate group ID from vnode status
afs: Adjust mode bits processing
afs: Flush outstanding writes when an fd is closed
afs: Migrate vlocation fields to 64-bit
afs: Prevent callback expiry timer overflow
afs: Fix the maths in afs_fs_store_data()
afs: Populate and use client modification time
afs: Fix page leak in afs_write_begin()
afs: Fix afs_kill_pages()
net/mlx4_core: Avoid delays during VF driver device shutdown
perf symbols: Fix symbols__fixup_end heuristic for corner cases
efi/esrt: Cleanup bad memory map log messages
NFSv4.1 respect server's max size in CREATE_SESSION
btrfs: add missing memset while reading compressed inline extents
target: Use system workqueue for ALUA transitions
target: fix ALUA transition timeout handling
target: fix race during implicit transition work flushes
sfc: don't warn on successful change of MAC
fbdev: controlfb: Add missing modes to fix out of bounds access
video: udlfb: Fix read EDID timeout
video: fbdev: au1200fb: Release some resources if a memory allocation fails
video: fbdev: au1200fb: Return an error code if a memory allocation fails
rtc: pcf8563: fix output clock rate
dmaengine: ti-dma-crossbar: Correct am335x/am43xx mux value type
PCI/PME: Handle invalid data when reading Root Status
powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo
netfilter: ipvs: Fix inappropriate output of procfs
powerpc/opal: Fix EBUSY bug in acquiring tokens
powerpc/ipic: Fix status get and status clear
target/iscsi: Fix a race condition in iscsit_add_reject_from_cmd()
iscsi-target: fix memory leak in lio_target_tiqn_addtpg()
target:fix condition return in core_pr_dump_initiator_port()
target/file: Do not return error for UNMAP if length is zero
arm-ccn: perf: Prevent module unload while PMU is in use
crypto: tcrypt - fix buffer lengths in test_aead_speed()
mm: Handle 0 flags in _calc_vm_trans() macro
clk: mediatek: add the option for determining PLL source clock
clk: imx6: refine hdmi_isfr's parent to make HDMI work on i.MX6 SoCs w/o VPU
clk: tegra: Fix cclk_lp divisor register
ppp: Destroy the mutex when cleanup
thermal/drivers/step_wise: Fix temperature regulation misbehavior
GFS2: Take inode off order_write list when setting jdata flag
bcache: explicitly destroy mutex while exiting
bcache: fix wrong cache_misses statistics
l2tp: cleanup l2tp_tunnel_delete calls
xfs: fix log block underflow during recovery cycle verification
xfs: fix incorrect extent state in xfs_bmap_add_extent_unwritten_real
PCI: Detach driver before procfs & sysfs teardown on device remove
scsi: hpsa: cleanup sas_phy structures in sysfs when unloading
scsi: hpsa: destroy sas transport properties before scsi_host
powerpc/perf/hv-24x7: Fix incorrect comparison in memord
tty fix oops when rmmod 8250
usb: musb: da8xx: fix babble condition handling
pinctrl: adi2: Fix Kconfig build problem
raid5: Set R5_Expanded on parity devices as well as data.
scsi: scsi_devinfo: Add REPORTLUN2 to EMC SYMMETRIX blacklist entry
vt6655: Fix a possible sleep-in-atomic bug in vt6655_suspend
scsi: sd: change manage_start_stop to bool in sysfs interface
scsi: sd: change allow_restart to bool in sysfs interface
scsi: bfa: integer overflow in debugfs
udf: Avoid overflow when session starts at large offset
macvlan: Only deliver one copy of the frame to the macvlan interface
RDMA/cma: Avoid triggering undefined behavior
IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop
ath9k: fix tx99 potential info leak
Linux 4.4.107
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 2317d5f1c34913bac5971d93d69fb6c31bb74670 ]
I was testing Daniel's changes with his test case, and tweaked it a
little. Instead of having the runtime equal to the deadline, I
increased the deadline ten fold.
Daniel's test case had:
attr.sched_runtime = 2 * 1000 * 1000; /* 2 ms */
attr.sched_deadline = 2 * 1000 * 1000; /* 2 ms */
attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */
To make it more interesting, I changed it to:
attr.sched_runtime = 2 * 1000 * 1000; /* 2 ms */
attr.sched_deadline = 20 * 1000 * 1000; /* 20 ms */
attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */
The results were rather surprising. The behavior that Daniel's patch
was fixing came back. The task started using much more than .1% of the
CPU. More like 20%.
Looking into this I found that it was due to the dl_entity_overflow()
constantly returning true. That's because it uses the relative period
against relative runtime vs the absolute deadline against absolute
runtime.
runtime / (deadline - t) > dl_runtime / dl_period
There's even a comment mentioning this, and saying that when relative
deadline equals relative period, that the equation is the same as using
deadline instead of period. That comment is backwards! What we really
want is:
runtime / (deadline - t) > dl_runtime / dl_deadline
We care about if the runtime can make its deadline, not its period. And
then we can say "when the deadline equals the period, the equation is
the same as using dl_period instead of dl_deadline".
After correcting this, now when the task gets enqueued, it can throttle
correctly, and Daniel's fix to the throttling of sleeping deadline
tasks works even when the runtime and deadline are not the same.
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Link: http://lkml.kernel.org/r/02135a27f1ae3fe5fd032568a5a2f370e190e8d7.1488392936.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit df8eac8cafce7d086be3bd5cf5a838fa37594dfb ]
During the activation, CBS checks if it can reuse the current task's
runtime and period. If the deadline of the task is in the past, CBS
cannot use the runtime, and so it replenishes the task. This rule
works fine for implicit deadline tasks (deadline == period), and the
CBS was designed for implicit deadline tasks. However, a task with
constrained deadline (deadine < period) might be awakened after the
deadline, but before the next period. In this case, replenishing the
task would allow it to run for runtime / deadline. As in this case
deadline < period, CBS enables a task to run for more than the
runtime / period. In a very loaded system, this can cause a domino
effect, making other tasks miss their deadlines.
To avoid this problem, in the activation of a constrained deadline
task after the deadline but before the next period, throttle the
task and set the replenishing timer to the begin of the next period,
unless it is boosted.
Reproducer:
--------------- %< ---------------
int main (int argc, char **argv)
{
int ret;
int flags = 0;
unsigned long l = 0;
struct timespec ts;
struct sched_attr attr;
memset(&attr, 0, sizeof(attr));
attr.size = sizeof(attr);
attr.sched_policy = SCHED_DEADLINE;
attr.sched_runtime = 2 * 1000 * 1000; /* 2 ms */
attr.sched_deadline = 2 * 1000 * 1000; /* 2 ms */
attr.sched_period = 2 * 1000 * 1000 * 1000; /* 2 s */
ts.tv_sec = 0;
ts.tv_nsec = 2000 * 1000; /* 2 ms */
ret = sched_setattr(0, &attr, flags);
if (ret < 0) {
perror("sched_setattr");
exit(-1);
}
for(;;) {
/* XXX: you may need to adjust the loop */
for (l = 0; l < 150000; l++);
/*
* The ideia is to go to sleep right before the deadline
* and then wake up before the next period to receive
* a new replenishment.
*/
nanosleep(&ts, NULL);
}
exit(0);
}
--------------- >% ---------------
On my box, this reproducer uses almost 50% of the CPU time, which is
obviously wrong for a task with 2/2000 reservation.
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luca Abeni <luca.abeni@santannapisa.it>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Romulo Silva de Oliveira <romulo.deoliveira@ufsc.br>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tommaso Cucinotta <tommaso.cucinotta@sssup.it>
Link: http://lkml.kernel.org/r/edf58354e01db46bf42df8d2dd32418833f68c89.1488392936.git.bristot@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>