[ Upstream commit ee60ad219f5c7c4fb2f047f88037770063ef785f ]
The race occurs in __mkroute_output() when 2 threads lookup a dst:
CPU A CPU B
find_exception()
find_exception() [fnhe expires]
ip_del_fnhe() [fnhe is deleted]
rt_bind_exception()
In rt_bind_exception() it will bind a deleted fnhe with the new dst, and
this dst will get no chance to be freed. It causes a dev defcnt leak and
consecutive dmesg warnings:
unregister_netdevice: waiting for ethX to become free. Usage count = 1
Especially thanks Jon to identify the issue.
This patch fixes it by setting fnhe_daddr to 0 in ip_del_fnhe() to stop
binding the deleted fnhe with a new dst when checking fnhe's fnhe_daddr
and daddr in rt_bind_exception().
It works as both ip_del_fnhe() and rt_bind_exception() are protected by
fnhe_lock and the fhne is freed by kfree_rcu().
Fixes: deed49df7390 ("route: check and remove route cache when we get route")
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ae9819e339b451da7a86ab6fe38ecfcb6814e78a ]
Hardware has the CBS (Credit Based Shaper) which affects only Q3
and Q2. When updating the CBS settings, even if the driver does so
after waiting for Tx DMA finished, there is a possibility that frame
data still remains in TxFIFO.
To avoid this, decrease TxFIFO depth of Q3 and Q2 to one.
This patch has been exercised this using netperf TCP_MAERTS, TCP_STREAM
and UDP_STREAM tests run on an Ebisu board. No performance change was
detected, outside of noise in the tests, both in terms of throughput and
CPU utilisation.
Fixes: c156633f13 ("Renesas Ethernet AVB driver proper")
Signed-off-by: Masaru Nagai <masaru.nagai.vx@renesas.com>
Signed-off-by: Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
[simon: updated changelog]
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9417d81f4f8adfe20a12dd1fadf73a618cbd945d ]
sk_setup_caps() is called to set sk->sk_dst_cache in pptp_connect,
so we have to dst_release(sk->sk_dst_cache) in pptp_sock_destruct,
otherwise, the dst refcnt will leak.
It can be reproduced by this syz log:
r1 = socket$pptp(0x18, 0x1, 0x2)
bind$pptp(r1, &(0x7f0000000100)={0x18, 0x2, {0x0, @local}}, 0x1e)
connect$pptp(r1, &(0x7f0000000000)={0x18, 0x2, {0x3, @remote}}, 0x1e)
Consecutive dmesg warnings will occur:
unregister_netdevice: waiting for lo to become free. Usage count = 1
v1->v2:
- use rcu_dereference_protected() instead of rcu_dereference_check(),
as suggested by Eric.
Fixes: 00959ade36 ("PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol)")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4aa68e07d845562561f5e73c04aa521376e95252 upstream.
When checking for permission to view keys whilst reading from
/proc/keys, we should use the credentials with which the /proc/keys file
was opened. This is because, in a classic type of exploit, it can be
possible to bypass checks for the *current* credentials by passing the
file descriptor to a suid program.
Following commit 34dbbcdbf633 ("Make file credentials available to the
seqfile interfaces") we can finally fix it. So let's do it.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 644c7e48cb59cfc6988ddc7bf3d3b1ba5fe7fa9d upstream.
Baozeng Ding reported a KASAN stack out of bounds issue - it uncovered that
the TCP option parsing routines in netfilter TCP connection tracking could
read one byte out of the buffer of the TCP options. Therefore in the patch
we check that the available data length is large enough to parse both TCP
option code and size.
Reported-by: Baozeng Ding <sploving1@gmail.com>
Tested-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 017b1b6d28c479f1ad9a7a41f775545a3e1cba35 upstream.
nfacct_filter_alloc doesn't validate the NFACCT_FILTER_MASK and
NFACCT_FILTER_VALUE parameters which can trigger a NULL pointer
dereference. CAP_NET_ADMIN is required to trigger the bug.
Signed-off-by: Phil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b301f2538759933cf9ff1f7c4f968da72e3f0757 upstream.
Make sure the table names via getsockopt GET_ENTRIES is nul-terminated
in ebtables and all the x_tables variants and their respective compat
code. Uncovered by KASAN.
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 30c7be26fd3587abcb69587f781098e3ca2d565b upstream.
In commits 93821778de ("udp: Fix rcv socket locking") and
f7ad74fef3 ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into
__udpv6_queue_rcv_skb") UDP backlog handlers were renamed, but UDPlite
was forgotten.
This leads to crashes if UDPlite header is pulled twice, which happens
starting from commit e6afc8ace6dd ("udp: remove headers from UDP packets
before queueing")
Bug found by syzkaller team, thanks a lot guys !
Note that backlog use in UDP/UDPlite is scheduled to be removed starting
from linux-4.10, so this patch is only needed up to linux-4.9
Fixes: 93821778de ("udp: Fix rcv socket locking")
Fixes: f7ad74fef3 ("net/ipv6/udp: UDP encapsulation: break backlog_rcv into __udpv6_queue_rcv_skb")
Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Cc: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 28928a3ce142b2e4e5a7a0f067cefb41a3d2c3f9 upstream.
In Odroid XU3 Lite board, the temperature levels reported for thermal
zone 0 were weird. In warm room:
/sys/class/thermal/thermal_zone0/temp:32000
/sys/class/thermal/thermal_zone1/temp:51000
/sys/class/thermal/thermal_zone2/temp:55000
/sys/class/thermal/thermal_zone3/temp:54000
/sys/class/thermal/thermal_zone4/temp:51000
Sometimes after booting the value was even equal to ambient temperature
which is highly unlikely to be a real temperature of sensor in SoC.
The thermal sensor's calibration (trimming) is based on fused values.
In case of the board above, the fused values are: 35, 52, 43, 58 and 43
(corresponding to each TMU device). However driver defined a minimum value
for fused data as 40 and for smaller values it was using a hard-coded 55
instead. This lead to mapping data from sensor to wrong temperatures
for thermal zone 0.
Various vendor 3.10 trees (Hardkernel's based on Samsung LSI, Artik 10)
do not impose any limits on fused values. Since we do not have any
knowledge about these limits, use 0 as a minimum accepted fused value.
This should essentially allow accepting any reasonable fused value thus
behaving like vendor driver.
The exynos5420-tmu-sensor-conf.dtsi is copied directly from existing
exynos4412 with one change - the samsung,tmu_min_efuse_value.
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Eduardo Valentin <edubezval@gmail.com>
Reviewed-by: Javier Martinez Canillas <javier@osg.samsung.com>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Reviewed-by: Anand Moon <linux.amoon@gmail.com>
Tested-by: Anand Moon <linux.amoon@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This reverts commit 7212e37cbd.
Hedi Berriche <hedi.berriche@hpe.com> notes:
> In 4.4-stable efi_runtime_lock as defined in drivers/firmware/efi/runtime-wrappers.c
> is a spinlock (given it predates commit dce48e351c0d) and commit
>
> f331e766c4be x86/platform/UV: Use efi_runtime_lock to serialise BIOS calls
>
> which 7212e37cbd is a backport of, needs it to be a semaphore.
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit a66352e005488ecb4b534ba1af58a9f671eba9b8 upstream.
Add minimal parameters needed by the Exynos CLKOUT driver to Exynos3250
PMU node. This fixes the following warning on boot:
exynos_clkout_init: failed to register clkout clock
Fixes: d19bb397e1 ("ARM: dts: exynos: Update PMU node with CLKOUT related data")
Cc: <stable@vger.kernel.org>
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 38d589f2fd08f1296aea3ce62bebd185125c6d81 upstream.
With the ultimate goal of keeping rt_mutex wait_list and futex_q waiters
consistent it's necessary to split 'rt_mutex_futex_lock()' into finer
parts, such that only the actual blocking can be done without hb->lock
held.
Split split_mutex_finish_proxy_lock() into two parts, one that does the
blocking and one that does remove_waiter() when the lock acquire failed.
When the rtmutex was acquired successfully the waiter can be removed in the
acquisiton path safely, since there is no concurrency on the lock owner.
This means that, except for futex_lock_pi(), all wait_list modifications
are done with both hb->lock and wait_lock held.
[bigeasy@linutronix.de: fix for futex_requeue_pi_signal_restart]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104152.001659630@infradead.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Zubin Mithra <zsm@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit df997abeebadaa4824271009e2d2b526a70a11cb upstream.
Add missing break statement in order to prevent the code from falling
through to case ISCSI_BOOT_TGT_NAME, which is unnecessary.
This bug was found thanks to the ongoing efforts to enable
-Wimplicit-fallthrough.
Fixes: b33a84a384 ("ibft: convert iscsi_ibft module to iscsi boot lib")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e154ab69321ce2c54f19863d75c77b4e2dc9d365 upstream.
Lenovo s21e-20 uses ELAN0601 in its ACPI tables for the Elan touchpad.
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 44fc95e218a09d7966a9d448941fdb003f6bb69f upstream.
Tablet initially begins communicating at 9600 baud, so this command
should be used to connect to the device:
$ inputattach --daemon --baud 9600 --wacom_iv /dev/ttyS0
https://github.com/linuxwacom/xf86-input-wacom/issues/40
Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com>
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2b424cfc69728224fcb5fad138ea7260728e0901 ]
Patch (b6c7a324df37b "MIPS: Fix get_frame_info() handling of
microMIPS function size.") introduces additional function size
check for microMIPS by only checking insn between ip and ip + func_size.
However, func_size in get_frame_info() is always 0 if KALLSYMS is not
enabled. This causes get_frame_info() to return immediately without
calculating correct frame_size, which in turn causes "Can't analyze
schedule() prologue" warning messages at boot time.
This patch removes func_size check, and let the frame_size check run
up to 128 insns for both MIPS and microMIPS.
Signed-off-by: Jun-Ru Chang <jrjang@realtek.com>
Signed-off-by: Tony Wu <tonywu@realtek.com>
Signed-off-by: Paul Burton <paul.burton@mips.com>
Fixes: b6c7a324df37b ("MIPS: Fix get_frame_info() handling of microMIPS function size.")
Cc: <ralf@linux-mips.org>
Cc: <jhogan@kernel.org>
Cc: <macro@mips.com>
Cc: <yamada.masahiro@socionext.com>
Cc: <peterz@infradead.org>
Cc: <mingo@kernel.org>
Cc: <linux-mips@vger.kernel.org>
Cc: <linux-kernel@vger.kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 59a17706915fe5ea6f711e1f92d4fb706bce07fe ]
When perf is built with the annobin plugin (RHEL8 build) extra symbols
are added to its binary:
# nm perf | grep annobin | head -10
0000000000241100 t .annobin_annotate.c
0000000000326490 t .annobin_annotate.c
0000000000249255 t .annobin_annotate.c_end
00000000003283a8 t .annobin_annotate.c_end
00000000001bce18 t .annobin_annotate.c_end.hot
00000000001bce18 t .annobin_annotate.c_end.hot
00000000001bc3e2 t .annobin_annotate.c_end.unlikely
00000000001bc400 t .annobin_annotate.c_end.unlikely
00000000001bce18 t .annobin_annotate.c.hot
00000000001bce18 t .annobin_annotate.c.hot
...
Those symbols have no use for report or annotation and should be
skipped. Moreover they interfere with the DWARF unwind test on the PPC
arch, where they are mixed with checked symbols and then the test fails:
# perf test dwarf -v
59: Test dwarf unwind :
--- start ---
test child forked, pid 8515
unwind: .annobin_dwarf_unwind.c:ip = 0x10dba40dc (0x2740dc)
...
got: .annobin_dwarf_unwind.c 0x10dba40dc, expecting test__arch_unwind_sample
unwind: failed with 'no error'
The annobin symbols are defined as NOTYPE/LOCAL/HIDDEN:
# readelf -s ./perf | grep annobin | head -1
40: 00000000001bce4f 0 NOTYPE LOCAL HIDDEN 13 .annobin_init.c
They can still pass the check for the label symbol. Adding check for
HIDDEN and INTERNAL (as suggested by Nick below) visibility and filter
out such symbols.
> Just to be awkward, if you are going to ignore STV_HIDDEN
> symbols then you should probably also ignore STV_INTERNAL ones
> as well... Annobin does not generate them, but you never know,
> one day some other tool might create some.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Clifton <nickc@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20190128133526.GD15461@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit afa0c5904ba16d59b0454f7ee4c807dae350f432 ]
The error path in qeth_alloc_qdio_buffers() that takes care of
cleaning up the Output Queues is buggy. It first frees the queue, but
then calls qeth_clear_outq_buffers() with that very queue struct.
Make the call to qeth_clear_outq_buffers() part of the free action
(in the correct order), and while at it fix the naming of the helper.
Fixes: 0da9581ddb ("qeth: exploit asynchronous delivery of storage blocks")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit dc3f595b6617ebc0307e0ce151e8f2f2b2489b95 ]
atchan->status variable is used to store two different information:
- pass channel interrupts status from interrupt handler to tasklet;
- channel information like whether it is cyclic or paused;
This causes a bug when device_terminate_all() is called,
(AT_XDMAC_CHAN_IS_CYCLIC cleared on atchan->status) and then a late End
of Block interrupt arrives (AT_XDMAC_CIS_BIS), which sets bit 0 of
atchan->status. Bit 0 is also used for AT_XDMAC_CHAN_IS_CYCLIC, so when
a new descriptor for a cyclic transfer is created, the driver reports
the channel as in use:
if (test_and_set_bit(AT_XDMAC_CHAN_IS_CYCLIC, &atchan->status)) {
dev_err(chan2dev(chan), "channel currently used\n");
return NULL;
}
This patch fixes the bug by adding a different struct member to keep
the interrupts status separated from the channel status bits.
Fixes: e1f7c9eee7 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver")
Signed-off-by: Codrin Ciubotariu <codrin.ciubotariu@microchip.com>
Acked-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Signed-off-by: Vinod Koul <vkoul@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2380a22b60ce6f995eac806e69c66e397b59d045 ]
Resetting bit 4 disables the interrupt delivery to the "secure
processor" core. This breaks the keyboard on a OLPC XO 1.75 laptop,
where the firmware running on the "secure processor" bit-bangs the
PS/2 protocol over the GPIO lines.
It is not clear what the rest of the bits are and Marvell was unhelpful
when asked for documentation. Aside from the SP bit, there are probably
priority bits.
Leaving the unknown bits as the firmware set them up seems to be a wiser
course of action compared to just turning them off.
Signed-off-by: Lubomir Rintel <lkundrak@v3.sk>
Acked-by: Pavel Machek <pavel@ucw.cz>
[maz: fixed-up subject and commit message]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ba16adeb346387eb2d1ada69003588be96f098fa ]
devm_ allocated data will be automatically freed. The free
of devm_ allocated data is invalid.
Fixes: 1c459de1e6 ("ARM: pxa: ssp: use devm_ functions")
Signed-off-by: Peng Hao <peng.hao2@zte.com.cn>
[title's prefix changed]
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f585b283e3f025754c45bbe7533fc6e5c4643700 ]
In autofs_fill_super() on error of get inode/make root dentry the return
should be ENOMEM as this is the only failure case of the called
functions.
Link: http://lkml.kernel.org/r/154725123240.11260.796773942606871359.stgit@pluto-themaw-net
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 63ce5f552beb9bdb41546b3a26c4374758b21815 ]
autofs_expire_run() calls dput(dentry) to drop the reference count of
dentry. However, dentry is read via autofs_dentry_ino(dentry) after
that. This may result in a use-free-bug. The patch drops the reference
count of dentry only when it is never used.
Link: http://lkml.kernel.org/r/154725122396.11260.16053424107144453867.stgit@pluto-themaw-net
Signed-off-by: Pan Bian <bianpan2016@163.com>
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c27d82f52f75fc9d8d9d40d120d2a96fdeeada5e ]
When superblock has lots of inodes without any pagecache (like is the
case for /proc), drop_pagecache_sb() will iterate through all of them
without dropping sb->s_inode_list_lock which can lead to softlockups
(one of our customers hit this).
Fix the problem by going to the slow path and doing cond_resched() in
case the process needs rescheduling.
Link: http://lkml.kernel.org/r/20190114085343.15011-1-jack@suse.cz
Signed-off-by: Jan Kara <jack@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 24feb47c5fa5b825efb0151f28906dfdad027e61 ]
If memory end is not aligned with the sparse memory section boundary,
the mapping of such a section is only partly initialized. This may lead
to VM_BUG_ON due to uninitialized struct pages access from
test_pages_in_a_zone() function triggered by memory_hotplug sysfs
handlers.
Here are the the panic examples:
CONFIG_DEBUG_VM_PGFLAGS=y
kernel parameter mem=2050M
--------------------------
page:000003d082008000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
test_pages_in_a_zone+0xde/0x160
show_valid_zones+0x5c/0x190
dev_attr_show+0x34/0x70
sysfs_kf_seq_show+0xc8/0x148
seq_read+0x204/0x480
__vfs_read+0x32/0x178
vfs_read+0x82/0x138
ksys_read+0x5a/0xb0
system_call+0xdc/0x2d8
Last Breaking-Event-Address:
test_pages_in_a_zone+0xde/0x160
Kernel panic - not syncing: Fatal exception: panic_on_oops
Fix this by checking whether the pfn to check is within the zone.
[mhocko@suse.com: separated this change from http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Link: http://lkml.kernel.org/r/20190128144506.15603-3-mhocko@kernel.org
[mhocko@suse.com: separated this change from
http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com]
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit efad4e475c312456edb3c789d0996d12ed744c13 ]
Patch series "mm, memory_hotplug: fix uninitialized pages fallouts", v2.
Mikhail Zaslonko has posted fixes for the two bugs quite some time ago
[1]. I have pushed back on those fixes because I believed that it is
much better to plug the problem at the initialization time rather than
play whack-a-mole all over the hotplug code and find all the places
which expect the full memory section to be initialized.
We have ended up with commit 2830bf6f05fb ("mm, memory_hotplug:
initialize struct pages for the full memory section") merged and cause a
regression [2][3]. The reason is that there might be memory layouts
when two NUMA nodes share the same memory section so the merged fix is
simply incorrect.
In order to plug this hole we really have to be zone range aware in
those handlers. I have split up the original patch into two. One is
unchanged (patch 2) and I took a different approach for `removable'
crash.
[1] http://lkml.kernel.org/r/20181105150401.97287-2-zaslonko@linux.ibm.com
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1666948
[3] http://lkml.kernel.org/r/20190125163938.GA20411@dhcp22.suse.cz
This patch (of 2):
Mikhail has reported the following VM_BUG_ON triggered when reading sysfs
removable state of a memory block:
page:000003d08300c000 is uninitialized and poisoned
page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
Call Trace:
is_mem_section_removable+0xb4/0x190
show_mem_removable+0x9a/0xd8
dev_attr_show+0x34/0x70
sysfs_kf_seq_show+0xc8/0x148
seq_read+0x204/0x480
__vfs_read+0x32/0x178
vfs_read+0x82/0x138
ksys_read+0x5a/0xb0
system_call+0xdc/0x2d8
Last Breaking-Event-Address:
is_mem_section_removable+0xb4/0x190
Kernel panic - not syncing: Fatal exception: panic_on_oops
The reason is that the memory block spans the zone boundary and we are
stumbling over an unitialized struct page. Fix this by enforcing zone
range in is_mem_section_removable so that we never run away from a zone.
Link: http://lkml.kernel.org/r/20190128144506.15603-2-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Reported-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Debugged-by: Mikhail Zaslonko <zaslonko@linux.ibm.com>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a8e911d13540487942d53137c156bd7707f66e5d ]
If the kernel is configured with KASAN_EXTRA, the stack size is
increasted significantly because this option sets "-fstack-reuse" to
"none" in GCC [1]. As a result, it triggers stack overrun quite often
with 32k stack size compiled using GCC 8. For example, this reproducer
https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/madvise/madvise06.c
triggers a "corrupted stack end detected inside scheduler" very reliably
with CONFIG_SCHED_STACK_END_CHECK enabled.
There are just too many functions that could have a large stack with
KASAN_EXTRA due to large local variables that have been called over and
over again without being able to reuse the stacks. Some noticiable ones
are
size
7648 shrink_page_list
3584 xfs_rmap_convert
3312 migrate_page_move_mapping
3312 dev_ethtool
3200 migrate_misplaced_transhuge_page
3168 copy_process
There are other 49 functions are over 2k in size while compiling kernel
with "-Wframe-larger-than=" even with a related minimal config on this
machine. Hence, it is too much work to change Makefiles for each object
to compile without "-fsanitize-address-use-after-scope" individually.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715#c23
Although there is a patch in GCC 9 to help the situation, GCC 9 probably
won't be released in a few months and then it probably take another
6-month to 1-year for all major distros to include it as a default.
Hence, the stack usage with KASAN_EXTRA can be revisited again in 2020
when GCC 9 is everywhere. Until then, this patch will help users avoid
stack overrun.
This has already been fixed for arm64 for the same reason via
6e8830674ea ("arm64: kasan: Increase stack size for KASAN_EXTRA").
Link: http://lkml.kernel.org/r/20190109215209.2903-1-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 58d15ed1203f4d858c339ea4d7dafa94bd2a56d3 ]
The size of the fixed part of the create response is 88 bytes not 56.
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0ee4b5f801b73b83a9fb3921d725f2162fd4a2e5 ]
Add BACKLIGHT_LCD_SUPPORT for SAMSUNG_Q10 to fix the
warning: unmet direct dependencies detected for BACKLIGHT_CLASS_DEVICE.
SAMSUNG_Q10 selects BACKLIGHT_CLASS_DEVICE but BACKLIGHT_CLASS_DEVICE
depends on BACKLIGHT_LCD_SUPPORT.
Copy BACKLIGHT_LCD_SUPPORT dependency into SAMSUNG_Q10 to fix:
WARNING: unmet direct dependencies detected for BACKLIGHT_CLASS_DEVICE
Depends on [n]: HAS_IOMEM [=y] && BACKLIGHT_LCD_SUPPORT [=n]
Selected by [y]:
- SAMSUNG_Q10 [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && ACPI [=y]
Signed-off-by: Sinan Kaya <okaya@kernel.org>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5d8fc4a9f0eec20b6c07895022a6bea3fb6dfb38 ]
The issue to be fixed in this commit is when libfc found it received a
invalid FLOGI response from FC switch, it would return without freeing the
fc frame, which is just the skb data. This would cause memory leak if FC
switch keeps sending invalid FLOGI responses.
This fix is just to make it execute `fc_frame_free(fp)` before returning
from function `fc_lport_flogi_resp`.
Signed-off-by: Ming Lu <ming.lu@citrix.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 80ff00172407e0aad4b10b94ef0816fc3e7813cb ]
There is a NULL pointer dereference of dev_name in nfs_parse_devname()
The oops looks something like:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
...
RIP: 0010:nfs_fs_mount+0x3b6/0xc20 [nfs]
...
Call Trace:
? ida_alloc_range+0x34b/0x3d0
? nfs_clone_super+0x80/0x80 [nfs]
? nfs_free_parsed_mount_data+0x60/0x60 [nfs]
mount_fs+0x52/0x170
? __init_waitqueue_head+0x3b/0x50
vfs_kern_mount+0x6b/0x170
do_mount+0x216/0xdc0
ksys_mount+0x83/0xd0
__x64_sys_mount+0x25/0x30
do_syscall_64+0x65/0x220
entry_SYSCALL_64_after_hwframe+0x49/0xbe
Fix this by adding a NULL check on dev_name
Signed-off-by: Yao Liu <yotta.liu@ucloud.cn>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ae710f9f8b2cf95297e7bbfe1c09789a7dc43d4 ]
On SoC reset all GPIO interrupts are disable. However, if kexec is
used to boot into a new kernel, the SoC does not experience a
reset. Hence GPIO interrupts can be left enabled from the previous
kernel. It is then possible for the interrupt to fire before an
interrupt handler is registered, resulting in the kernel complaining
of an "unexpected IRQ trap", the interrupt is never cleared, and so
fires again, resulting in an interrupt storm.
Disable all GPIO interrupts before registering the GPIO IRQ chip.
Fixes: 7f2691a196 ("gpio: vf610: add gpiolib/IRQ chip driver for Vybrid")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Acked-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit c69c29a1a0a8f68cd87e98ba4a5a79fb8ef2a58c ]
If phy_power_on() fails in rk_gmac_powerup(), clocks are left enabled.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cec8abba13e6a26729dfed41019720068eeeff2b ]
When reading phy registers via Clause 45 MDIO protocol, after write
address operation, the driver use another write address operation, so
can not read the right value of any phy registers. This patch fixes it.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6571ebce112a21ec9be68ef2f53b96fcd41fd81b ]
If fill_level was not zero and status was not BUSY,
result of "tx_prod - tx_cons - inuse" might be zero.
Subtracting 1 unconditionally results invalid negative return value
on this case.
Make sure not to return an negative value.
Signed-off-by: Tomonori Sakita <tomonori.sakita@sord.co.jp>
Signed-off-by: Atsushi Nemoto <atsushi.nemoto@sord.co.jp>
Reviewed-by: Dalon L Westergreen <dalon.westergreen@linux.intel.com>
Acked-by: Thor Thayer <thor.thayer@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 25384ce5f9530def39421597b1457d9462df6455 ]
This fixes the following warning at boot when the kernel is booted on a
board with more CPU cores than was configured in NR_CPUS:
smp_init_cpus: Core Count = 8
smp_init_cpus: Core Id = 0
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at include/linux/cpumask.h:121 smp_init_cpus+0x54/0x74
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.0.0-rc3-00015-g1459333f88a0 #124
Call Trace:
__warn$part$3+0x6a/0x7c
warn_slowpath_null+0x35/0x3c
smp_init_cpus+0x54/0x74
setup_arch+0x1c0/0x1d0
start_kernel+0x44/0x310
_startup+0x107/0x107
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b1c42cdd7181200dc1fff39dcb6ac1a3fac2c25 ]
Otherwise it is impossible to enable CPUs after booting with 'maxcpus'
parameter.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 306b38305c0f86de7f17c5b091a95451dcc93d7d ]
Secondary CPU reset vector overlaps part of the double exception handler
code, resulting in weird crashes and hangups when running user code.
Move exception vectors one page up so that they don't clash with the
secondary CPU reset vector.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32a7726c4f4aadfabdb82440d84f88a5a2c8fe13 ]
- add missing memory barriers to the secondary CPU synchronization spin
loops; add comment to the matching memory barrier in the boot_secondary
and __cpu_die functions;
- use READ_ONCE/WRITE_ONCE to access cpu_start_id/cpu_start_ccount
instead of reading/writing them directly;
- re-initialize cpu_running every time before starting secondary CPU to
flush possible previous CPU startup results.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4fe8713b873fc881284722ce4ac47995de7cf62c ]
ccount_timer_shutdown is called from the atomic context in the
secondary_start_kernel, resulting in the following BUG:
BUG: sleeping function called from invalid context
in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
Preemption disabled at:
secondary_start_kernel+0xa1/0x130
Call Trace:
___might_sleep+0xe7/0xfc
__might_sleep+0x41/0x44
synchronize_irq+0x24/0x64
disable_irq+0x11/0x14
ccount_timer_shutdown+0x12/0x20
clockevents_switch_state+0x82/0xb4
clockevents_exchange_device+0x54/0x60
tick_check_new_device+0x46/0x70
clockevents_register_device+0x8c/0xc8
clockevents_config_and_register+0x1d/0x2c
local_timer_setup+0x75/0x7c
secondary_start_kernel+0xb4/0x130
should_never_return+0x32/0x35
Use disable_irq_nosync instead of disable_irq to avoid it.
This is safe because the ccount timer IRQ is per-CPU, and once IRQ is
masked the ISR will not be called.
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9825bd94e3a2baae1f4874767ae3a7d4c049720e ]
When a VM is terminated, the VFIO driver detaches all pass-through
devices from VFIO domain by clearing domain id and page table root
pointer from each device table entry (DTE), and then invalidates
the DTE. Then, the VFIO driver unmap pages and invalidate IOMMU pages.
Currently, the IOMMU driver keeps track of which IOMMU and how many
devices are attached to the domain. When invalidate IOMMU pages,
the driver checks if the IOMMU is still attached to the domain before
issuing the invalidate page command.
However, since VFIO has already detached all devices from the domain,
the subsequent INVALIDATE_IOMMU_PAGES commands are being skipped as
there is no IOMMU attached to the domain. This results in data
corruption and could cause the PCI device to end up in indeterministic
state.
Fix this by invalidate IOMMU pages when detach a device, and
before decrementing the per-domain device reference counts.
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Suggested-by: Joerg Roedel <joro@8bytes.org>
Co-developed-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Fixes: 6de8ad9b9e ('x86/amd-iommu: Make iommu_flush_pages aware of multiple IOMMUs')
Signed-off-by: Joerg Roedel <jroedel@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 53ab60baa1ac4f20b080a22c13b77b6373922fd7 ]
There is a UBSAN bug report as below:
UBSAN: Undefined behaviour in net/netfilter/ipvs/ip_vs_ctl.c:2227:21
signed integer overflow:
-2147483647 * 1000 cannot be represented in type 'int'
Reproduce program:
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#define IPPROTO_IP 0
#define IPPROTO_RAW 255
#define IP_VS_BASE_CTL (64+1024+64)
#define IP_VS_SO_SET_TIMEOUT (IP_VS_BASE_CTL+10)
/* The argument to IP_VS_SO_GET_TIMEOUT */
struct ipvs_timeout_t {
int tcp_timeout;
int tcp_fin_timeout;
int udp_timeout;
};
int main() {
int ret = -1;
int sockfd = -1;
struct ipvs_timeout_t to;
sockfd = socket(AF_INET, SOCK_RAW, IPPROTO_RAW);
if (sockfd == -1) {
printf("socket init error\n");
return -1;
}
to.tcp_timeout = -2147483647;
to.tcp_fin_timeout = -2147483647;
to.udp_timeout = -2147483647;
ret = setsockopt(sockfd,
IPPROTO_IP,
IP_VS_SO_SET_TIMEOUT,
(char *)(&to),
sizeof(to));
printf("setsockopt return %d\n", ret);
return ret;
}
Return -EINVAL if the timeout value is negative or max than 'INT_MAX / HZ'.
Signed-off-by: ZhangXiaoxu <zhangxiaoxu5@huawei.com>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>