[ Upstream commit f7e43672683b097bb074a8fe7af9bc600a23f231 ]
syzbot reported we still access llc->sap in llc_backlog_rcv()
after it is freed in llc_sap_remove_socket():
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x1b9/0x294 lib/dump_stack.c:113
print_address_description+0x6c/0x20b mm/kasan/report.c:256
kasan_report_error mm/kasan/report.c:354 [inline]
kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412
__asan_report_load1_noabort+0x14/0x20 mm/kasan/report.c:430
llc_conn_ac_send_sabme_cmd_p_set_x+0x3a8/0x460 net/llc/llc_c_ac.c:785
llc_exec_conn_trans_actions net/llc/llc_conn.c:475 [inline]
llc_conn_service net/llc/llc_conn.c:400 [inline]
llc_conn_state_process+0x4e1/0x13a0 net/llc/llc_conn.c:75
llc_backlog_rcv+0x195/0x1e0 net/llc/llc_conn.c:891
sk_backlog_rcv include/net/sock.h:909 [inline]
__release_sock+0x12f/0x3a0 net/core/sock.c:2335
release_sock+0xa4/0x2b0 net/core/sock.c:2850
llc_ui_release+0xc8/0x220 net/llc/af_llc.c:204
llc->sap is refcount'ed and llc_sap_remove_socket() is paired
with llc_sap_add_socket(). This can be amended by holding its refcount
before llc_sap_remove_socket() and releasing it after release_sock().
Reported-by: <syzbot+6e181fc95081c2cf9051@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 1071ec9d453a38023579714b64a951a2fb982071 ]
pf->cmp_addr() is called before binding a v6 address to the sock. It
should not check ports, like in sctp_inet_cmp_addr.
But sctp_inet6_cmp_addr checks the addr by invoking af(6)->cmp_addr,
sctp_v6_cmp_addr where it also compares the ports.
This would cause that setsockopt(SCTP_SOCKOPT_BINDX_ADD) could bind
multiple duplicated IPv6 addresses after Commit 40b4f0fd74e4 ("sctp:
lack the check for ports in sctp_v6_cmp_addr").
This patch is to remove af->cmp_addr called in sctp_inet6_cmp_addr,
but do the proper check for both v6 addrs and v4mapped addrs.
v1->v2:
- define __sctp_v6_cmp_addr to do the common address comparison
used for both pf and af v6 cmp_addr.
Fixes: 40b4f0fd74e4 ("sctp: lack the check for ports in sctp_v6_cmp_addr")
Reported-by: Jianwen Ji <jiji@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a49e2f5d5fb141884452ddb428f551b123d436b5 ]
We must validate sockaddr_len, otherwise userspace can pass fewer data
than we expect and we end up accessing invalid data.
Fixes: 224cf5ad14 ("ppp: Move the PPP drivers")
Reported-by: syzbot+4f03bdf92fdf9ef5ddab@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a6361f0ca4b25460f2cdf3235ebe8115f622901e ]
Updates to the bitfields in struct packet_sock are not atomic.
Serialize these read-modify-write cycles.
Move po->running into a separate variable. Its writes are protected by
po->bind_lock (except for one startup case at packet_create). Also
replace a textual precondition warning with lockdep annotation.
All others are set only in packet_setsockopt. Serialize these
updates by holding the socket lock. Analogous to other field updates,
also hold the lock when testing whether a ring is active (pg_vec).
Fixes: 8dc4194474 ("[PACKET]: Add optional checksum computation for recvmsg")
Reported-by: DaeRyong Jeong <threeearcat@gmail.com>
Reported-by: Byoungyoung Lee <byoungyoung@purdue.edu>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9cf2f437ca5b39828984064fad213e68fc17ef11 ]
The same fix in Commit dbe173079a ("bridge: fix netconsole
setup over bridge") is also needed for team driver.
While at it, remove the unnecessary parameter *team from
team_port_enable_netpoll().
v1->v2:
- fix it in a better way, as does bridge.
Fixes: 0fb52a27a0 ("team: cleanup netpoll clode")
Reported-by: João Avelino Bellomo Filho <jbellomo@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b905ef9ab90115d001c1658259af4b1c65088779 ]
The connection timers of an llc sock could be still flying
after we delete them in llc_sk_free(), and even possibly
after we free the sock. We could just wait synchronously
here in case of troubles.
Note, I leave other call paths as they are, since they may
not have to wait, at least we can change them to synchronously
when needed.
Also, move the code to net/llc/llc_conn.c, which is apparently
a better place.
Reported-by: <syzbot+f922284c18ea23a8e457@syzkaller.appspotmail.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit eb1c28c05894a4b1f6b56c5bf072205e64cfa280 ]
Check sockaddr_len before dereferencing sp->sa_protocol, to ensure that
it actually points to valid data.
Fixes: fd558d186d ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Reported-by: syzbot+a70ac890b23b1bf29f5c@syzkaller.appspotmail.com
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 9c438d7a3a52dcc2b9ed095cb87d3a5e83cf7e60 ]
Adding a dns_resolver key whose payload contains a very long option name
resulted in that string being printed in full. This hit the WARN_ONCE()
in set_precision() during the printk(), because printk() only supports a
precision of up to 32767 bytes:
precision 1000000 too large
WARNING: CPU: 0 PID: 752 at lib/vsprintf.c:2189 vsnprintf+0x4bc/0x5b0
Fix it by limiting option strings (combined name + value) to a much more
reasonable 128 bytes. The exact limit is arbitrary, but currently the
only recognized option is formatted as "dnserror=%lu" which fits well
within this limit.
Also ratelimit the printks.
Reproducer:
perl -e 'print "#", "A" x 1000000, "\x00"' | keyctl padd dns_resolver desc @s
This bug was found using syzkaller.
Reported-by: Mark Rutland <mark.rutland@arm.com>
Fixes: 4a2d789267 ("DNS: If the DNS server returns an error, allow that to be cached [ver #2]")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit ddea788c63094f7c483783265563dd5b50052e28 ]
After Commit 8a8efa22f5 ("bonding: sync netpoll code with bridge"), it
would set slave_dev npinfo in slave_enable_netpoll when enslaving a dev
if bond->dev->npinfo was set.
However now slave_dev npinfo is set with bond->dev->npinfo before calling
slave_enable_netpoll. With slave_dev npinfo set, __netpoll_setup called
in slave_enable_netpoll will not call slave dev's .ndo_netpoll_setup().
It causes that the lower dev of this slave dev can't set its npinfo.
One way to reproduce it:
# modprobe bonding
# brctl addbr br0
# brctl addif br0 eth1
# ifconfig bond0 192.168.122.1/24 up
# ifenslave bond0 eth2
# systemctl restart netconsole
# ifenslave bond0 br0
# ifconfig eth2 down
# systemctl restart netconsole
The netpoll won't really work.
This patch is to remove that slave_dev npinfo setting in bond_enslave().
Fixes: 8a8efa22f5 ("bonding: sync netpoll code with bridge")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6cf09958f32b9667bb3ebadf74367c791112771b ]
The main linker script vmlinux.lds.S for the kernel image merges
the expoline code patch tables into two section ".nospec_call_table"
and ".nospec_return_table". This is *not* done for the modules,
there the sections retain their original names as generated by gcc:
".s390_indirect_call", ".s390_return_mem" and ".s390_return_reg".
The module_finalize code has to check for the compiler generated
section names, otherwise no code patching is done. This slows down
the module code in case of "spectre_v2=off".
Cc: stable@vger.kernel.org # 4.16
Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6a3d1e81a434fc311f224b8be77258bafc18ccc6 ]
With CONFIG_EXPOLINE_AUTO=y the call of spectre_v2_auto_early() via
early_initcall is done *after* the early_param functions. This
overwrites any settings done with the nobp/no_spectre_v2/spectre_v2
parameters. The code patching for the kernel is done after the
evaluation of the early parameters but before the early_initcall
is done. The end result is a kernel image that is patched correctly
but the kernel modules are not.
Make sure that the nospec auto detection function is called before the
early parameters are evaluated and before the code patching is done.
Fixes: 6e179d64126b ("s390: add automatic detection of the spectre defense")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d424986f1d6b16079b3231db0314923f4f8deed1 ]
Set CONFIG_GENERIC_CPU_VULNERABILITIES and provide the two functions
cpu_show_spectre_v1 and cpu_show_spectre_v2 to report the spectre
mitigations.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit bc035599718412cfba9249aa713f90ef13f13ee9 ]
Add a boot message if either of the spectre defenses is active.
The message is
"Spectre V2 mitigation: execute trampolines."
or "Spectre V2 mitigation: limited branch prediction."
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6e179d64126b909f0b288fa63cdbf07c531e9b1d ]
Automatically decide between nobp vs. expolines if the spectre_v2=auto
kernel parameter is specified or CONFIG_EXPOLINE_AUTO=y is set.
The decision made at boot time due to CONFIG_EXPOLINE_AUTO=y being set
can be overruled with the nobp, nospec and spectre_v2 kernel parameters.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b2e2f43a01bace1a25bdbae04c9f9846882b727a ]
Keep the code for the nobp parameter handling with the code for
expolines. Both are related to the spectre v2 mitigation.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d3f468963cd6fd6d2aa5e26aed8b24232096d0e1 ]
when a system call is interrupted we might call the critical section
cleanup handler that re-does some of the operations. When we are between
.Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the
problem state registers r0-r7:
.Lcleanup_system_call:
[...]
0: # update accounting time stamp
mvc __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER
# set up saved register r11
lg %r15,__LC_KERNEL_STACK
la %r9,STACK_FRAME_OVERHEAD(%r15)
stg %r9,24(%r11) # r11 pt_regs pointer
# fill pt_regs
mvc __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC
---> stmg %r0,%r7,__PT_R0(%r9)
The problem is now, that we might have already zeroed out r0.
The fix is to move the zeroing of r0 after sysc_do_svc.
Reported-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Fixes: 7041d28115e91 ("s390: scrub registers on kernel entry and KVM exit")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d5feec04fe578c8dbd9e2e1439afc2f0af761ed4 ]
The system call path can be interrupted before the switch back to the
standard branch prediction with BPENTER has been done. The critical
section cleanup code skips forward to .Lsysc_do_svc and bypasses the
BPENTER. In this case the kernel and all subsequent code will run with
the limited branch prediction.
Fixes: eacf67eb9b32 ("s390: run user space and KVM guests with modified branch prediction")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f19fbd5ed642dc31c809596412dab1ed56f2f156 ]
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction_return= compiler options to create a kernel fortified against
the specte v2 attack.
With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call
basr %r14,%r1
is replaced with a PC relative call to a new thunk
brasl %r14,__s390x_indirect_jump_r1
The thunk contains the EXRL/EX instruction to the indirect branch
__s390x_indirect_jump_r1:
exrl 0,0f
j .
0: br %r1
The detour via the execute type instruction has a performance impact.
To get rid of the detour the new kernel parameter "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If the parameter is specified
the kernel and module code will be patched at runtime.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6b73044b2b0081ee3dd1cd6eaab7dee552601efb ]
Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary
plumbing in entry.S to be able to run user space and KVM guests with
limited branch prediction.
To switch a user space process to limited branch prediction the
s390_isolate_bp() function has to be call, and to run a vCPU of a KVM
guest associated with the current task with limited branch prediction
call s390_isolate_bp_guest().
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit d768bd892fc8f066cd3aa000eb1867bcf32db0ee ]
Add the PPA instruction to the system entry and exit path to switch
the kernel to a different branch prediction behaviour. The instructions
are added via CPU alternatives and can be disabled with the "nospec"
or the "nobp=0" kernel parameter. If the default behaviour selected
with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
used to enable the changed kernel branch prediction.
Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit cf1489984641369611556bf00c48f945c77bcf02 ]
To be able to switch off specific CPU alternatives with kernel parameters
make a copy of the facility bit mask provided by STFLE and use the copy
for the decision to apply an alternative.
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e2dd833389cc4069a96b57bdd24227b5f52288f5 ]
Add an optimized version of the array_index_mask_nospec function for
s390 based on a compare and a subtract with borrow.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 7041d28115e91f2144f811ffe8a195c696b1e1d0 ]
Clear all user space registers on entry to the kernel and all KVM guest
registers on KVM guest exit if the register does not contain either a
parameter or a result value.
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 35b3fde6203b932b2b1a5b53b3d8808abc9c4f60 ]
The new firmware interfaces for branch prediction behaviour changes
are transparently available for the guest. Nevertheless, there is
new state attached that should be migrated and properly resetted.
Provide a mechanism for handling reset, migration and VSIE.
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
[Changed capability number to 152. - Radim]
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 049a2c2d486e8cc82c5cd79fa479c5b105b109e9 ]
Remove the CPU_ALTERNATIVES config option and enable the code
unconditionally. The config option was only added to avoid a conflict
with the named saved segment support. Since that code is gone there is
no reason to keep the CPU_ALTERNATIVES config option.
Just enable it unconditionally to also reduce the number of config
options and make it less likely that something breaks.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 686140a1a9c41d85a4212a1c26d671139b76404b ]
Implement CPU alternatives, which allows to optionally patch newer
instructions at runtime, based on CPU facilities availability.
A new kernel boot parameter "noaltinstr" disables patching.
Current implementation is derived from x86 alternatives. Although
ideal instructions padding (when altinstr is longer then oldinstr)
is added at compile time, and no oldinstr nops optimization has to be
done at runtime. Also couple of compile time sanity checks are done:
1. oldinstr and altinstr must be <= 254 bytes long,
2. oldinstr and altinstr must not have an odd length.
alternative(oldinstr, altinstr, facility);
alternative_2(oldinstr, altinstr1, facility1, altinstr2, facility2);
Both compile time and runtime padding consists of either 6/4/2 bytes nop
or a jump (brcl) + 2 bytes nop filler if padding is longer then 6 bytes.
.altinstructions and .altinstr_replacement sections are part of
__init_begin : __init_end region and are freed after initialization.
Signed-off-by: Vasily Gorbik <gor@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit dbfcef6b0f4012c57bc0b6e0e660d5ed12a5eaed upstream.
Below is the synchronization issue between unmount and kjournald2
contexts, which results into use after free issue in kjournald2().
Fix this issue by using journal->j_state_lock to synchronize the
wait_event() done in journal_kill_thread() and the wake_up() done
in kjournald2().
TASK 1:
umount cmd:
|--jbd2_journal_destroy() {
|--journal_kill_thread() {
write_lock(&journal->j_state_lock);
journal->j_flags |= JBD2_UNMOUNT;
...
write_unlock(&journal->j_state_lock);
wake_up(&journal->j_wait_commit); TASK 2 wakes up here:
kjournald2() {
...
checks JBD2_UNMOUNT flag and calls goto end-loop;
...
end_loop:
write_unlock(&journal->j_state_lock);
journal->j_task = NULL; --> If this thread gets
pre-empted here, then TASK 1 wait_event will
exit even before this thread is completely
done.
wait_event(journal->j_wait_done_commit, journal->j_task == NULL);
...
write_lock(&journal->j_state_lock);
write_unlock(&journal->j_state_lock);
}
|--kfree(journal);
}
}
wake_up(&journal->j_wait_done_commit); --> this step
now results into use after free issue.
}
Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a34d0a0da1abae46a5f6ebd06fb0ec484ca099d9 upstream.
In an RFC patch, Sven Eckelmann and Simon Wunderlich reported:
"QCA 802.11n chips (especially AR9330/AR9340) sometimes end up in a
state in which a read of AR_CFG always returns 0xdeadbeef.
This should not happen when when the power_mode of the device is
ATH9K_PM_AWAKE."
Include the check for the default register state in the existing MAC
hang check.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 74c82dae6c474933f2be401976e1530b5f623221 upstream.
We were accidentally initializing haptics->rated_voltage twice, and did not
initialize overdrive voltage.
Acked-by: Dan Murphy <dmurphy@ti.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Amit Pundir <amit.pundir@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 90841047a01b452cc8c3f9b990698b264143334a upstream.
This linksys dongle by default comes up in cdc_ether mode.
This patch allows r8152 to claim the device:
Bus 002 Device 002: ID 13b1:0041 Linksys
Signed-off-by: Grant Grundler <grundler@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
[krzk: Rebase on v4.4]
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 2ef230531ee171a475fc3ddad5516dd7e09a8a77 upstream.
Since ion alloc can be called by userspace,eg gralloc.
When it is called frequently, the efficiency of kswapd is
to low. And the reclaimed memory is too lower. In this way,
the kswapd can use to much cpu resources.
With 3.5GB DMA Zone and 0.5 Normal Zone.
pgsteal_kswapd_dma 9364140
pgsteal_kswapd_normal 7071043
pgscan_kswapd_dma 10428250
pgscan_kswapd_normal 37840094
With this change the reclaim ratio has greatly improved
18.9% -> 72.5%
Signed-off-by: Chen Feng <puck.chen@hisilicon.com>
Signed-off-by: Lu bing <albert.lubing@hisilicon.com>
Reviewed-by: Laura Abbott <labbott@redhat.com>
Cc: Greg Hackmann <ghackmann@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 78b562fbfa2cf0a9fcb23c3154756b690f4905c1 upstream.
Return immediately when we find issue in the user stack checks. The
error value could get overwritten by following check for
PERF_SAMPLE_REGS_INTR.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: syzkaller-bugs@googlegroups.com
Cc: x86@kernel.org
Fixes: 60e2364e60 ("perf: Add ability to sample machine state on interrupt")
Link: http://lkml.kernel.org/r/20180415092352.12403-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit d3878e164dcd3925a237a20e879432400e369172 upstream.
The TSC calibration code uses HPET as reference. The conversion normalizes
the delta of two HPET timestamps:
hpetref = ((tshpet1 - tshpet2) * HPET_PERIOD) / 1e6
and then divides the normalized delta of the corresponding TSC timestamps
by the result to calulate the TSC frequency.
tscfreq = ((tstsc1 - tstsc2 ) * 1e6) / hpetref
This uses do_div() which takes an u32 as the divisor, which worked so far
because the HPET frequency was low enough that 'hpetref' never exceeded
32bit.
On Skylake machines the HPET frequency increased so 'hpetref' can exceed
32bit. do_div() truncates the divisor, which causes the calibration to
fail.
Use div64_u64() to avoid the problem.
[ tglx: Fixes whitespace mangled patch and rewrote changelog ]
Signed-off-by: Xiaoming Gao <newtongao@tencent.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Cc: peterz@infradead.org
Cc: hpa@zytor.com
Link: https://lkml.kernel.org/r/38894564-4fc9-b8ec-353f-de702839e44e@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 1d0cffa674cfa7d185a302c8c6850fc50b893bed upstream.
RHBZ: 1453123
Since at least the 3.10 kernel and likely a lot earlier we have
not been able to create unix domain sockets in a cifs share
when mounted using the SFU mount option (except when mounted
with the cifs unix extensions to Samba e.g.)
Trying to create a socket, for example using the af_unix command from
xfstests will cause :
BUG: unable to handle kernel NULL pointer dereference at 00000000
00000040
Since no one uses or depends on being able to create unix domains sockets
on a cifs share the easiest fix to stop this vulnerability is to simply
not allow creation of any other special files than char or block devices
when sfu is used.
Added update to Ronnie's patch to handle a tcon link leak, and
to address a buf leak noticed by Gustavo and Colin.
Acked-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
CC: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
Reported-by: Eryu Guan <eguan@redhat.com>
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
We get a build error in the irqsoff tracer in some configurations:
kernel/trace/trace_irqsoff.c: In function 'trace_preempt_on':
kernel/trace/trace_irqsoff.c:855:2: error: implicit declaration of function 'trace_preempt_enable_rcuidle'; did you mean 'trace_irq_enable_rcuidle'? [-Werror=implicit-function-declaration]
trace_preempt_enable_rcuidle(a0, a1);
The problem is that trace_preempt_enable_rcuidle() has different
definition based on multiple Kconfig symbols, but not all combinations
have a valid definition.
This changes the conditions so that we always get exactly one
definition of each of the four tracing macros. I have not tried
to verify that these definitions are sensible, but now we
can build all randconfig combinations again.
Link: http://lkml.kernel.org/r/20171019083230.2450779-1-arnd@arndb.de
Change-Id: I28715af208379e993df85c2fb35549290f4fbd6e
Fixes: d59158162e03 ("tracing: Add support for preempt and irq enable/disable events")
Acked-by: Joel Fernandes <joelaf@google.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Some debugging messages did not use %pK, but since those messages are
not very useful and have been removed upstream, just remove them
instead.
Bug: 77937819
Change-Id: Ie45897fe2d6ec3f842a02883e8ec929ed2e76933
Signed-off-by: Alistair Strachan <astrachan@google.com>
d_make_root will call iput on failure, so we
shouldn't try to do that ourselves.
Signed-off-by: Daniel Rosenberg <drosen@google.com>
Bug: 77923821
Change-Id: I1abb4afb0f894ab917b7c6be8c833676f436beb7
When an sdcardfs dentry is destroyed, it may not yet
have its fsdata initialized. It must be checked before
we try to access the paths in its private data.
Additionally, when cleaning up the superblock after
a failure, we don't have our sb private data, so
check for that case.
Bug: 77923821
Change-Id: I89caf6e121ed86480b42024664453fe0031bbcf3
Signed-off-by: Daniel Rosenberg <drosen@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlre3XwACgkQONu9yGCS
aT5KcRAAxB6w9SbjjlGv+PsN3ISQgnIPjWadBQ12WWnpr1sqZi0wrMZRsNiK5+UN
wPalUBiLiAIqNoDVSrDUgjyqC+wnQjhM/9tudEBqXQ6TQbSHQfQpZHQabLEtXxCP
Yd1EHwEgJrCHqaj17oFZFkps20ooKtSnYQ57pyZNem5EPR/ayaMWvo6WM7k6d2hD
E2WE57ShLbvslYaSvmDXML6o9f/bBKHOuL0GymVtDEUcyTLuw3GZaplnuaSLz6kc
o7tU2xVV+yajmpiEt4iR40Pgk+pygEGC14OI8dj/YHVotDzJKWnMgQ/HKxr8kyra
ImQPwu9DmaWqAUGr2SRmE/SXJpKdeYM1rxA/H3pMSaP9nRc2ccHyQF/ASGfHs+Mv
9hNQBjRugS4UXDzFhRlEh97CyfVa/ZuF0WgiBtBYnXSdXKA1xDq9cVf3UJg7k6om
1X7HLEVLhVLR7/liPjhOlTj9vrUzc6NcN+uVdfnmspI1BjTBe3ezzLqEP8VTUsNQ
p/V9r0i6TGR3gYQuTzjU/MaAuBZwj1D5sCnVUphCNUtSJf/0cjQsfYUcgtrtk67U
9Bjlo0pWHpAXxARiegBY3n5ClkZpdqEnt4Dp2MdR65pTSJ4MfC2UDLemUgB18arU
IllNzG2GywgQSouH3s5XPNZLkEvX8iK5lUWqRQ7ZiaA/0jVkn70=
=K6Qy
-----END PGP SIGNATURE-----
Merge 4.4.129 into android-4.4
Changes in 4.4.129
media: v4l2-compat-ioctl32: don't oops on overlay
parisc: Fix out of array access in match_pci_device()
perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
perf intel-pt: Fix sync_switch
perf intel-pt: Fix error recovery from missing TIP packet
perf intel-pt: Fix timestamp following overflow
radeon: hide pointless #warning when compile testing
Revert "perf tests: Decompress kernel module before objdump"
block/loop: fix deadlock after loop_set_status
s390/qdio: don't retry EQBS after CCQ 96
s390/qdio: don't merge ERROR output buffers
s390/ipl: ensure loadparm valid flag is set
getname_kernel() needs to make sure that ->name != ->iname in long case
rtl8187: Fix NULL pointer dereference in priv->conf_mutex
hwmon: (ina2xx) Fix access to uninitialized mutex
cdc_ether: flag the Cinterion AHS8 modem by gemalto as WWAN
slip: Check if rstate is initialized before uncompressing
lan78xx: Correctly indicate invalid OTP
x86/hweight: Get rid of the special calling convention
x86/hweight: Don't clobber %rdi
tty: make n_tty_read() always abort if hangup is in progress
ubifs: Check ubifs_wbuf_sync() return code
ubi: fastmap: Don't flush fastmap work on detach
ubi: Fix error for write access
ubi: Reject MLC NAND
fs/reiserfs/journal.c: add missing resierfs_warning() arg
resource: fix integer overflow at reallocation
ipc/shm: fix use-after-free of shm file via remap_file_pages()
mm, slab: reschedule cache_reap() on the same CPU
usb: musb: gadget: misplaced out of bounds check
ARM: dts: at91: at91sam9g25: fix mux-mask pinctrl property
ARM: dts: at91: sama5d4: fix pinctrl compatible string
xen-netfront: Fix hang on device removal
regmap: Fix reversed bounds check in regmap_raw_write()
ACPI / video: Add quirk to force acpi-video backlight on Samsung 670Z5E
ACPI / hotplug / PCI: Check presence of slot itself in get_slot_status()
USB:fix USB3 devices behind USB3 hubs not resuming at hibernate thaw
usb: dwc3: pci: Properly cleanup resource
HID: i2c-hid: fix size check and type usage
powerpc/powernv: Handle unknown OPAL errors in opal_nvram_write()
powerpc/64: Fix smp_wmb barrier definition use use lwsync consistently
powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops
powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops
HID: Fix hid_report_len usage
HID: core: Fix size as type u32
ASoC: ssm2602: Replace reg_default_raw with reg_default
thunderbolt: Resume control channel after hibernation image is created
random: use a tighter cap in credit_entropy_bits_safe()
jbd2: if the journal is aborted then don't allow update of the log tail
ext4: don't update checksum of new initialized bitmaps
ext4: fail ext4_iget for root directory if unallocated
RDMA/ucma: Don't allow setting RDMA_OPTION_IB_PATH without an RDMA device
ALSA: pcm: Fix UAF at PCM release via PCM timer access
IB/srp: Fix srp_abort()
IB/srp: Fix completion vector assignment algorithm
dmaengine: at_xdmac: fix rare residue corruption
um: Use POSIX ucontext_t instead of struct ucontext
iommu/vt-d: Fix a potential memory leak
mmc: jz4740: Fix race condition in IRQ mask update
clk: mvebu: armada-38x: add support for 1866MHz variants
clk: mvebu: armada-38x: add support for missing clocks
clk: bcm2835: De-assert/assert PLL reset signal when appropriate
thermal: imx: Fix race condition in imx_thermal_probe()
watchdog: f71808e_wdt: Fix WD_EN register read
ALSA: oss: consolidate kmalloc/memset 0 call to kzalloc
ALSA: pcm: Use ERESTARTSYS instead of EINTR in OSS emulation
ALSA: pcm: Avoid potential races between OSS ioctls and read/write
ALSA: pcm: Return -EBUSY for OSS ioctls changing busy streams
ALSA: pcm: Fix mutex unbalance in OSS emulation ioctls
ALSA: pcm: Fix endless loop for XRUN recovery in OSS emulation
vfio-pci: Virtualize PCIe & AF FLR
vfio/pci: Virtualize Maximum Payload Size
vfio/pci: Virtualize Maximum Read Request Size
ext4: don't allow r/w mounts if metadata blocks overlap the superblock
drm/radeon: Fix PCIe lane width calculation
ext4: fix crashes in dioread_nolock mode
ext4: fix deadlock between inline_data and ext4_expand_extra_isize_ea()
ALSA: line6: Use correct endpoint type for midi output
ALSA: rawmidi: Fix missing input substream checks in compat ioctls
ALSA: hda - New VIA controller suppor no-snoop path
HID: hidraw: Fix crash on HIDIOCGFEATURE with a destroyed device
MIPS: uaccess: Add micromips clobbers to bzero invocation
MIPS: memset.S: EVA & fault support for small_memset
MIPS: memset.S: Fix return of __clear_user from Lpartial_fixup
MIPS: memset.S: Fix clobber of v1 in last_fixup
powerpc/eeh: Fix enabling bridge MMIO windows
powerpc/lib: Fix off-by-one in alternate feature patching
jffs2_kill_sb(): deal with failed allocations
hypfs_kill_super(): deal with failed allocations
rpc_pipefs: fix double-dput()
Don't leak MNT_INTERNAL away from internal mounts
autofs: mount point create should honour passed in mode
mm: allow GFP_{FS,IO} for page_cache_read page cache allocation
mm/filemap.c: fix NULL pointer in page_cache_tree_insert()
ext4: bugfix for mmaped pages in mpage_release_unused_pages()
fanotify: fix logic of events on child
writeback: safer lock nesting
Linux 4.4.129
Change-Id: I8806d2cc92fe512f27a349e8f630ced0cac9a8d7
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
commit 2e898e4c0a3897ccd434adac5abb8330194f527b upstream.
lock_page_memcg()/unlock_page_memcg() use spin_lock_irqsave/restore() if
the page's memcg is undergoing move accounting, which occurs when a
process leaves its memcg for a new one that has
memory.move_charge_at_immigrate set.
unlocked_inode_to_wb_begin,end() use spin_lock_irq/spin_unlock_irq() if
the given inode is switching writeback domains. Switches occur when
enough writes are issued from a new domain.
This existing pattern is thus suspicious:
lock_page_memcg(page);
unlocked_inode_to_wb_begin(inode, &locked);
...
unlocked_inode_to_wb_end(inode, locked);
unlock_page_memcg(page);
If both inode switch and process memcg migration are both in-flight then
unlocked_inode_to_wb_end() will unconditionally enable interrupts while
still holding the lock_page_memcg() irq spinlock. This suggests the
possibility of deadlock if an interrupt occurs before unlock_page_memcg().
truncate
__cancel_dirty_page
lock_page_memcg
unlocked_inode_to_wb_begin
unlocked_inode_to_wb_end
<interrupts mistakenly enabled>
<interrupt>
end_page_writeback
test_clear_page_writeback
lock_page_memcg
<deadlock>
unlock_page_memcg
Due to configuration limitations this deadlock is not currently possible
because we don't mix cgroup writeback (a cgroupv2 feature) and
memory.move_charge_at_immigrate (a cgroupv1 feature).
If the kernel is hacked to always claim inode switching and memcg
moving_account, then this script triggers lockup in less than a minute:
cd /mnt/cgroup/memory
mkdir a b
echo 1 > a/memory.move_charge_at_immigrate
echo 1 > b/memory.move_charge_at_immigrate
(
echo $BASHPID > a/cgroup.procs
while true; do
dd if=/dev/zero of=/mnt/big bs=1M count=256
done
) &
while true; do
sync
done &
sleep 1h &
SLEEP=$!
while true; do
echo $SLEEP > a/cgroup.procs
echo $SLEEP > b/cgroup.procs
done
The deadlock does not seem possible, so it's debatable if there's any
reason to modify the kernel. I suggest we should to prevent future
surprises. And Wang Long said "this deadlock occurs three times in our
environment", so there's more reason to apply this, even to stable.
Stable 4.4 has minor conflicts applying this patch. For a clean 4.4 patch
see "[PATCH for-4.4] writeback: safer lock nesting"
https://lkml.org/lkml/2018/4/11/146
Wang Long said "this deadlock occurs three times in our environment"
[gthelen@google.com: v4]
Link: http://lkml.kernel.org/r/20180411084653.254724-1-gthelen@google.com
[akpm@linux-foundation.org: comment tweaks, struct initialization simplification]
Change-Id: Ibb773e8045852978f6207074491d262f1b3fb613
Link: http://lkml.kernel.org/r/20180410005908.167976-1-gthelen@google.com
Fixes: 682aa8e1a6 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Wang Long <wanglong19@meituan.com>
Acked-by: Wang Long <wanglong19@meituan.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: <stable@vger.kernel.org> [v4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[natechancellor: Applied to 4.4 based on Greg's backport on lkml.org]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 54a307ba8d3cd00a3902337ffaae28f436eeb1a4 upstream.
When event on child inodes are sent to the parent inode mark and
parent inode mark was not marked with FAN_EVENT_ON_CHILD, the event
will not be delivered to the listener process. However, if the same
process also has a mount mark, the event to the parent inode will be
delivered regadless of the mount mark mask.
This behavior is incorrect in the case where the mount mark mask does
not contain the specific event type. For example, the process adds
a mark on a directory with mask FAN_MODIFY (without FAN_EVENT_ON_CHILD)
and a mount mark with mask FAN_CLOSE_NOWRITE (without FAN_ONDIR).
A modify event on a file inside that directory (and inside that mount)
should not create a FAN_MODIFY event, because neither of the marks
requested to get that event on the file.
Fixes: 1968f5eed5 ("fanotify: use both marks when possible")
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
[natechancellor: Fix small conflict due to lack of 3cd5eca8d7a2f]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 4e800c0359d9a53e6bf0ab216954971b2515247f upstream.
Pages clear buffers after ext4 delayed block allocation failed,
However, it does not clean its pte_dirty flag.
if the pages unmap ,in cording to the pte_dirty ,
unmap_page_range may try to call __set_page_dirty,
which may lead to the bugon at
mpage_prepare_extent_to_map:head = page_buffers(page);.
This patch just call clear_page_dirty_for_io to clean pte_dirty
at mpage_release_unused_pages for pages mmaped.
Steps to reproduce the bug:
(1) mmap a file in ext4
addr = (char *)mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED,
fd, 0);
memset(addr, 'i', 4096);
(2) return EIO at
ext4_writepages->mpage_map_and_submit_extent->mpage_map_one_extent
which causes this log message to be print:
ext4_msg(sb, KERN_CRIT,
"Delayed block allocation failed for "
"inode %lu at logical offset %llu with"
" max blocks %u with error %d",
inode->i_ino,
(unsigned long long)map->m_lblk,
(unsigned)map->m_len, -err);
(3)Unmap the addr cause warning at
__set_page_dirty:WARN_ON_ONCE(warn && !PageUptodate(page));
(4) wait for a minute,then bugon happen.
Cc: stable@vger.kernel.org
Signed-off-by: wangguang <wangguang03@zte.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
[@nathanchance: Resolved conflict from lack of 09cbfeaf1a5a6]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>