If a socket has a valid dst cache, then xfrm_lookup_route will get
skipped. However, the cache is not invalidated when applying policy to a
socket (i.e. IPV6_XFRM_POLICY). The result is that new policies are
sometimes ignored on those sockets. (Note: This was broken for IPv4 and
IPv6 at different times.)
This can be demonstrated like so,
1. Create UDP socket.
2. connect() the socket.
3. Apply an outbound XFRM policy to the socket. (setsockopt)
4. send() data on the socket.
Packets will continue to be sent in the clear instead of matching an
xfrm or returning a no-match error (EAGAIN). This affects calls to
send() and not sendto().
Invalidating the sk_dst_cache is necessary to correctly apply xfrm
policies. Since we do this in xfrm_user_policy(), the sk_lock was
already acquired in either do_ip_setsockopt() or do_ipv6_setsockopt(),
and we may call __sk_dst_reset().
Performance impact should be negligible, since this code is only called
when changing xfrm policy, and only affects the socket in question.
Change-Id: I54b4ec422aa5f4e31652a8c6913696f0a5610a51
Fixes: 00bc0ef5880d ("ipv6: Skip XFRM lookup if dst_entry in socket cache is valid")
Tested: https://android-review.googlesource.com/517555
Tested: https://android-review.googlesource.com/418659
Signed-off-by: Jonathan Basseri <misterikkit@google.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit 2b06cdf3e688b98fcc9945873b5d42792bd4eee0)
We all should be using (and improving) the schedutil governor now. Get
rid of the non-upstream governor.
Tested on Hikey.
Change-Id: Ic660756536e5da51952738c3c18b94e31f58cd57
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Boosted RT tasks can be deboosted quickly, this makes boost usless
for RT tasks and causes lots of glitching. Use timers to prevent
de-boost too soon and wait for long enough such that next enqueue
happens after a threshold.
While this can be solved in the governor, there are following
advantages:
- The approach used is governor-independent
- Reduces boost group lock contention for frequently sleepers/wakers
- Works with schedfreq without any other schedfreq hacks.
Bug: 30210506
Change-Id: I41788b235586988be446505deb7c0529758a9898
Signed-off-by: Joel Fernandes <joelaf@google.com>
This patch adds schedtune enqueue/dequeue to RT scheduling class.
Change-Id: If416e64319d62191f3aedd675d3e9a21fe2102fb
Signed-off-by: Joel Fernandes <joelaf@google.com>
Part of the above change was reverted in 240628085effc47e86f51fc3fb37bc0e628f9a85;
this change reverts the rest.
This ARM mmap change breaks AddressSanitizer:
Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
Revert it until ASAN runtime library is updated to handle it.
Bug: 67425063
This ARM mmap change breaks AddressSanitizer:
Shadow memory range interleaves with an existing memory mapping. ASan cannot proceed correctly. ABORTING.
Revert it until ASAN runtime library is updated to handle it.
Bug: 67425063
This reverts commit d2471b5e84.
util_delta becomes not zero in eenv_before, which will affect the
calculation of grp_util in group_idle_state(). Fix it under the
new condition.
Change-Id: Ic3853bb45876a8e388afcbe4e72d25fc42b1d7b0
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
If no energy saving for target_cpu in the calculation of energy_diff(),
backup_cpu will be set as the new dst_cpu for the next calculation. At this
point, we also need update the new trg_cpu as backup_cpu to make sure the
subsequent calculation of energy_diff() is correct.
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
Before commit 5f8b3a757d ("sched/fair: consider task utilization in
group_norm_util()"), eenv->util_delta is used to distinguish energy
before and energy after in sched_group_energy(). After that commit,
eenv->util_delta can not do that any more.
In this commit, use trg_cpu to distinguish energy before/after in
sched_group_energy().
Before apply this commit, cap_before/cap_delta is not correct:
<idle>-0 [001] 147504.608920: sched_energy_diff: pid=7 comm=rcu_preempt
src_cpu=1 dst_cpu=3 usage_delta=7 nrg_before=250 nrg_after=250 nrg_diff=0
cap_before=0 cap_after=528 cap_delta=1056 nrg_delta=0 nrg_payoff=0
After apply this commit, cap_before/cap_delta retrun to normal:
<idle>-0 [001] 220.494011: sched_energy_diff: pid=7 comm=rcu_preempt
src_cpu=1 dst_cpu=2 usage_delta=3 nrg_before=248 nrg_after=248 nrg_diff=0
cap_before=528 cap_after=528 cap_delta=0 nrg_delta=0 nrg_payoff=0
Signed-off-by: Ke Wang <ke.wang@spreadtrum.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAln62hMACgkQONu9yGCS
aT4KoRAAg9FasYL4oGtTiNglajWWP33eVv5NiwwdFdCfQGOEIzZPQy7ST/I/b1CB
Kql3D9oA7d8ZFtg05KyQb4csH+9WNjeLBr4G2NGZurEC0c6vbU/64ceADgzA4YKJ
RJ2iDWEQKq45RVH6BlrDyITu9H20TRgzcsQZ7fiswB3ZJsPTfdyvDlUlN3A4JhXY
2eTF3CNVPEGLnaT4PY5tuLZpLIZkQkZzw1xr9YCq9YA5aNYi2OywWbyQq27ruX2d
K238FiaYSN5LeUxU6JE2tk4CxrHJ0pOiw6kBiSgIv3MwDQa5iQypKVQA2tnAXHqL
rPb4cGAcDSQYzpCu4XimDlLEQhoAX2BceSakdYXoMu66AKewizSnopAljhPHp9uk
0GO6lSJv0f+NGoCpxOE2FDfMIwiPbLC9LfMDWqpFvPanMfMe156p6D+LL4GfTaus
x4oZZa61aPwjomobEM4hzZk5bp1AjkiDxKHCBvwpuVTOIFlxlVcuB4RyuY2VsuHN
4a/tw9iEHkyJYCt3tsePTltgrAws2j7KCWLx+F3LTXWzmZ9//9bFq63V6kIh0a2b
nPozkt0Xj7iygJwU1G2i5XAMTF5tPH8ELioGiakv0Rkj1ncMSXx1s2dO1uxR06a5
bx/MFLbo1AyZhE8Tk4LcT/rEHtjhj/24FX6sEq4xNjw/GvAzlp0=
=ScnL
-----END PGP SIGNATURE-----
Merge 4.4.96 into android-4.4
Changes in 4.4.96
workqueue: replace pool->manager_arb mutex with a flag
ALSA: hda/realtek - Add support for ALC236/ALC3204
ALSA: hda - fix headset mic problem for Dell machines with alc236
ceph: unlock dangling spinlock in try_flush_caps()
usb: xhci: Handle error condition in xhci_stop_device()
spi: uapi: spidev: add missing ioctl header
fuse: fix READDIRPLUS skipping an entry
xen/gntdev: avoid out of bounds access in case of partial gntdev_mmap()
Input: elan_i2c - add ELAN0611 to the ACPI table
Input: gtco - fix potential out-of-bound access
assoc_array: Fix a buggy node-splitting case
scsi: zfcp: fix erp_action use-before-initialize in REC action trace
scsi: sg: Re-fix off by one in sg_fill_request_table()
can: sun4i: fix loopback mode
can: kvaser_usb: Correct return value in printout
can: kvaser_usb: Ignore CMD_FLUSH_QUEUE_REPLY messages
regulator: fan53555: fix I2C device ids
x86/microcode/intel: Disable late loading on model 79
ecryptfs: fix dereference of NULL user_key_payload
Revert "drm: bridge: add DT bindings for TI ths8135"
Linux 4.4.96
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 11bf4a8e1d which is
commit 2e644be30fcc08c736f66b60f4898d274d4873ab upstream.
Ben pointed out that there is no driver or device trees referencing this
device in 4.4-stable, so the patch should not be present there.
Reported-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Archit Taneja <architt@codeaurora.org>
Cc: Sasha Levin <alexander.levin@verizon.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f66665c09ab489a11ca490d6a82df57cfc1bea3e upstream.
In eCryptfs, we failed to verify that the authentication token keys are
not revoked before dereferencing their payloads, which is problematic
because the payload of a revoked key is NULL. request_key() *does* skip
revoked keys, but there is still a window where the key can be revoked
before we acquire the key semaphore.
Fix it by updating ecryptfs_get_key_payload_data() to return
-EKEYREVOKED if the key payload is NULL. For completeness we check this
for "encrypted" keys as well as "user" keys, although encrypted keys
cannot be revoked currently.
Alternatively we could use key_validate(), but since we'll also need to
fix ecryptfs_get_key_payload_data() to validate the payload length, it
seems appropriate to just check the payload pointer.
Fixes: 237fead619 ("[PATCH] ecryptfs: fs/Makefile and fs/Kconfig")
Reviewed-by: James Morris <james.l.morris@oracle.com>
Cc: Michael Halcrow <mhalcrow@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 723f2828a98c8ca19842042f418fb30dd8cfc0f7 upstream.
Blacklist Broadwell X model 79 for late loading due to an erratum.
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20171018111225.25635-1-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit fc1111b885437f374ed54aadda44d8b241ebd2a3 upstream.
The device tree nodes all correctly describe the regulators as
syr827 or syr828, but the I2C device id is currently set to the
wildcard value of syr82x in the driver. This causes udev to fail
to match the driver module with the modalias data from sysfs.
Fix this by replacing the I2C device ids with ones that match the
device tree descriptions, with syr827 and syr828. Tested on
Firefly rk3288 board. The syr82x id was not used anywhere.
Fixes: e80c47bd73 (regulator: fan53555: Export I2C module alias information)
Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit e1d2d1329a5722dbecc9c278303fcc4aa01f8790 upstream.
To avoid kernel warning "Unhandled message (68)", ignore the
CMD_FLUSH_QUEUE_REPLY message for now.
As of Leaf v2 firmware version v4.1.844 (2017-02-15), flush tx queue is
synchronous. There is a capability bit indicating whether flushing tx
queue is synchronous or asynchronous.
A proper solution would be to query the device for capabilities. If the
synchronous tx flush capability bit is set, we should wait for
CMD_FLUSH_QUEUE_REPLY message, while flushing the tx queue.
Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8f65a923e6b628e187d5e791cf49393dd5e8c2f9 upstream.
If the return value from kvaser_usb_send_simple_msg() was non-zero, the
return value from kvaser_usb_flush_queue() was printed in the kernel
warning.
Signed-off-by: Jimmy Assarsson <jimmyassarsson@gmail.com>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 3a379f5b36ae039dfeb6f73316e47ab1af4945df upstream.
Fix loopback mode by setting the right flag and remove presume mode.
Signed-off-by: Gerhard Bertelsmann <info@gerhard-bertelsmann.de>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 587c3c9f286cee5c9cac38d28c8ae1875f4ec85b upstream.
Commit 109bade9c625 ("scsi: sg: use standard lists for sg_requests")
introduced an off-by-one error in sg_ioctl(), which was fixed by commit
bd46fc406b30 ("scsi: sg: off by one in sg_ioctl()").
Unfortunately commit 4759df905a47 ("scsi: sg: factor out
sg_fill_request_table()") moved that code, and reintroduced the
bug (perhaps due to a botched rebase). Fix it again.
Fixes: 4759df905a47 ("scsi: sg: factor out sg_fill_request_table()")
Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ab31fd0ce65ec93828b617123792c1bb7c6dcc42 upstream.
v4.10 commit 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN
recovery") extended accessing parent pointer fields of struct
zfcp_erp_action for tracing. If an erp_action has never been enqueued
before, these parent pointer fields are uninitialized and NULL. Examples
are zfcp objects freshly added to the parent object's children list,
before enqueueing their first recovery subsequently. In
zfcp_erp_try_rport_unblock(), we iterate such list. Accessing erp_action
fields can cause a NULL pointer dereference. Since the kernel can read
from lowcore on s390, it does not immediately cause a kernel page
fault. Instead it can cause hangs on trying to acquire the wrong
erp_action->adapter->dbf->rec_lock in zfcp_dbf_rec_action_lvl()
^bogus^
while holding already other locks with IRQs disabled.
Real life example from attaching lots of LUNs in parallel on many CPUs:
crash> bt 17723
PID: 17723 TASK: ... CPU: 25 COMMAND: "zfcperp0.0.1800"
LOWCORE INFO:
-psw : 0x0404300180000000 0x000000000038e424
-function : _raw_spin_lock_wait_flags at 38e424
...
#0 [fdde8fc90] zfcp_dbf_rec_action_lvl at 3e0004e9862 [zfcp]
#1 [fdde8fce8] zfcp_erp_try_rport_unblock at 3e0004dfddc [zfcp]
#2 [fdde8fd38] zfcp_erp_strategy at 3e0004e0234 [zfcp]
#3 [fdde8fda8] zfcp_erp_thread at 3e0004e0a12 [zfcp]
#4 [fdde8fe60] kthread at 173550
#5 [fdde8feb8] kernel_thread_starter at 10add2
zfcp_adapter
zfcp_port
zfcp_unit <address>, 0x404040d600000000
scsi_device NULL, returning early!
zfcp_scsi_dev.status = 0x40000000
0x40000000 ZFCP_STATUS_COMMON_RUNNING
crash> zfcp_unit <address>
struct zfcp_unit {
erp_action = {
adapter = 0x0,
port = 0x0,
unit = 0x0,
},
}
zfcp_erp_action is always fully embedded into its container object. Such
container object is never moved in its object tree (only add or delete).
Hence, erp_action parent pointers can never change.
To fix the issue, initialize the erp_action parent pointers before
adding the erp_action container to any list and thus before it becomes
accessible from outside of its initializing function.
In order to also close the time window between zfcp_erp_setup_act()
memsetting the entire erp_action to zero and setting the parent pointers
again, drop the memset and instead explicitly initialize individually
all erp_action fields except for parent pointers. To be extra careful
not to introduce any other unintended side effect, even keep zeroing the
erp_action fields for list and timer. Also double-check with
WARN_ON_ONCE that erp_action parent pointers never change, so we get to
know when we would deviate from previous behavior.
Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com>
Fixes: 6f2ce1c6af37 ("scsi: zfcp: fix rport unblock race with LUN recovery")
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ea6789980fdaa610d7eb63602c746bf6ec70cd2b upstream.
This fixes CVE-2017-12193.
Fix a case in the assoc_array implementation in which a new leaf is
added that needs to go into a node that happens to be full, where the
existing leaves in that node cluster together at that level to the
exclusion of new leaf.
What needs to happen is that the existing leaves get moved out to a new
node, N1, at level + 1 and the existing node needs replacing with one,
N0, that has pointers to the new leaf and to N1.
The code that tries to do this gets this wrong in two ways:
(1) The pointer that should've pointed from N0 to N1 is set to point
recursively to N0 instead.
(2) The backpointer from N0 needs to be set correctly in the case N0 is
either the root node or reached through a shortcut.
Fix this by removing this path and using the split_node path instead,
which achieves the same end, but in a more general way (thanks to Eric
Biggers for spotting the redundancy).
The problem manifests itself as:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: assoc_array_apply_edit+0x59/0xe5
Fixes: 3cb989501c ("Add a generic associative array implementation.")
Reported-and-tested-by: WU Fan <u3536072@connect.hku.hk>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a50829479f58416a013a4ccca791336af3c584c7 upstream.
parse_hid_report_descriptor() has a while (i < length) loop, which
only guarantees that there's at least 1 byte in the buffer, but the
loop body can read multiple bytes which causes out-of-bounds access.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Reviewed-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 57a95b41869b8f0d1949c24df2a9dac1ca7082ee upstream.
ELAN0611 touchpad uses elan_i2c as its driver. It can be found
on Lenovo ideapad 320-15IKB.
So add it to ACPI table to enable the touchpad.
[Ido Adiv <idoad123@gmail.com> reports that the same ACPI ID is used for
Elan touchpad in ideapad 520].
BugLink: https://bugs.launchpad.net/bugs/1723736
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 298d275d4d9bea3524ff4bc76678c140611d8a8d upstream.
In case gntdev_mmap() succeeds only partially in mapping grant pages
it will leave some vital information uninitialized needed later for
cleanup. This will lead to an out of bounds array access when unmapping
the already mapped pages.
So just initialize the data needed for unmapping the pages a little bit
earlier.
Reported-by: Arthur Borsboom <arthurborsboom@gmail.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c6cdd51404b7ac12dd95173ddfc548c59ecf037f upstream.
Marios Titas running a Haskell program noticed a problem with fuse's
readdirplus: when it is interrupted by a signal, it skips one directory
entry.
The reason is that fuse erronously updates ctx->pos after a failed
dir_emit().
The issue originates from the patch adding readdirplus support.
Reported-by: Jakob Unterwurzacher <jakobunt@gmail.com>
Tested-by: Marios Titas <redneb@gmx.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 0b05b18381 ("fuse: implement NFS-like readdirplus support")
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit a2b4a79b88b24c49d98d45a06a014ffd22ada1a4 upstream.
The SPI_IOC_MESSAGE() macro references _IOC_SIZEBITS. Add linux/ioctl.h
to make sure this macro is defined. This fixes the following build
failure of lcdproc with the musl libc:
In file included from .../sysroot/usr/include/sys/ioctl.h:7:0,
from hd44780-spi.c:31:
hd44780-spi.c: In function 'spi_transfer':
hd44780-spi.c:89:24: error: '_IOC_SIZEBITS' undeclared (first use in this function)
status = ioctl(p->fd, SPI_IOC_MESSAGE(1), &xfer);
^
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit b3207c65dfafae27e7c492cb9188c0dc0eeaf3fd upstream.
xhci_stop_device() calls xhci_queue_stop_endpoint() multiple times
without checking the return value. xhci_queue_stop_endpoint() can
return error if the HC is already halted or unable to queue commands.
This can cause a deadlock condition as xhci_stop_device() would
end up waiting indefinitely for a completion for the command that
didn't get queued. Fix this by checking the return value and bailing
out of xhci_stop_device() in case of error. This patch happens to fix
potential memory leaks of the allocated command structures as well.
Fixes: c311e391a7 ("xhci: rework command timeout and cancellation,")
Signed-off-by: Mayank Rana <mrana@codeaurora.org>
Signed-off-by: Jack Pham <jackp@codeaurora.org>
Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 6c2838fbdedb9b72a81c931d49e56b229b6cdbca upstream.
sparse warns:
fs/ceph/caps.c:2042:9: warning: context imbalance in 'try_flush_caps' - wrong count at exit
We need to exit this function with the lock unlocked, but a couple of
cases leave it locked.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit f265788c336979090ac80b9ae173aa817c4fe40d upstream.
We have several Dell laptops which use the codec alc236, the headset
mic can't work on these machines. Following the commit 736f20a70, we
add the pin cfg table to make the headset mic work.
Signed-off-by: Hui Wang <hui.wang@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 736f20a7060857ff569e9e9586ae6c1204a73e07 upstream.
Add support for ALC236/ALC3204.
Add headset mode support for ALC236/ALC3204.
Signed-off-by: Kailang Yang <kailang@realtek.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 692b48258dda7c302e777d7d5f4217244478f1f6 upstream.
Josef reported a HARDIRQ-safe -> HARDIRQ-unsafe lock order detected by
lockdep:
[ 1270.472259] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
[ 1270.472783] 4.14.0-rc1-xfstests-12888-g76833e8 #110 Not tainted
[ 1270.473240] -----------------------------------------------------
[ 1270.473710] kworker/u5:2/5157 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
[ 1270.474239] (&(&lock->wait_lock)->rlock){+.+.}, at: [<ffffffff8da253d2>] __mutex_unlock_slowpath+0xa2/0x280
[ 1270.474994]
[ 1270.474994] and this task is already holding:
[ 1270.475440] (&pool->lock/1){-.-.}, at: [<ffffffff8d2992f6>] worker_thread+0x366/0x3c0
[ 1270.476046] which would create a new lock dependency:
[ 1270.476436] (&pool->lock/1){-.-.} -> (&(&lock->wait_lock)->rlock){+.+.}
[ 1270.476949]
[ 1270.476949] but this new dependency connects a HARDIRQ-irq-safe lock:
[ 1270.477553] (&pool->lock/1){-.-.}
...
[ 1270.488900] to a HARDIRQ-irq-unsafe lock:
[ 1270.489327] (&(&lock->wait_lock)->rlock){+.+.}
...
[ 1270.494735] Possible interrupt unsafe locking scenario:
[ 1270.494735]
[ 1270.495250] CPU0 CPU1
[ 1270.495600] ---- ----
[ 1270.495947] lock(&(&lock->wait_lock)->rlock);
[ 1270.496295] local_irq_disable();
[ 1270.496753] lock(&pool->lock/1);
[ 1270.497205] lock(&(&lock->wait_lock)->rlock);
[ 1270.497744] <Interrupt>
[ 1270.497948] lock(&pool->lock/1);
, which will cause a irq inversion deadlock if the above lock scenario
happens.
The root cause of this safe -> unsafe lock order is the
mutex_unlock(pool->manager_arb) in manage_workers() with pool->lock
held.
Unlocking mutex while holding an irq spinlock was never safe and this
problem has been around forever but it never got noticed because the
only time the mutex is usually trylocked while holding irqlock making
actual failures very unlikely and lockdep annotation missed the
condition until the recent b9c16a0e1f73 ("locking/mutex: Fix
lockdep_assert_held() fail").
Using mutex for pool->manager_arb has always been a bit of stretch.
It primarily is an mechanism to arbitrate managership between workers
which can easily be done with a pool flag. The only reason it became
a mutex is that pool destruction path wants to exclude parallel
managing operations.
This patch replaces the mutex with a new pool flag POOL_MANAGER_ACTIVE
and make the destruction path wait for the current manager on a wait
queue.
v2: Drop unnecessary flag clearing before pool destruction as
suggested by Boqun.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Upmigrate misfit current task upon scheduler tick with stopper.
We can kick an random (not necessarily big CPU) NOHZ idle CPU when a
CPU bound task is in need of upmigration. But it's not efficient as that
way needs following unnecessary wakeups:
1. Busy little CPU A to kick idle B
2. B runs idle balancer and enqueue migration/A
3. B goes idle
4. A runs migration/A, enqueues busy task on B.
5. B wakes up again.
This change makes active upmigration more efficiently by doing:
1. Busy little CPU A find target CPU B upon tick.
2. CPU A enqueues migration/A.
Change-Id: Ie865738054ea3296f28e6ba01710635efa7193c0
[joonwoop: The original version had logic to reserve CPU. The logic is
omitted in this version.]
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Currently active_load_balance_cpu_stop is run by cpu stopper and it
pushes running tasks off the busiest CPU onto idle target CPU. But
there is no check to see whether target cpu is offline or not before
pushing the tasks. With the introduction of active migration in the
scheduler tick path (see check_for_migration()) there have been
instances of attempts to migrate tasks to offline CPUs.
Add a check as to whether the target cpu is online or not to prevent
scheduling on offline CPUs.
Change-Id: Ib8ac7f8aeabd3ca7365f3eae977075952dab4f21
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
Active balance currently picks one task to migrate from busy cpu to
a chosen cpu (push_cpu). This patch extends active load balance to
recognize a particular task ('push_task') that needs to be migrated to
'push_cpu'. This capability will be leveraged by HMP-aware task
placement in a subsequent patch.
Change-Id: If31320111e6cc7044e617b5c3fd6d8e0c0e16952
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org]: Port to msm-3.18]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
We should avoid using the space character when passing arguments to
clang, because static code analysis check tool such as sparse may
misinterpret the arguments followed by spaces as build targets hence
cause the build to fail.
Signed-off-by: David Lin <dtwlin@google.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
(cherry picked from linux-kbuild commit bb3f38c3c5b759163e09b9152629cc789731de47)
Bug: 66969589
Change-Id: I055719cbc89e7a12d1f46f0a9c2152738a278d2a
[ghackmann@google.com: tweak to preserve AOSP-specific CLANG_TRIPLE]
Signed-off-by: Greg Hackmann <ghackmann@google.com>
In isolation, the original change will break building efistub for ARM64
with gcc. This wasn't an issue upstream due to the earlier change
60f38de7a8d4 ("efi/libstub: Unify command line param parsing"). That's
now been backported to AOSP too.
This reverts commit 89805266af.
Change-Id: I44eff2d17809b18181e2084abaf129ca4e2eb8d6
Signed-off-by: Greg Hackmann <ghackmann@google.com>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAlny7PcACgkQONu9yGCS
aT7UAw//Zqfv7J8fAyfeTnVsbBg5GcD7SW7clzILMclb2nCGcS1g6ZFZAOoWRZV3
eSnYKLBvOxfnF+TunD/aNagtKMIEuPqA/9Srd3uL2sdso1FtMsL8PR07Ml1o3Qf3
eUeKt/JfchpD5r+1ca4CWqRDvFyuQwP8RMsbXUqjsDEy/5USeGOzPGa9jdQ5JI/O
zvXlHyWuXryvDWHlM52H67CSdZC2KSUNeybji8EqrJlK2yijJQ8CvN7jrBevSVVy
2XpMx9RoeAbOAH36VuW5R3/tpUPW70L3Tw8zTo8dfs5w/TEMgddzDd4aWz8PLLqo
mhD6V3bgkqzkbXEaLvmmht/IMg497K6HcAFzfDu/M8X1wSaNfNmJ8IBKmDTBeuBO
t5Ha2mqDxiCTrEXq/USEgiBW28PJXKv3C5MhdSlYPWo/QGho0boTRirmbbmYupf/
T02LwRWo8MQT9l+sX5zFx+/Tw/f5/f1bWSWVJ5ns+lFbQ5smnA2nxvcPLg6LuGEe
tXZ7R+7v5yFp5quyUPTz6eY8Tau4mswuzm4avob0QLr6ZDQXTt8WbD8kmaQRoDq4
U0k58ZZbdPnOyf0zxFTURQoPk+MI/EV9tclQcWEsR+AXWVzqA71rSUMCcSl7Gad2
/vSkgDcSQ1xt0UQ7Bqf7PdNSVLtfH/n/jXJGJXwYW/Q0r/zRTYY=
=vcfF
-----END PGP SIGNATURE-----
Merge 4.4.95 into android-4.4
Changes in 4.4.95
USB: devio: Revert "USB: devio: Don't corrupt user memory"
USB: core: fix out-of-bounds access bug in usb_get_bos_descriptor()
USB: serial: metro-usb: add MS7820 device id
usb: cdc_acm: Add quirk for Elatec TWN3
usb: quirks: add quirk for WORLDE MINI MIDI keyboard
usb: hub: Allow reset retry for USB2 devices on connect bounce
ALSA: usb-audio: Add native DSD support for Pro-Ject Pre Box S2 Digital
can: gs_usb: fix busy loop if no more TX context is available
usb: musb: sunxi: Explicitly release USB PHY on exit
usb: musb: Check for host-mode using is_host_active() on reset interrupt
can: esd_usb2: Fix can_dlc value for received RTR, frames
drm/nouveau/bsp/g92: disable by default
drm/nouveau/mmu: flush tlbs before deleting page tables
ALSA: seq: Enable 'use' locking in all configurations
ALSA: hda: Remove superfluous '-' added by printk conversion
i2c: ismt: Separate I2C block read from SMBus block read
brcmsmac: make some local variables 'static const' to reduce stack size
bus: mbus: fix window size calculation for 4GB windows
clockevents/drivers/cs5535: Improve resilience to spurious interrupts
rtlwifi: rtl8821ae: Fix connection lost problem
KEYS: encrypted: fix dereference of NULL user_key_payload
lib/digsig: fix dereference of NULL user_key_payload
KEYS: don't let add_key() update an uninstantiated key
pkcs7: Prevent NULL pointer dereference, since sinfo is not always set.
parisc: Avoid trashing sr2 and sr3 in LWS code
parisc: Fix double-word compare and exchange in LWS code on 32-bit kernels
sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
f2fs crypto: replace some BUG_ON()'s with error checks
f2fs crypto: add missing locking for keyring_key access
fscrypt: fix dereference of NULL user_key_payload
KEYS: Fix race between updating and finding a negative key
fscrypto: require write access to mount to set encryption policy
FS-Cache: fix dereference of NULL user_key_payload
Linux 4.4.95
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
The linker routines that we rely on to produce a relocatable PIE binary
treat it as a shared ELF object in some ways, i.e., it emits symbol based
R_AARCH64_ABS64 relocations into the final binary since doing so would be
appropriate when linking a shared library that is subject to symbol
preemption. (This means that an executable can override certain symbols
that are exported by a shared library it is linked with, and that the
shared library *must* update all its internal references as well, and point
them to the version provided by the executable.)
Symbol preemption does not occur for OS hosted PIE executables, let alone
for vmlinux, and so we would prefer to get rid of these symbol based
relocations. This would allow us to simplify the relocation routines, and
to strip the .dynsym, .dynstr and .hash sections from the binary. (Note
that these are tiny, and are placed in the .init segment, but they clutter
up the vmlinux binary.)
Note that these R_AARCH64_ABS64 relocations are only emitted for absolute
references to symbols defined in the linker script, all other relocatable
quantities are covered by anonymous R_AARCH64_RELATIVE relocations that
simply list the offsets to all 64-bit values in the binary that need to be
fixed up based on the offset between the link time and run time addresses.
Fortunately, GNU ld has a -Bsymbolic option, which is intended for shared
libraries to allow them to ignore symbol preemption, and unconditionally
bind all internal symbol references to its own definitions. So set it for
our PIE binary as well, and get rid of the asoociated sections and the
relocation code that processes them.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
[will: fixed conflict with __dynsym_offset linker script entry]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Note: This backport only adds -Bsymbolic to LDFLAGS_vmlinux, but doesn't
remove R_AARCH64_ABS64 relocation handling, because those changes depend
on later refactoring of the code that we don't need in android-4.4.
Bug: 66932127
Change-Id: I56f664e02bc8d2fa3e5f496fb041bc3a8e1a4094
(cherry picked from commit 08cc55b2afd97a654f71b3bebf8bb0ec89fdc498)
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
have_sched_energy_data is defined only for CONFIG_SMP, so declare it
only with CONFIG_SMP.
Fixes warning from intel bot:
tree: https://android.googlesource.com/kernel/msm android-4.4
head: a21299785a
commit: a21299785a [5/5] sched/core: Warn
if ENERGY_AWARE is enabled but data is missing
config: i386-randconfig-x002-201743 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
git checkout a21299785a
# save the attached .config to linux build tree
make ARCH=i386
All warnings (new ones prefixed by >>):
>> kernel//sched/core.c:94:13: warning: 'have_sched_energy_data' used
but never defined
static bool have_sched_energy_data(void);
^~~~~~~~~~~~~~~~~~~~~~
vim +/have_sched_energy_data +94 kernel//sched/core.c
93
> 94 static bool have_sched_energy_data(void);
95
Change-Id: I266b63ece6fb31d2b5b11821a8244e147ba6d3a4
Signed-off-by: Joel Fernandes <joelaf@google.com>
If the EAS energy model is missing or incomplete, i.e. sd_scs is NULL, then
sched_group_energy will return -EINVAL on the assumption that it raced with a
CPU hotplug event. In that case, energy_diff will return 0 and the energy-aware
wake path will silently fail to trigger any migrations.
This case can be triggered by disabling CONFIG_SCHED_MC on existing platforms,
so that there are no sched_groups with the SD_SHARE_CAP_STATES flag, so that
sd_scs is NULL.
Add checks so that a warning is printed if EAS is ever enabled while the
necessary data is not present.
Change-Id: Id233a510b5ad8b7fcecac0b1d789e730bbfc7c4a
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
It is preferable that WALT window rollover occurs just
before a tick, since the tick is an opportune moment
to record a complete window's statistics, as well as report
those stats to the cpu frequency governor. When CONFIG_HZ
results in a TICK_NSEC that isn't a integral number, this
requirement may be violated. Account for this by reducing
the WALT window size to the nearest multiple of TICK_NSEC.
Commit d368c6faa1 ("sched: walt: fix window misalignment
when HZ=300") attempted to do this but WALT isn't using
MIN_SCHED_RAVG_WINDOW as the window size and the patch was
doing nothing.
Also, change the type of 'walt_disabled' to bool and warn
if an invalid window size causes WALT to be disabled.
Change-Id: Ie3dcfc21a3df4408254ca1165a355bbe391ed5c7
Signed-off-by: Vikram Mulukutla <markivx@codeaurora.org>
(from https://patchwork.kernel.org/patch/9895261/)
This patch adds a parameter to select_task_rq, sibling_count_hint
allowing the caller, where it has this information, to inform the
sched_class the number of tasks that are being woken up as part of
the same event.
The wake_q mechanism is one case where this information is available.
select_task_rq_fair can then use the information to detect that it
needs to widen the search space for task placement in order to avoid
overloading the last-level cache domain's CPUs.
* * *
The reason I am investigating this change is the following use case
on ARM big.LITTLE (asymmetrical CPU capacity): 1 task per CPU, which
all repeatedly do X amount of work then
pthread_barrier_wait (i.e. sleep until the last task finishes its X
and hits the barrier). On big.LITTLE, the tasks which get a "big" CPU
finish faster, and then those CPUs pull over the tasks that are still
running:
v CPU v ->time->
-------------
0 (big) 11111 /333
-------------
1 (big) 22222 /444|
-------------
2 (LITTLE) 333333/
-------------
3 (LITTLE) 444444/
-------------
Now when task 4 hits the barrier (at |) and wakes the others up,
there are 4 tasks with prev_cpu=<big> and 0 tasks with
prev_cpu=<little>. want_affine therefore means that we'll only look
in CPUs 0 and 1 (sd_llc), so tasks will be unnecessarily coscheduled
on the bigs until the next load balance, something like this:
v CPU v ->time->
------------------------
0 (big) 11111 /333 31313\33333
------------------------
1 (big) 22222 /444|424\4444444
------------------------
2 (LITTLE) 333333/ \222222
------------------------
3 (LITTLE) 444444/ \1111
------------------------
^^^
underutilization
So, I'm trying to get want_affine = 0 for these tasks.
I don't _think_ any incarnation of the wakee_flips mechanism can help
us here because which task is waker and which tasks are wakees
generally changes with each iteration.
However pthread_barrier_wait (or more accurately FUTEX_WAKE) has the
nice property that we know exactly how many tasks are being woken, so
we can cheat.
It might be a disadvantage that we "widen" _every_ task that's woken in
an event, while select_idle_sibling would work fine for the first
sd_llc_size - 1 tasks.
IIUC, if wake_affine() behaves correctly this trick wouldn't be
necessary on SMP systems, so it might be best guarded by the presence
of SD_ASYM_CPUCAPACITY?
* * *
Final note..
In order to observe "perfect" behaviour for this use case, I also had
to disable the TTWU_QUEUE sched feature. Suppose during the wakeup
above we are working through the work queue and have placed tasks 3
and 2, and are about to place task 1:
v CPU v ->time->
--------------
0 (big) 11111 /333 3
--------------
1 (big) 22222 /444|4
--------------
2 (LITTLE) 333333/ 2
--------------
3 (LITTLE) 444444/ <- Task 1 should go here
--------------
If TTWU_QUEUE is enabled, we will not yet have enqueued task
2 (having instead sent a reschedule IPI) or attached its load to CPU
2. So we are likely to also place task 1 on cpu 2. Disabling
TTWU_QUEUE means that we enqueue task 2 before placing task 1,
solving this issue. TTWU_QUEUE is there to minimise rq lock
contention, and I guess that this contention is less of an issue on
big.LITTLE systems since they have relatively few CPUs, which
suggests the trade-off makes sense here.
Change-Id: I2080302839a263e0841a89efea8589ea53bbda9c
Signed-off-by: Brendan Jackman <brendan.jackman@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Josef Bacik <josef@toxicpanda.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Matt Fleming <matt@codeblueprint.co.uk>
Energy cost estimation has been a long lasting challenge for WALT
because WALT guides CPU frequency based on the CPU utilization of
previous window. Consequently it's not possible to know newly
waking-up task's energy cost until WALT's end of the current window.
The WALT already tracks 'Previous Runnable Sum' (prev_runnable_sum)
and 'Cumulative Runnable Average' (cr_avg). They are designed for
CPU frequency guidance and task placement but unfortunately both
are not suitable for the energy cost estimation.
It's because using prev_runnable_sum for energy cost calculation would
make us to account CPU and task's energy solely based on activity in the
previous window so for example, any task didn't have an activity in the
previous window will be accounted as a 'zero energy cost' task.
Energy estimation with cr_avg is what energy_diff() relies on at present.
However cr_avg can only represent instantaneous picture of energy cost
thus for example, if a CPU was fully occupied for an entire WALT window
and became idle just before window boundary, and if there is a wake-up,
energy_diff() accounts that CPU is a 'zero energy cost' CPU.
As a result, introduce a new accounting unit 'Cumulative Window Demand'.
The cumulative window demand tracks all the tasks' demands have seen in
current window which is neither instantaneous nor actual execution time.
Because task demand represents estimated scaled execution time when the
task runs a full window, accumulation of all the demands represents
predicted CPU load at the end of window.
Thus we can estimate CPU's frequency at the end of current WALT window
with the cumulative window demand.
The use of prev_runnable_sum for the CPU frequency guidance and cr_avg
for the task placement have not changed and these are going to be used
for both purpose while this patch aims to add an additional statistics.
Change-Id: I9908c77ead9973a26dea2b36c001c2baf944d4f5
Signed-off-by: Joonwoo Park <joonwoop@codeaurora.org>
Patch 5680f23f20 ("sched/fair: streamline find_best_target
heuristics") has reworked function find_best_target, as result the
variable "target_util" is useless now. So remove it.
Change-Id: I5447062419e5828a49115119984fac6cd37db034
Signed-off-by: Leo Yan <leo.yan@linaro.org>
schedtune_initialized is protected by CONFIG_CGROUP_SCHEDTUNE, but
is being used without CONFIG_CGROUP_SCHEDTUNE being defined. Add
appropriate ifdefs around the usage of schedtune_initialized to
avoid a compilation error when CONFIG_CGROUP_SCHEDTUNE is not
defined.
Change-Id: Iab79bf053d74db3eeb84c09d71d43b4e39746ed2
Signed-off-by: Russ Weight <russell.h.weight@intel.com>
Signed-off-by: Fei Yang <fei.yang@intel.com>
The group_max_util() function is used to compute the maximum utilization
across the CPUs of a certain energy_env configuration.
Its main client is the energy_diff function when it needs to compute the
SG capacity for one of the before/after scheduling candidates.
Currently, the energy_diff function sets util_delta = 0 when it wants to
compute the energy corresponding to the scheduling candidate where the
task runs in the previous CPU. This implies that, for the task waking up
in the previous CPU we consider only its blocked load tracked by the CPU
RQ. However, in case of a medium-big task which is waking up on a long
time idle CPU, this blocked load can be already completely decayed.
More in general, the current approach is biased towards under-estimating
the capacity requirements for the "before" scheduling candidate.
This patch fixes this by:
- always use the cpu_util_wake() to properly get the utilization of a CPU
without any (partially decayed) contribution of the waking up task
- adding the task utilization to the cpu_util_wake just for the target
cpu
The "target CPU" is defined by the energy_env to be either the src_cpu or
the dst_cpu, depending on which scheduling candidate we are considering.
Finally, since this update removes the last usage of calc_util_delta()
this function is now safely removed.
Change-Id: I20ee1bcf40cee6bf6e265fb2d32ef79061ad6ced
Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Signed-off-by: Chris Redpath <chris.redpath@arm.com>